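Diffusion and Iterative Refinement | AI-generated and translated
Question: Does "diffusion in Transformers" mean generating an image and then repeatedly feeding it back in to generate again?
Answer:
Not quite: you are conflating two different things. Let me untangle them.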
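The diffusion process
A diffusion model works in two phases:
Forward process (training): take a real image and add Gaussian noise step by step over T steps until it becomes pure noise. This process is fixed and analytic; no neural network is needed.
Reverse process (inference): start from pure noise and denoise step by step, predicting and removing a small amount of noise at each step.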
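x_T (pure noise) → x_{T-1} → x_{T-2} → ... → x_0 (clean image)
Yes, you do run the model roughly 50 to 1000 times, but every step is a denoising step, not a fresh generation from scratch. In the standard DDPM parameterization, the model predicts the noise component ε at each step rather than the final image directly.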
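To make the forward process concrete, here is a minimal sketch of the closed-form noising step, x_t = √ᾱ_t * x_0 + √(1-ᾱ_t) * ε, which lets training jump straight to any step t without looping. The linear beta schedule (1e-4 to 0.02 over 1000 steps), the helper names `make_alpha_bars` and `q_sample`, and the 4-value toy "image" are assumptions chosen for illustration, not something taken from a specific codebase.

```python
import math
import random

def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule (assumed values)."""
    alpha_bars, running = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        running *= 1.0 - beta          # alpha_t = 1 - beta_t
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Noise x0 directly to step t; returns the noisy sample and the noise used."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal * x + noise_scale * e for x, e in zip(x0, eps)]
    return x_t, eps

alpha_bars = make_alpha_bars()
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25, 0.0]            # toy 4-value "image"

x_early, _ = q_sample(x0, 10, alpha_bars, rng)    # still close to x0
x_late, _ = q_sample(x0, 999, alpha_bars, rng)    # almost pure noise
print("signal kept at t=10: ", math.sqrt(alpha_bars[10]))
print("signal kept at t=999:", math.sqrt(alpha_bars[999]))
```

The returned `eps` is the training target: the network sees `x_t` and `t` and is trained to predict that noise.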
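Core update rule (DDPM):
x_{t-1} = 1/√α_t * (x_t - (1-α_t)/√(1-ᾱ_t) * ε_θ(x_t, t)) + σ_t * z
where ε_θ is the neural network (a U-Net or a Transformer), ᾱ_t is the product of α_1 through α_t, and z is standard Gaussian noise (set to zero on the final step).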
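As a worked example of this update rule, the sketch below applies one reverse step to a toy vector. The function name `ddpm_step`, the short 10-step linear schedule, and the choice σ_t² = 1 - α_t are assumptions for illustration. Note that the code is 0-indexed, so index `t == 0` is the final step that produces x_0.

```python
import math
import random

def ddpm_step(x_t, eps_pred, t, alphas, alpha_bars, rng):
    """One reverse step of the DDPM update rule (0-indexed t, hypothetical helper)."""
    alpha_t, alpha_bar_t = alphas[t], alpha_bars[t]
    coef = (1.0 - alpha_t) / math.sqrt(1.0 - alpha_bar_t)
    mean = [(x - coef * e) / math.sqrt(alpha_t) for x, e in zip(x_t, eps_pred)]
    if t == 0:
        return mean                       # final step: no fresh noise is added
    sigma_t = math.sqrt(1.0 - alpha_t)    # assumed choice: sigma_t^2 = beta_t
    return [m + sigma_t * rng.gauss(0.0, 1.0) for m in mean]

# Toy schedule (assumed values), kept short so the arithmetic is easy to follow.
T = 10
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars, running = [], 1.0
for a in alphas:
    running *= a
    alpha_bars.append(running)

rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]
eps = [rng.gauss(0.0, 1.0) for _ in x0]

# Noise x0 by a single step, then undo it using the true noise as the "prediction".
x_noisy = [math.sqrt(alpha_bars[0]) * x + math.sqrt(1.0 - alpha_bars[0]) * e
           for x, e in zip(x0, eps)]
x_rec = ddpm_step(x_noisy, eps, 0, alphas, alpha_bars, rng)
print(x_rec)
```

With a perfect noise prediction the last step recovers the clean vector up to floating-point error. A trained network only approximates ε, which is one reason the process is spread over many small steps.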
How the Transformer fits in
DiT (Diffusion Transformer, Peebles & Xie 2022) replaces the U-Net backbone with a Transformer. Its architecture:
- Patchify the noisy latent x_t into tokens
- Condition on the timestep t and the class label through adaLN (adaptive layer normalization)
- Run standard Transformer blocks (self-attention + FFN)
- Unpatchify the tokens back into the latent shape
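The patchify and unpatchify steps in the list above can be sketched on a tiny single-channel grid. This is a simplified illustration under stated assumptions: plain nested lists stand in for tensors, the function names `patchify` and `unpatchify` mirror the steps rather than any library API, and the learned linear projection that maps each flattened patch to the model dimension is left out.

```python
def patchify(latent, p):
    """Split an H x W grid into non-overlapping p x p patches (row-major tokens)."""
    H, W = len(latent), len(latent[0])
    if H % p or W % p:
        raise ValueError("patch size must divide both dimensions")
    tokens = []
    for i in range(0, H, p):
        for j in range(0, W, p):
            tokens.append([latent[i + di][j + dj]
                           for di in range(p) for dj in range(p)])
    return tokens

def unpatchify(tokens, H, W, p):
    """Inverse of patchify: place each token back at its patch location."""
    latent = [[0.0] * W for _ in range(H)]
    patches_per_row = W // p
    for n, token in enumerate(tokens):
        i = (n // patches_per_row) * p
        j = (n % patches_per_row) * p
        for k, value in enumerate(token):
            latent[i + k // p][j + k % p] = value
    return latent

latent = [[float(r * 4 + c) for c in range(4)] for r in range(4)]   # 4 x 4 toy latent
tokens = patchify(latent, 2)             # 4 tokens, each holding 2 * 2 = 4 values
restored = unpatchify(tokens, 4, 4, 2)
print(len(tokens), tokens[0])
print(restored == latent)
```

The number of tokens is (H / p) * (W / p), so halving the patch size quadruples the sequence length the Transformer has to attend over.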
# DiT forward pass (illustrative sketch)
def forward(self, x, t, y):
    x = self.patchify(x)            # [B, N, D]
    x = x + self.pos_embed          # positional embedding
    c = self.adaln_embed(t, y)      # timestep + class conditioning
    for block in self.blocks:
        x = block(x, c)             # Transformer block with adaLN
    x = self.unpatchify(x)          # predicted noise ε
    return x
This approach is used in Stable Diffusion 3, FLUX, and Sora, all of which use DiT-style backbones in place of a U-Net.
The actual loop at inference time
import torch

x = torch.randn(latent_shape)            # start from noise
for t in reversed(range(T)):             # e.g. T=50 when using DDIM
    eps = model(x, t, conditioning)      # Transformer predicts the noise
    x = denoise_step(x, eps, t)          # scheduler removes that noise
image = vae.decode(x)                    # decode the latent into a pixel-space image
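To see why 50 steps can be enough with DDIM, here is a self-contained toy version of the loop using the deterministic DDIM update on a 50-step subset of a 1000-step schedule. Everything model-related is a stand-in: `oracle_eps` replaces the trained network by returning the exact noise for a dataset that contains only the single point `TARGET`, and the schedule values and the name `ddim_step` are assumptions for illustration.

```python
import math
import random

T_TRAIN, NUM_STEPS = 1000, 50

# Linear beta schedule (assumed values) and its cumulative products alpha_bar_t.
betas = [1e-4 + (0.02 - 1e-4) * t / (T_TRAIN - 1) for t in range(T_TRAIN)]
alpha_bars, running = [], 1.0
for b in betas:
    running *= 1.0 - b
    alpha_bars.append(running)

TARGET = [1.0, -0.5, 0.25]   # toy "dataset" consisting of one point

def oracle_eps(x_t, t):
    """Stand-in for model(x, t, conditioning): exact noise when all data is TARGET."""
    s, n = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    return [(x - s * x0) / n for x, x0 in zip(x_t, TARGET)]

def ddim_step(x_t, eps, t, t_prev):
    """Deterministic DDIM update (eta = 0) from step t to an earlier step t_prev."""
    a_t = alpha_bars[t]
    a_prev = alpha_bars[t_prev] if t_prev >= 0 else 1.0   # t_prev = -1 means clean
    x0_pred = [(x - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
               for x, e in zip(x_t, eps)]
    return [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e
            for x0, e in zip(x0_pred, eps)]

stride = T_TRAIN // NUM_STEPS
timesteps = list(range(T_TRAIN - 1, -1, -stride))   # 999, 979, ..., 19

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in TARGET]           # start from noise
for i, t in enumerate(timesteps):
    t_prev = timesteps[i + 1] if i + 1 < len(timesteps) else -1
    eps = oracle_eps(x, t)          # same "model" every iteration, different t
    x = ddim_step(x, eps, t, t_prev)
print(x)
```

The loop has the same shape as the one above: one model, a decreasing sequence of t values, and a scheduler rule that turns the predicted noise into a slightly cleaner x. With a real network the prediction is imperfect, so the step count trades speed against quality.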
Each iteration uses the same model with a different t, and x gets progressively cleaner. This is not "generate an image → use the image as a prompt"; it is "denoise the latent → denoise it again".
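What you may be thinking of: iterative refinement
There are models that come closer to what you describe:
- Consistency Models: distill diffusion into fewer steps, and can generate in a single step
- Flow Matching (used in SD3 and FLUX): learns straight-line trajectories between noise and data instead of a diffusion process
- Autoregressive image models (such as LlamaGen and VAR): predict image tokens sequentially, more like language modeling
VAR (Visual AutoRegressive) is especially interesting: it first generates a coarse resolution and then refines at progressively finer scales. That is closer to "generate an image → use it to generate again".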
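For the flow matching entry in the list above, a small sketch may help show why straight-line trajectories allow few sampling steps. It integrates a velocity field with plain Euler steps. The time convention (t = 0 is noise, t = 1 is data), the single-point dataset `TARGET`, and the `oracle_velocity` stand-in for a trained network are all assumptions made for this toy example.

```python
import random

TARGET = [1.0, -0.5, 0.25]   # toy "dataset" consisting of one point

def oracle_velocity(x, t):
    """Stand-in for a trained velocity network when all data equals TARGET.

    On the straight path x_t = (1 - t) * noise + t * data, the velocity
    data - noise can be rewritten as (data - x_t) / (1 - t).
    """
    return [(x1 - xi) / (1.0 - t) for xi, x1 in zip(x, TARGET)]

def euler_sample(num_steps, rng):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (data)."""
    x = [rng.gauss(0.0, 1.0) for _ in TARGET]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = oracle_velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

one_step = euler_sample(1, random.Random(0))
four_steps = euler_sample(4, random.Random(0))
print(one_step)
print(four_steps)
```

In this idealized case the path is exactly straight, so even a single Euler step lands on the target. Real learned velocity fields are only approximately straight, which is why practical samplers still take multiple steps.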
Summary
Architecturally, the Transformer in DiT does nothing special for diffusion. It is simply a better denoising backbone than a U-Net, because self-attention captures the long-range spatial dependencies that convolutional U-Nets struggle with at scale.
References:
- DiT paper: Scalable Diffusion Models with Transformers
- DDPM: Denoising Diffusion Probabilistic Models
- VAR: Visual AutoRegressive Modeling
