CUDA Agent Optimizes GPU Kernels Better | AI-generated and translated
Question: According to ByteDance's paper, what does CUDA Agent do?
Answer:
CUDA Agent: Overview
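CUDA Agent is a large-scale agentic reinforcement learning system developed by ByteDance Seed in collaboration with Tsinghua University. It builds robust CUDA kernel optimization ability through scalable data synthesis, a skill-augmented execution environment, and stable long-horizon RL training.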
The core problem it addresses
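GPU kernel optimization is fundamental to modern deep learning, yet it remains a highly specialized task that requires deep hardware expertise. Although large language models (LLMs) excel at general-purpose programming, they still cannot compete with compiler-based systems such as torch.compile at CUDA kernel generation. Existing approaches to CUDA code generation either rely on training-free refinement or fine-tune models inside fixed multi-turn execution-feedback loops, and neither paradigm fundamentally improves the model's intrinsic CUDA optimization ability.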
Three core components
1. Scalable data synthesis
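Training tasks are built by a three-stage pipeline: seed problem crawling, LLM-based combinatorial synthesis, and execution-driven filtering. Seed operators are mined from torch and transformers, and each operator is represented as a Python class with an initialization method and a forward method. Combinatorial synthesis samples up to 5 torch operators and composes them sequentially into a fused task. The final curated dataset, CUDA-Agent-Ops-6K, contains 6,000 training samples. It is designed for scalable RL training, with broad task diversity and a reduced risk of contamination.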
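The three-stage pipeline can be pictured with a small sketch. It is not the authors' code: the operator classes, the `synthesize_task` and `passes_execution_filter` helpers, and the choice of at least one operator per task are all hypothetical, and plain Python lists stand in for tensors so the example runs without PyTorch.

```python
import math
import random


class Scale:
    """Hypothetical seed operator: multiply every element by a constant."""
    def __init__(self, factor=2.0):
        self.factor = factor

    def forward(self, x):
        return [v * self.factor for v in x]


class AddBias:
    """Hypothetical seed operator: add a constant to every element."""
    def __init__(self, bias=1.0):
        self.bias = bias

    def forward(self, x):
        return [v + self.bias for v in x]


class ReLU:
    """Hypothetical seed operator: clamp negative values to zero."""
    def __init__(self):
        pass

    def forward(self, x):
        return [max(0.0, v) for v in x]


class FusedTask:
    """Sequential composition of seed operators, exposed as a single operator."""
    def __init__(self, ops):
        self.ops = list(ops)

    def forward(self, x):
        for op in self.ops:
            x = op.forward(x)
        return x


SEED_OPS = [Scale, AddBias, ReLU]  # stand-in for the crawled seed operators
MAX_OPS = 5                        # the text states "up to 5" operators per task


def synthesize_task(rng, max_ops=MAX_OPS):
    """Sample between 1 and max_ops seed operators and chain them in order."""
    count = rng.randint(1, max_ops)
    return FusedTask([rng.choice(SEED_OPS)() for _ in range(count)])


def passes_execution_filter(task, sample):
    """Stand-in for execution-driven filtering: keep tasks that run and return finite values."""
    try:
        out = task.forward(sample)
    except Exception:
        return False
    return all(math.isfinite(v) for v in out)
```

In this sketch a fused task keeps the same `forward` interface as a seed operator, so a synthesized task could itself be filtered, profiled, or composed further.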
2. Skill-augmented agent environment
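The agent loop follows a ReAct-style workflow, equipped with coding tools and a CUDA skill specification (SKILL.md). It supports iterative coding, compile-and-debug cycles, and profiler-guided optimization. The standard workflow is: profile the native PyTorch implementation, implement CUDA kernels and bindings, compile in a GPU sandbox, then iterate. The target is to pass the correctness checks and achieve a speedup of more than 5% over torch.compile.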
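The agent is equipped with tools such as BashTool, GlobTool, MultiEditTool, and TodoWriteTool, and runs in a four-stage loop: analyze the performance of the native PyTorch implementation, implement custom CUDA operators by rewriting the model, compile and evaluate in the GPU sandbox environment, and repeat until the result is more than 5% faster than the torch.compile baseline.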
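The stopping rule of that loop can be sketched as follows. This is an illustration under assumptions: speedup is taken to be the ratio of torch.compile runtime to custom-kernel runtime, and `propose` and `evaluate` are hypothetical callables standing in for the agent's coding step and the sandbox's compile-and-measure step.

```python
def speedup(baseline_ms, candidate_ms):
    """Speedup as a runtime ratio (assumed definition): above 1.0 means the candidate is faster."""
    return baseline_ms / candidate_ms


def meets_target(is_correct, compile_ms, kernel_ms, margin=0.05):
    """True when the kernel is correct and more than `margin` faster than torch.compile."""
    return is_correct and speedup(compile_ms, kernel_ms) > 1.0 + margin


def optimize(propose, evaluate, compile_ms, max_iters=10):
    """Propose a kernel, evaluate it, and stop as soon as the target is met.

    propose(feedback) -> kernel              (hypothetical agent step)
    evaluate(kernel)  -> (is_correct, ms)    (hypothetical sandbox step)
    Returns (kernel, iterations_used), or (None, max_iters) if the target is never met.
    """
    feedback = None
    for step in range(1, max_iters + 1):
        kernel = propose(feedback)
        is_correct, kernel_ms = evaluate(kernel)
        if meets_target(is_correct, compile_ms, kernel_ms):
            return kernel, step
        # Feed the last measurement back so the next proposal can react to it.
        feedback = (is_correct, kernel_ms)
    return None, max_iters
```

Checking correctness before speed reflects the stated requirement: a fast kernel that fails the correctness checks never counts as meeting the target.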
3. Stable long-horizon RL training
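Training proceeds in stages to stabilize long-horizon RL for CUDA coding. It starts with a single-turn PPO warm-up, then initializes the actor and the critic before full multi-turn agentic RL. The actor is initialized with Rejection Fine-Tuning (RFT) on sampled trajectories that have positive outcomes. RFT filters out inefficient loops and invalid tool-call patterns to reduce the risk of policy collapse. With this multi-stage design, training stays stable in long-context settings (up to 128k context, 150 training turns, and up to 200 turns during evaluation), which yields sustained reward growth.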
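The rejection step in RFT can be illustrated with a small filter. The text does not give the actual filtering criteria, so everything below is an assumption: the `Trajectory` fields, the run-length heuristic for "inefficient loops", the `max_repeat` threshold, and treating any tool outside the four named ones as invalid.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    reward: float     # final outcome of the rollout (hypothetical field)
    tool_calls: list  # tool names in call order (hypothetical field)


# Tools named in the text; the real tool set may be larger.
KNOWN_TOOLS = {"BashTool", "GlobTool", "MultiEditTool", "TodoWriteTool"}


def has_invalid_call(calls):
    """True if any call names a tool outside the known set."""
    return any(name not in KNOWN_TOOLS for name in calls)


def longest_repeat(calls):
    """Length of the longest run of identical consecutive tool calls."""
    best = run = 0
    prev = None
    for name in calls:
        run = run + 1 if name == prev else 1
        best = max(best, run)
        prev = name
    return best


def rft_filter(trajectories, max_repeat=3):
    """Keep trajectories with a positive outcome, valid tool calls, and no long repeated runs."""
    return [
        t for t in trajectories
        if t.reward > 0
        and not has_invalid_call(t.tool_calls)
        and longest_repeat(t.tool_calls) <= max_repeat
    ]
```

Only the surviving trajectories would then be used as fine-tuning data for the actor.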
Base model
CUDA Agent is a fine-tuned version of ByteDance's Seed 1.6 LLM, a Mixture-of-Experts (MoE) model with 23B activated parameters and 230B total parameters. Fine-tuning ran on a cluster of 128 NVIDIA H20 GPUs.
Key results
CUDA Agent achieves state-of-the-art results on KernelBench: faster rates over torch.compile of 100%, 100%, and 92% on the Level-1, Level-2, and Level-3 splits respectively, an overall pass rate of 98.8%, and an overall speedup of 2.11x vs. torch.compile.
The 40-point gap to Claude Opus 4.5 and Gemini 3 Pro on complex kernels shows that general coding ability is necessary but not sufficient for GPU optimization: you need domain-specific RL with hardware-grounded rewards.
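To make the three kinds of reported numbers concrete, here is a sketch of how pass rate, faster rate, and an aggregate speedup could be computed from per-task measurements. The definitions are assumptions, not taken from the paper: both rates are computed over all tasks, and the aggregate speedup is a geometric mean over passing tasks.

```python
import math


def summarize(results):
    """Aggregate per-task results.

    results: list of (is_correct, compile_ms, kernel_ms) tuples, one per task.
    """
    total = len(results)
    passed = [r for r in results if r[0]]
    # A task counts as "faster" only if it is correct and beats torch.compile.
    faster = [r for r in passed if r[2] < r[1]]
    speedups = [compile_ms / kernel_ms for _, compile_ms, kernel_ms in passed]
    if speedups:
        geomean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))
    else:
        geomean = float("nan")
    return {
        "pass_rate": len(passed) / total,
        "faster_rate": len(faster) / total,
        "geomean_speedup": geomean,
    }
```

A geometric mean is used here because it keeps a single very large speedup from dominating the aggregate; the paper may aggregate differently.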
Why it matters
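The key move is obvious in hindsight: if you want a model to produce fast code, reward it for producing fast code. Not correct code, not code that looks like good code, but code the profiler says is fast. The 2.11x speedup over torch.compile matters because torch.compile is the default optimization path for most PyTorch users. If an RL-trained agent can consistently beat the compiler on the same hardware, it opens a path for AI-generated kernels to replace hand-tuned operator libraries in production.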
References:
- CUDA Agent official project page
- HuggingFace paper page
- Awesome Agents: ByteDance CUDA Agent explained
- Import AI Newsletter coverage
