ä¼åæ°æ®ä¼ è¾åé¢å¤ç工使µ | AIçæåç¿»è¯
以䏿¯å¯¹è¿ä¸ªåºåä¸åçäºæ çåè§£ï¼
1. é®é¢
æ¨æä¸ä¸ª VPS åä¸ä¸ª GPU dropletï¼å¯è½æ¯é å¤ H200 GPU çäºæå¡å¨ï¼ã æåï¼éè¿ external (public) network å¨å®ä»¬ä¹é´ä¼ è¾ 61 GB æ°æ® å°è±è´¹ ~5 å°æ¶ï¼å¯è½æ¯ç±äºå¸¦å®½æéæå»¶è¿é«ï¼ã
2. 设置å é¨ç½ç»éä¿¡
两个æå¡å¨é½å¨åä¸ä¸ª internal networkï¼ç§æ IPï¼
- VPS å¨
10.100.0.2 - GPU droplet å¨
10.100.3
æ¨éªè¯äºå®ä»¬å¯ä»¥éè¿å é¨ç½ç»éä¿¡ï¼å»¶è¿é常ä½ï¼1.56 msï¼ï¼æ¯éè¿äºèç½å¿«å¾å¤ã
SSH Key Setup
æ¨ä» VPS â GPU droplet è®¾ç½®äº SSH key authenticationï¼è¿æ ·ä¼ è¾å°±ä¸éè¦å¯ç ã ä¸ä¸ªå°é®é¢ï¼å¯é¥è¿æ¥æ¶æ²¡ææ¢è¡ï¼å¯¼è´è®¤è¯å¤±è´¥ï¼æ¨ä¿®å¤äºå®ã
3. å é¨ç½ç»ä¼ è¾é度
䏿¦å é¨ç½ç»è·¯å¾åå¤å¥½ï¼æ¨ç´æ¥ä¼ è¾äº 61 GB æ°æ®éï¼
- éåº¦å³°å¼ ~450 MB/s
- æ»ä¼ è¾æ¶é´ï¼~5 åé ï¼å¯¹æ¯å¤é¨ç½ç»ç 5 å°æ¶ï¼
è¿ç§å·¨å¤§çå 鿥èªäºï¼
- Private network â æ äºèç½è·¯ç±ç¶é¢
- é«å é¨å¸¦å®½ï¼å¯è½ 10 Gbps ææ´é«ï¼
- èç¹ä¹é´ ä½å»¶è¿
4. GPU Droplet ä¸çæ°æ®åå¤
ä¼ è¾åï¼æ¨å¨ GPU dropletï¼H200 æºå¨ï¼ä¸å¼å§ data preprocessingã
æ¨è¿è¡äº 12 ä¸ªå¹¶è¡ workersï¼æ¯ä¸ªä½¿ç¨ ~93% CPU â æå³çå¨é¢å¤çæé´ä»»å¡æ¯ CPU ç»å®çï¼è¿æªå° GPU ç»å®ã
å¤çé度
- ~40 million tokens æ¯ç§
- æ¯ä¸ª shardï¼å¯è½æ¯æ°æ®éçä¸ä¸ªåï¼å¤çæ¶é´ ~2 ç§
- æ» shardsï¼~610
- ~1 åéåï¼42 个 shards 宿ï¼7.9 GB è¾åºï¼ â ~7% 宿
- ETAï¼~14â20 åé 宿é¢å¤ç
5. 为ä»ä¹éè¦
è¿ä¸ªå·¥ä½æµç¨å±ç¤ºäº 常è§ç ML data pipeline ä¼åï¼
- å é¨ä¼ è¾æ°æ® â ä¸å¨å ¬å ±äºèç½ä¸ â 以é¿å æ ¢éã
- 使ç¨é«æ§è½åå¨å计ç®ï¼H200 + å¤ä¸ª CPU æ ¸å¿ï¼å¿«éé¢å¤çã
- å¹¶è¡å tokenization/shuffling/writing 以æå¤§åååéã
ç»æï¼ ä» VPS ä¸çåå§æ°æ® â å¨ GPU æå¡å¨ä¸åå¤å¥½çé¢å¤çæ°æ®éï¼æ»å ± ä¸å° 30 åéï¼èæ´ç´ æ¹æ³ä» ä¼ è¾å°±éè¦å å°æ¶ã
å¦ææ¨æ³äºè§£å é¨ç½ç»è®¾ç½®ãå¹¶è¡é¢å¤çé»è¾æå¦ä½çæ§æ¤ç±»ç®¡éçæ´æ·±å ¥è§£éï¼è¯·åè¯æã
