Optimizing Prompts Through Empirical Testing | AI-generated and translated
Question: How does the Claude Code team develop and refine its system prompts and skills? Do they add or remove prompt sections and run interference tests to check whether a prompt has a measurable effect?
Answer:
Yes, your intuition is broadly correct. The Claude Code team uses a rigorous, iterative, empirical approach to developing and refining system prompts and skills, a process that goes well beyond guesswork. Here is a breakdown of how it works:
1. Early stage: rapid iteration on feedback
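Claude Code started out with rapid iteration based on feedback from Anthropic employees and external users. At this early stage, the team would add or change prompt content, deploy it internally (a practice known as "dogfooding"), and watch for qualitative differences in behavior. This is exactly the informal "add or remove a prompt and see whether it makes a difference" approach you describe.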
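2. Adding formal evals
Later, the team added evals: first for narrow areas such as concision and file edits, then for more complex behaviors such as over-engineering. These evals helped identify problems, guide improvements, and focus collaboration between research and product.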
Evals are essentially automated test suites that measure whether a model behaves as intended. They make it possible to evaluate an agent automatically on dozens of tasks without deploying to production or affecting real users.
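To make the idea of an eval suite concrete, here is a minimal sketch of such a harness. It assumes a hypothetical `Agent` callable (prompt in, text out) and uses a stub agent so it runs offline; the `EvalCase` type, the five-word concision budget, and the file-edit check are illustrative assumptions, not Anthropic's eval format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Agent = Callable[[str], str]  # hypothetical: takes a task prompt, returns the agent's final text

@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the output meets expectations

def run_evals(agent: Agent, cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case against the agent and record pass/fail per case."""
    return {case.name: case.check(agent(case.prompt)) for case in cases}

def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of cases that passed (0.0 for an empty suite)."""
    return sum(results.values()) / len(results) if results else 0.0

cases = [
    # Concision: an assumed budget of at most 5 words for a trivial question.
    EvalCase("concision", "What is 2 + 2?", lambda out: len(out.split()) <= 5),
    # File edit: the reply should name the file and the new identifier.
    EvalCase("file_edit", "Rename variable x to count in utils.py",
             lambda out: "utils.py" in out and "count" in out),
]

def stub_agent(prompt: str) -> str:
    """Offline stand-in for a real model call."""
    if "2 + 2" in prompt:
        return "4"
    return "Edited utils.py: renamed x to count."

results = run_evals(stub_agent, cases)
print(results, pass_rate(results))
```

Replacing `stub_agent` with a real model call and growing `cases` to dozens of tasks gives the shape described above; a real file-edit eval would more likely inspect the resulting diff than the reply text.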
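3. A/B testing and production monitoring
Together with production monitoring, A/B testing, user research, and similar methods, evals provide the signals used to keep improving Claude Code as it scales.
This is the "interference test" idea you mention: two versions of a prompt run in parallel, and the team compares the results to determine whether a specific prompt change has a real, measurable effect or is just noise.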
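One standard way to decide "real effect or noise" is a permutation test on per-run pass/fail scores. The sketch below is a generic statistical technique, not a description of Anthropic's internal tooling; the function name and the sample numbers are hypothetical.

```python
import random
from typing import List

def permutation_test(a: List[int], b: List[int], n_iter: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference in mean score between A and B.

    a, b: per-run scores (1 = pass, 0 = fail) for prompt versions A and B.
    Returns the fraction of random relabelings whose absolute difference in
    means is at least as large as the observed one (an approximate p-value).
    """
    n_a, n_b = len(a), len(b)

    def scaled_gap(xs: List[int], ys: List[int]) -> int:
        # |mean(xs) - mean(ys)| multiplied by n_a * n_b, kept in integers to avoid float ties.
        return abs(sum(xs) * n_b - sum(ys) * n_a)

    observed = scaled_gap(a, b)
    pooled = a + b  # a new list, so the caller's inputs are not shuffled
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if scaled_gap(pooled[:n_a], pooled[n_a:]) >= observed:
            extreme += 1
    return extreme / n_iter

# Hypothetical results from 20 independent runs per version: A passes 18, B passes 10.
prompt_a = [1] * 18 + [0] * 2
prompt_b = [1] * 10 + [0] * 10
p = permutation_test(prompt_a, prompt_b)
print(f"pass rate A={sum(prompt_a) / len(prompt_a):.2f}, B={sum(prompt_b) / len(prompt_b):.2f}, p={p:.4f}")
```

A small p-value (conventionally below 0.05) suggests the gap is unlikely to be noise. If both versions are run on the same set of tasks, a paired test such as McNemar's test makes better use of that pairing.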
4. Modular system prompt architecture
Claude Code's system prompt is not a single giant, monolithic prompt; it is highly modular. It consists of separate prompt sections such as "Doing tasks (avoid over-engineering)", "Doing tasks (no premature abstractions)", "Doing tasks (no compatibility hacks)", and "Doing tasks (no time estimates)", each tracked separately with its own token count.
This modular design lets the team isolate individual sections and test whether removing, adding, or changing them affects model behavior. In effect, this is controlled ablation testing.
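The leave-one-out pattern behind this kind of ablation can be sketched as follows. The section names are the ones listed above, but their bodies here are placeholders; `evaluate` stands in for any hypothetical scoring function (for example an eval pass rate), and the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer.

```python
from typing import Callable, Dict, Optional

SECTION_NAMES = [
    "Doing tasks (avoid over-engineering)",
    "Doing tasks (no premature abstractions)",
    "Doing tasks (no compatibility hacks)",
    "Doing tasks (no time estimates)",
]
# Placeholder bodies; the real section texts are not reproduced here.
SECTIONS: Dict[str, str] = {name: f"[placeholder text for: {name}]" for name in SECTION_NAMES}

def approx_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token); a real tokenizer will differ."""
    return max(1, len(text) // 4)

def assemble(sections: Dict[str, str], exclude: Optional[str] = None) -> str:
    """Join all sections into one system prompt, optionally leaving one out."""
    return "\n\n".join(body for name, body in sections.items() if name != exclude)

def ablation_deltas(sections: Dict[str, str], evaluate: Callable[[str], float]) -> Dict[str, float]:
    """Score drop when each section is removed; a positive value means the section helps."""
    baseline = evaluate(assemble(sections))
    return {name: baseline - evaluate(assemble(sections, exclude=name)) for name in sections}

# Toy evaluator: pretend the score depends only on the over-engineering section being present.
def toy_evaluate(prompt: str) -> float:
    return 1.0 if "avoid over-engineering" in prompt else 0.5

for name, body in SECTIONS.items():
    print(f"{approx_tokens(body):>3} tokens  {name}")
print(ablation_deltas(SECTIONS, toy_evaluate))
```

In practice each variant would be run through the full eval suite, ideally several times, and a difference judged with a significance test rather than a single score.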
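5. Skills: the "Skill Creator" and the evals pipeline
For skills (modular SKILL.md files that extend what Claude Code can do), Anthropic further formalized the development process with Claude Code Skills 2.0 (updated March 3, 2026).
The updated framework covers writing test cases and benchmarks to measure a skill's effect on task performance, iteratively refining skill descriptions to make triggering more accurate and reliable, and tuning precisely with training and test datasets.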
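Measuring triggering accuracy on separate training and test sets might look like the sketch below. The keyword-overlap rule in `should_trigger` is a hypothetical stand-in for the model's own decision to load a skill, and the descriptions and labeled queries are invented for illustration.

```python
import re
from typing import List, Set, Tuple

Labeled = List[Tuple[str, bool]]  # (user query, should the skill trigger?)

def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def should_trigger(description: str, query: str) -> bool:
    """Stand-in trigger rule: fire if the query shares any word with the description."""
    return bool(words(description) & words(query))

def trigger_accuracy(description: str, labeled: Labeled) -> float:
    """Fraction of queries where the trigger decision matches the label."""
    correct = sum(should_trigger(description, q) == label for q, label in labeled)
    return correct / len(labeled)

train: Labeled = [
    ("convert this pdf to text", True),
    ("extract tables from a pdf", True),
    ("write a haiku about autumn", False),
    ("fix the failing unit test", False),
]
test: Labeled = [
    ("merge two pdf files", True),
    ("summarize this pdf report", True),
    ("rename a python variable", False),
]

v1 = "Tools for document processing"
v2 = "Read, extract, merge and convert PDF files"

for name, desc in [("v1", v1), ("v2", v2)]:
    print(name, "train:", trigger_accuracy(desc, train), "test:", round(trigger_accuracy(desc, test), 2))
```

The held-out `test` list is the point: a description tuned until it scores well on `train` is only trusted if it also scores well on queries it was not tuned on.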
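The updated skill-creator now runs as four composable sub-agents working in parallel: an executor that runs the skill against eval prompts; a grader that checks whether outputs meet defined expectations; a comparator that runs blind A/B comparisons between skill versions; and an analyzer that surfaces patterns that aggregate statistics might hide.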
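The division of labor among the four roles can be sketched as plain functions. This is a hypothetical illustration, not the skill-creator's actual code: each role is a stub (real sub-agents would be model calls), `skill_v1`/`skill_v2` and the eval set are invented, and a thread pool stands in for parallel execution.

```python
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Skill = Callable[[str], str]  # hypothetical: a skill version modeled as prompt -> output

# Hypothetical eval set: (category, prompt, substring the output is expected to contain).
EVALS: List[Tuple[str, str, str]] = [
    ("csv", "summarize sales.csv", "total"),
    ("csv", "count rows in users.csv", "rows"),
    ("json", "validate config.json", "valid"),
    ("json", "pretty-print data.json", "formatted"),
]

def skill_v1(prompt: str) -> str:
    return "total: 42, rows: 10" if ".csv" in prompt else "done"

def skill_v2(prompt: str) -> str:
    return "total: 42, rows: 10" if ".csv" in prompt else "valid and formatted"

def executor(skill: Skill, prompt: str) -> str:
    """Runs the skill against one eval prompt (a real executor would call the model)."""
    return skill(prompt)

def grader(output: str, expected: str) -> bool:
    """Checks an output against its defined expectation (here, a required substring)."""
    return expected in output

def comparator(out_a: str, out_b: str, judge: Callable[[str, str], bool], rng: random.Random) -> str:
    """Blind A/B: the judge sees both outputs in random order, without version labels."""
    pair = [("A", out_a), ("B", out_b)]
    rng.shuffle(pair)
    first_wins = judge(pair[0][1], pair[1][1])
    return pair[0][0] if first_wins else pair[1][0]

def analyzer(graded: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Per-category pass rates, which can expose patterns an overall average hides."""
    by_category: Dict[str, List[bool]] = defaultdict(list)
    for category, passed in graded:
        by_category[category].append(passed)
    return {cat: sum(v) / len(v) for cat, v in by_category.items()}

def run_version(skill: Skill) -> List[Tuple[str, bool]]:
    """Execute all eval prompts in parallel, then grade each output."""
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(lambda case: executor(skill, case[1]), EVALS))
    return [(cat, grader(out, expected)) for (cat, _, expected), out in zip(EVALS, outputs)]

print("v1:", analyzer(run_version(skill_v1)))  # 50% overall, but csv and json differ sharply
print("v2:", analyzer(run_version(skill_v2)))

def prefers_longer(first: str, second: str) -> bool:
    """Toy judge; a real comparator would be a model."""
    return len(first) >= len(second)

prompt = "validate config.json"
print("blind pick:", comparator(skill_v1(prompt), skill_v2(prompt), prefers_longer, random.Random(0)))
```

The analyzer output for `skill_v1` shows why per-category breakdowns matter: a 50% overall pass rate hides that one category passes everything and the other nothing.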
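6. Benchmark mode: "Does this skill or prompt actually help?"
Benchmark mode runs a standardized evaluation over the entire eval set and records metrics. It compares performance with the skill active against performance without it (the baseline) and shows the two side by side, giving objective data to answer the fundamental question: "Does this skill actually change anything?"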
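A benchmark run that records the same metrics for the baseline and the skill-enabled configuration, and prints them side by side, might be sketched like this. `run_case` is a hypothetical hook returning pass/fail and a token count per task; the toy numbers are invented, and "mean_tokens" is just one example of a recorded metric.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

RunCase = Callable[[str, int], Tuple[bool, int]]  # (config name, case id) -> (passed, tokens used)
Report = Dict[str, Dict[str, float]]

def benchmark(run_case: RunCase, cases: List[int], configs: List[str]) -> Report:
    """Run every case under every configuration and record summary metrics."""
    report: Report = {}
    for config in configs:
        results = [run_case(config, case) for case in cases]
        report[config] = {
            "pass_rate": sum(passed for passed, _ in results) / len(results),
            "mean_tokens": mean(tokens for _, tokens in results),
        }
    return report

def side_by_side(report: Report, baseline: str = "baseline", candidate: str = "with_skill") -> str:
    """Format the two configurations in adjacent columns, plus the difference."""
    lines = [f"{'metric':<12}{baseline:>12}{candidate:>12}{'delta':>10}"]
    for metric, base_value in report[baseline].items():
        cand_value = report[candidate][metric]
        lines.append(f"{metric:<12}{base_value:>12.2f}{cand_value:>12.2f}{cand_value - base_value:>+10.2f}")
    return "\n".join(lines)

# Toy stand-in: the skill raises the pass rate but costs extra tokens.
def toy_run_case(config: str, case: int) -> Tuple[bool, int]:
    if config == "with_skill":
        return case < 9, 1200
    return case < 6, 1000

report = benchmark(toy_run_case, list(range(10)), ["baseline", "with_skill"])
print(side_by_side(report))
```

Showing cost metrics next to the pass rate makes the trade-off visible: in this toy run the skill gains 30 points of pass rate for 200 extra tokens per task.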
There is also a concept called outgrowth detection: if the base model passes the evals even without the skill loaded, the system tells you, "Drop this skill; the model is already good enough." This keeps dead prompt weight from accumulating over time.
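The outgrowth check reduces to a simple decision rule over the baseline (skill not loaded) results. The sketch below assumes per-eval pass/fail results are already available; the function names and skill names are hypothetical, and the `min_pass_rate` knob defaults to 1.0 to match "passes the evals", with any lower bar being an assumption.

```python
from typing import Dict, List

def outgrown(baseline_results: Dict[str, bool], min_pass_rate: float = 1.0) -> bool:
    """True if the base model, without the skill, already meets the bar on this skill's evals."""
    if not baseline_results:
        return False  # no evals means no evidence either way
    rate = sum(baseline_results.values()) / len(baseline_results)
    return rate >= min_pass_rate

def skills_to_drop(baselines_by_skill: Dict[str, Dict[str, bool]]) -> List[str]:
    """Names of skills whose evals the base model passes on its own (candidates for removal)."""
    return sorted(name for name, results in baselines_by_skill.items() if outgrown(results))

# Hypothetical baseline runs (skill not loaded) for two skills.
baselines = {
    "pdf-tools": {"extract_text": True, "merge_files": True, "fill_form": True},
    "release-notes": {"format_changelog": True, "link_issues": False},
}
print(skills_to_drop(baselines))  # ['pdf-tools']
```

Re-running this check whenever the base model changes is what turns it into a pruning mechanism rather than a one-off audit.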
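7. Prompt optimization through iterative algorithms
Prompt optimization can substantially improve even top coding agents: optimizing Claude Code's system prompt alone yielded a 5%+ gain in general coding performance, and larger gains when the prompt was specialized to a single repository. The process uses train/test splits of benchmark tasks (such as SWE-Bench) to verify that prompt changes generalize rather than merely overfit.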
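Why a held-out split matters can be shown with a toy selection loop: choose the candidate prompt that scores best on the training tasks, then report its score on test tasks it never saw. This is a generic sketch, not Arize's prompt-learning algorithm or the SWE-Bench harness; `toy_solve`, the candidate names, and the integer task IDs are all hypothetical.

```python
import random
from typing import Callable, List, Tuple

Solver = Callable[[str, int], bool]  # (candidate prompt, task id) -> solved?

def train_test_split(tasks: List[int], test_fraction: float = 0.3, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle a copy of the tasks and carve off a held-out test split."""
    shuffled = tasks[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def score(prompt: str, tasks: List[int], solve: Solver) -> float:
    return sum(solve(prompt, t) for t in tasks) / len(tasks)

def select_prompt(candidates: List[str], train: List[int], test: List[int], solve: Solver) -> Tuple[str, float, float]:
    """Pick the best candidate on train only, then measure it on the held-out test split."""
    best = max(candidates, key=lambda p: score(p, train, solve))
    return best, score(best, train, solve), score(best, test, solve)

tasks = list(range(20))
train, test = train_test_split(tasks)
train_set = set(train)

def toy_solve(prompt: str, task: int) -> bool:
    if prompt == "overfit":  # behaves as if it memorized the training tasks
        return task in train_set
    return task % 3 != 0  # "general": solves about two thirds of all tasks

best, train_score, test_score = select_prompt(["general", "overfit"], train, test, toy_solve)
print(best, train_score, test_score)  # overfit 1.0 0.0
```

Selecting on the training split alone picks the memorizing candidate; only its score on the held-out split exposes that it does not generalize.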
Summary
So yes, your description is accurate. The team does add and remove prompt fragments and run "interference"-style comparisons, but it does so at scale and with formal evaluation infrastructure that makes the results statistically meaningful rather than anecdotal.
References:
- Anthropic: Demystifying Evals for AI Agents
- Piebald-AI: Claude Code System Prompts (GitHub)
- Claude Code Skills 2.0: Evals, Benchmarks and A/B Testing
- Tessl: Anthropic Brings Evals to Skill-Creator
- Arize: CLAUDE.md Best Practices from Prompt Learning
- Geeky Gadgets: Claude Code Skills 2.0
