dd-llmo-experiment-analyzer
Analyzes LLM experiment results from Datadog, supporting single or comparative experiments in exploratory or Q&A modes.
What This Skill Does
Analyzes LLM experiment results from Datadog, supporting single or comparative experiments in exploratory or Q&A modes. Given one or two experiment IDs, it pulls metrics, segments failures, samples representative events, and produces a structured report with root-cause hypotheses and actionable recommendations.
Instead of manually querying experiment summaries, cross-referencing metrics by segment, and sampling failure events one by one, this skill runs the full analysis pipeline automatically and delivers a report with specific numbers and linked examples.
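The comparative part of that pipeline (score each segment in both runs, rank segments by how far the candidate fell, then pull a few failing events as examples) can be sketched as below. This is a minimal illustration, not the skill's implementation: it does not call Datadog, and the `Event` record with its `event_id`, `segment`, and `passed` fields is a hypothetical stand-in for whatever per-event data an experiment actually exposes.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical per-event record; real experiment events will differ.
    event_id: str
    segment: str
    passed: bool


def pass_rates(events):
    """Return {segment: pass_rate} computed over a list of events."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for e in events:
        totals[e.segment] += 1
        passes[e.segment] += int(e.passed)
    return {seg: passes[seg] / totals[seg] for seg in totals}


def compare(baseline, candidate, samples_per_segment=2, seed=0):
    """Rank shared segments by candidate-minus-baseline pass rate.

    Segments present in only one run are skipped. Failing candidate
    events are sampled with a seeded RNG so reports are reproducible.
    """
    base = pass_rates(baseline)
    cand = pass_rates(candidate)
    rng = random.Random(seed)
    rows = []
    for seg in sorted(set(base) & set(cand)):
        failures = [e.event_id for e in candidate
                    if e.segment == seg and not e.passed]
        rows.append({
            "segment": seg,
            "baseline": base[seg],
            "candidate": cand[seg],
            "delta": cand[seg] - base[seg],
            "sample_failures": rng.sample(
                failures, min(samples_per_segment, len(failures))),
        })
    # Most negative delta (largest regression) first.
    rows.sort(key=lambda r: r["delta"])
    return rows


if __name__ == "__main__":
    baseline = [Event("b1", "math", True), Event("b2", "code", False)]
    candidate = [Event("c1", "math", False), Event("c2", "code", True)]
    for row in compare(baseline, candidate):
        print(row)
```

Sorting by the raw pass-rate delta keeps the sketch simple; a fuller analysis would also weigh segment size, since a large drop on a handful of events is weaker evidence than a small drop on thousands.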
When to Use It
- Comparing two LLM experiment runs to find where the candidate regressed
- Drilling into the worst-performing segments of a single evaluation run
- Answering a specific question about metric differences between a baseline and a candidate run
- Exporting an experiment analysis report to a Datadog notebook for team review
- Spotting error clusters and failure themes across experiment events
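One simple way to surface error clusters is to normalize error messages (mask quoted values and numbers) and count the resulting patterns. The sketch below assumes errors are available as plain strings; it is an illustrative frequency-based grouping, not necessarily how the skill identifies failure themes.

```python
import re
from collections import Counter


def normalize(message: str) -> str:
    """Collapse variable parts of an error message into placeholders."""
    msg = message.lower()
    # Replace single- or double-quoted values, e.g. missing keys or IDs.
    msg = re.sub(r"'[^']*'|\"[^\"]*\"", "<str>", msg)
    # Replace runs of digits, e.g. timeouts, counts, line numbers.
    msg = re.sub(r"\d+", "<n>", msg)
    return msg.strip()


def cluster_errors(messages, top_k=5):
    """Return the top_k (pattern, count) pairs, skipping empty messages."""
    counts = Counter(normalize(m) for m in messages if m)
    return counts.most_common(top_k)
```

For example, "Timeout after 30s" and "Timeout after 45s" both collapse to the pattern "timeout after <n>s" and are counted together.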
Similar Skills
minimax-xlsx
Handles Excel and CSV/TSV files through direct XML manipulation rather than library round-trips.
xlsx
Reads, writes, and edits Excel and tabular files (.
meme-rush
Meme Rush tracks meme token launches across Pump.
query-token-info
Queries token data from Binance Web3 APIs across BSC, Base, and Solana.
