Datadog Labs/dd-llmo-experiment-analyzer — Agent Skills | officialskills.sh

dd-llmo-experiment-analyzer

community, data

Analyzes LLM experiment results from Datadog, supporting single or comparative experiments in exploratory or Q&A modes.

Setup & Installation

npx skills add https://github.com/datadog-labs/agent-skills --skill dd-llmo-experiment-analyzer
Or paste this link and ask your coding assistant to install it:
https://github.com/datadog-labs/agent-skills/tree/main/dd-llmo/experiment-analyzer

What This Skill Does

Given one or two experiment IDs, the skill pulls metrics, segments failures, samples representative events, and produces a structured report with root-cause hypotheses and actionable recommendations. It handles a single experiment or a comparison of two, in either exploratory or Q&A mode.

Instead of manually querying experiment summaries, cross-referencing metrics by segment, and sampling failure events one by one, this skill runs the full analysis pipeline automatically and delivers a report with specific numbers and linked examples.
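To make the "segment failures, sample representative events" step concrete, here is a minimal Python sketch. It is not the skill's implementation and does not call any Datadog API: the event records, the field names (`id`, `segment`, `passed`), and the helper `segment_failures` are all hypothetical, chosen only to illustrate the idea on in-memory data.

```python
from collections import defaultdict

# Hypothetical event records. The field names are assumptions for
# illustration, not Datadog's experiment schema.
events = [
    {"id": "e1", "segment": "billing", "passed": False},
    {"id": "e2", "segment": "billing", "passed": True},
    {"id": "e3", "segment": "search", "passed": False},
    {"id": "e4", "segment": "search", "passed": False},
    {"id": "e5", "segment": "search", "passed": True},
]


def segment_failures(events, sample_size=2):
    """Group events by segment, compute each failure rate, keep a few failing IDs."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["segment"]].append(event)

    report = []
    for segment, items in grouped.items():
        failures = [e for e in items if not e["passed"]]
        report.append({
            "segment": segment,
            "total": len(items),
            "failure_rate": len(failures) / len(items),
            # Take the first few failures as examples (a real sampler
            # might pick them at random or by cluster).
            "examples": [e["id"] for e in failures[:sample_size]],
        })

    # Worst-performing segments first.
    report.sort(key=lambda row: row["failure_rate"], reverse=True)
    return report


report = segment_failures(events)
for row in report:
    print(row)
```

Sorting by failure rate puts the segment most worth investigating at the top, and the sampled IDs give a reader specific events to open rather than only aggregate numbers.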

When to use it

  • Comparing two LLM experiment runs to find where the candidate regressed
  • Drilling into the worst-performing segments of a single evaluation run
  • Answering a specific question about metric differences between a baseline and candidate
  • Exporting an experiment analysis report to a Datadog notebook for team review
  • Spotting error clusters and failure themes across experiment events
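
The comparison use case in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not how the skill works internally: the per-segment scores, the `find_regressions` helper, and the 0.05 threshold are all hypothetical.

```python
def find_regressions(baseline, candidate, threshold=0.05):
    """Return (segment, delta) pairs where the candidate score dropped
    by more than `threshold`, largest drop first."""
    regressions = []
    # Only compare segments present in both runs.
    for segment in baseline.keys() & candidate.keys():
        delta = candidate[segment] - baseline[segment]
        if delta < -threshold:
            regressions.append((segment, round(delta, 4)))
    # Most negative delta (biggest regression) first.
    regressions.sort(key=lambda pair: pair[1])
    return regressions


# Hypothetical per-segment scores (higher is better).
baseline = {"billing": 0.90, "search": 0.80, "onboarding": 0.75}
candidate = {"billing": 0.91, "search": 0.68, "onboarding": 0.69}

regressions = find_regressions(baseline, candidate)
print(regressions)
```

A fixed threshold keeps small run-to-run noise out of the list; in practice the right cutoff depends on the metric and sample size.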