dd-llmo-eval-bootstrap
Analyzes production LLM traces from Datadog and generates ready-to-use evaluator code using the Datadog Evals SDK.
What This Skill Does
This skill samples real production traffic, identifies quality dimensions worth measuring, and outputs BaseEvaluator subclasses or LLMJudge instances you can plug into LLM Experiments.
Writing evaluators from scratch means guessing which quality dimensions matter. This skill instead samples actual production traces and proposes evals grounded in evidence, so you start from observed behavior rather than assumptions.
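As a rough sketch of the kind of evaluator this skill generates: the class and method names below stand in for the real Datadog Evals SDK (the actual import path, base-class signature, and result type are not shown in this doc, so a minimal placeholder base class is defined locally). The check itself, a citation-format assertion for a RAG app, is an invented example of a "format and correctness" eval.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Placeholder result type; the real SDK's return shape may differ."""
    score: float
    reason: str


class BaseEvaluator:
    """Stand-in for the Evals SDK's BaseEvaluator (assumed interface)."""
    def evaluate(self, input_text: str, output_text: str) -> EvalResult:
        raise NotImplementedError


class AnswerCitesContextEvaluator(BaseEvaluator):
    """Example of a generated format check for a RAG app: flags answers
    that contain no [source: ...] marker referencing retrieved context."""

    def evaluate(self, input_text: str, output_text: str) -> EvalResult:
        has_citation = "[source:" in output_text.lower()
        return EvalResult(
            score=1.0 if has_citation else 0.0,
            reason=(
                "answer cites retrieved context"
                if has_citation
                else "no [source: ...] marker found in answer"
            ),
        )


checker = AnswerCitesContextEvaluator()
result = checker.evaluate(
    "What is our refund policy?",
    "Refunds are processed within 5 days [source: policy.md].",
)
print(result.score)  # 1.0
```

In the real workflow, the skill would emit a subclass like this against the SDK's actual base class, ready to register with an LLM Experiments run.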
When to use it
- Generating evaluators for a RAG app after noticing answer quality drift
- Building a test suite from production traces of a customer support chatbot
- Creating LLM judge prompts grounded in real failure patterns from an RCA report
- Auditing an agent app for scope violations using trace-based safety evals
- Bootstrapping format and correctness checks for a new ml_app with no existing coverage
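To illustrate the judge-prompt use case above, here is a hedged sketch of how failure patterns mined from sampled traces could be folded into an LLM judge rubric. The pattern strings are invented examples, and the real skill would hand a prompt like this to an LLMJudge instance (whose constructor is not shown here), so only the prompt assembly is sketched.

```python
# Hypothetical failure patterns, as might be surfaced from sampled traces
# or an RCA report (these strings are invented for illustration).
FAILURE_PATTERNS = [
    "answers questions outside the support domain (scope violation)",
    "invents order numbers not present in the retrieved context",
]


def build_judge_prompt(app_name: str, patterns: list[str]) -> str:
    """Assemble a judge rubric grounded in observed failure patterns."""
    rubric = "\n".join(f"- {p}" for p in patterns)
    return (
        f"You are grading a response from the '{app_name}' application.\n"
        "Fail the response if it exhibits any of these production "
        "failure patterns:\n"
        f"{rubric}\n"
        "Answer PASS or FAIL, then give one sentence of justification."
    )


prompt = build_judge_prompt("support-chatbot", FAILURE_PATTERNS)
print(prompt)
```

Grounding the rubric in named failure patterns, rather than a generic "rate the quality" instruction, is what makes the generated judge reflect real production behavior.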
