hugging-face-evaluation
Adds and manages evaluation results in Hugging Face model cards using the model-index metadata format.
What This Skill Does
The skill supports extracting benchmark tables from README files, importing scores from the Artificial Analysis API, and running evaluations with vLLM or lighteval on local GPUs or on HF Jobs infrastructure.
Instead of manually converting markdown tables to model-index YAML and resolving merge conflicts, this skill handles extraction, formatting, deduplication, and PR creation in a single CLI workflow.
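For reference, model-index metadata lives in the YAML front matter of the model card. A minimal sketch of the shape of that format (the model name, dataset identifiers, and score below are illustrative, not output from the skill):

```yaml
model-index:
- name: example-model            # illustrative model name
  results:
  - task:
      type: text-generation
    dataset:
      name: MMLU                 # human-readable dataset name
      type: cais/mmlu            # Hub dataset identifier
    metrics:
    - type: accuracy
      value: 68.2                # illustrative score
```

Each entry ties a metric value to a task and dataset, which is what lets the Hub render and aggregate evaluation results consistently across model cards.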
When to use it
- Extracting benchmark scores from a model README and formatting them as model-index YAML
- Importing Artificial Analysis benchmark results directly into a model card
- Running MMLU or GSM8K evaluations on a Hugging Face model using a local GPU
- Submitting lighteval jobs to HF Jobs for models without a public API endpoint
- Opening a pull request with structured evaluation metadata on a model you don't own
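The extraction step in the first bullet can be sketched with a few lines of standard-library Python. This is an illustrative parser, not the skill's actual implementation, and it assumes a simple two-column table layout:

```python
import re

def parse_benchmark_table(markdown: str) -> dict[str, float]:
    """Parse a two-column markdown table of benchmark -> score pairs.

    Assumes a layout like:
        | Benchmark | Score |
        |-----------|-------|
        | MMLU      | 68.2  |
    """
    scores = {}
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue
        name, value = cells
        # Skip the header row and the |---|---| separator row:
        # keep only rows whose second cell is a plain number.
        if re.fullmatch(r":?-{3,}:?", value):
            continue
        if not re.fullmatch(r"-?\d+(\.\d+)?", value):
            continue
        scores[name] = float(value)
    return scores
```

The resulting name-to-score mapping is the kind of intermediate the skill would then format as model-index YAML entries.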
Similar Skills
- mcp-builder: A development guide for building MCP (Model Context Protocol) servers that connect LLMs to external APIs and services.
- skill-creator: A skill for building, testing, and refining other skills.
- template: A starter scaffold for building new agent skills.
- answers: Provides AI-generated answers grounded in live web search results through Brave's OpenAI-compatible chat completions endpoint.
