hugging-face-vision-trainer
What This Skill Does
Trains and fine-tunes vision models on Hugging Face Jobs cloud GPUs, covering object detection (D-FINE, RT-DETR v2, DETR, YOLOS), image classification (timm models including MobileNetV3, ResNet, ViT), and SAM/SAM2 segmentation. Handles COCO-format dataset prep, Albumentations augmentation, mAP/accuracy evaluation, and automatic model persistence to the Hugging Face Hub.
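Detection mAP rests on intersection-over-union (IoU): a predicted box counts as a true positive when its IoU with a ground-truth box of the same class reaches a threshold, and COCO-style mAP averages precision over the thresholds 0.50, 0.55, ..., 0.95. The sketch below shows that matching criterion for COCO `[x, y, width, height]` boxes. It is an illustration of the metric only; `iou_xywh` and `is_true_positive` are hypothetical helpers, not the skill's evaluation code, which presumably relies on an existing mAP implementation.

```python
def iou_xywh(box_a, box_b):
    """Intersection-over-union for two COCO-style [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Far corners of each box (x_max, y_max).
    ax2, ay2 = ax + aw, ay + ah
    bx2, by2 = bx + bw, by + bh
    # Overlap extent along each axis, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax, bx))
    inter_h = max(0.0, min(ay2, by2) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# COCO mAP averages precision over these ten IoU thresholds.
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: True if the prediction overlaps the ground truth enough."""
    return iou_xywh(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    # Two 10x10 boxes offset by 5 pixels: 25 / 175, roughly 0.143.
    print(iou_xywh([0, 0, 10, 10], [5, 5, 10, 10]))
```

The same overlap test is applied once per threshold, which is why a model can score well at mAP@50 but noticeably lower on the stricter COCO average.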
The skill combines bounding-box format detection, remapping of string category labels to integer IDs, dataset validation, Trackio monitoring, and Hub persistence in a single workflow, removing the manual setup that typically precedes each training run.
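How the skill detects box formats and remaps categories is not described here. As a hedged illustration only, the sketch below shows one plausible heuristic: a box list is treated as corner format (`xyxy`) or COCO format (`xywh`) depending on which interpretation keeps every box valid and inside the image, and string class names are mapped to contiguous integer IDs. `guess_bbox_format` and `remap_categories` are hypothetical names; the returned dictionaries simply have the same shape as the `id2label`/`label2id` mappings that Transformers model configs use.

```python
def guess_bbox_format(boxes, image_width, image_height):
    """Hypothetical heuristic: guess 'xywh' (COCO) vs 'xyxy' (corner) boxes."""
    could_be_xyxy = True
    could_be_xywh = True
    for a, b, c, d in boxes:
        # Corner format needs x_max > x_min and y_max > y_min, inside the image.
        if not (c > a and d > b and c <= image_width and d <= image_height):
            could_be_xyxy = False
        # COCO format needs the far edge (x + w, y + h) to stay inside the image.
        if not (a + c <= image_width and b + d <= image_height):
            could_be_xywh = False
    if could_be_xywh and could_be_xyxy:
        return "ambiguous"  # every box fits both readings; more data needed
    if could_be_xywh:
        return "xywh"
    if could_be_xyxy:
        return "xyxy"
    return "unknown"  # neither reading fits inside the image


def remap_categories(labels):
    """Map string class names to contiguous integer IDs (sorted for determinism)."""
    names = sorted(set(labels))
    label2id = {name: i for i, name in enumerate(names)}
    id2label = {i: name for name, i in label2id.items()}
    return [label2id[name] for name in labels], label2id, id2label


if __name__ == "__main__":
    # x_max (40) < x_min (50) rules out corner format, so this reads as COCO xywh.
    print(guess_bbox_format([[50, 50, 40, 40]], 100, 100))
    print(remap_categories(["dog", "cat", "dog"]))
```

Because the flags are combined across all boxes, a single box that only fits one interpretation is enough to resolve an otherwise ambiguous dataset.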
When to use it
- Training a custom object detector on COCO-format data using D-FINE or RT-DETR v2 on cloud GPUs
- Fine-tuning a MobileNetV3 or ViT classifier on a Hub image dataset without a local GPU
- Fine-tuning SAM2 for image matting or segmentation using bounding box or point prompts
- Running the dataset inspector to catch column format mismatches before submitting a GPU training job
- Saving fine-tuned detection or classification checkpoints directly to the Hugging Face Hub from ephemeral cloud jobs
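The dataset inspector's exact checks are not listed here. The sketch below is an assumption-laden stand-in that shows the kind of column-format mismatch worth catching before paying for GPU time. It assumes rows laid out like many Hub detection datasets: an `image` column plus an `objects` dict holding parallel `bbox` and `category` lists. `inspect_detection_rows` and `REQUIRED_OBJECT_KEYS` are hypothetical names, not part of the skill.

```python
# Assumed column layout; adjust to match the dataset being inspected.
REQUIRED_OBJECT_KEYS = ("bbox", "category")


def inspect_detection_rows(rows, max_errors=10):
    """Hypothetical pre-flight check: return readable problems found in detection rows."""
    problems = []
    for i, row in enumerate(rows):
        if "image" not in row:
            problems.append(f"row {i}: missing 'image' column")
        objects = row.get("objects")
        if not isinstance(objects, dict):
            problems.append(f"row {i}: 'objects' should be a dict of lists")
        else:
            missing = [k for k in REQUIRED_OBJECT_KEYS if k not in objects]
            if missing:
                problems.append(f"row {i}: 'objects' missing keys {missing}")
            else:
                bboxes, cats = objects["bbox"], objects["category"]
                # Every box needs exactly one category label.
                if len(bboxes) != len(cats):
                    problems.append(
                        f"row {i}: {len(bboxes)} boxes but {len(cats)} categories"
                    )
                for j, box in enumerate(bboxes):
                    if len(box) != 4:
                        problems.append(
                            f"row {i}, box {j}: expected 4 values, got {len(box)}"
                        )
        if len(problems) >= max_errors:
            break  # stop early; a handful of errors is enough to act on
    return problems[:max_errors]


if __name__ == "__main__":
    rows = [{"objects": {"bbox": [[0, 0, 5]], "category": [0, 1]}}]
    for problem in inspect_detection_rows(rows):
        print(problem)
```

Since it only takes plain dicts, a few rows iterated from a 🤗 `datasets` split can be passed in directly, which keeps the check cheap enough to run locally before submitting a job.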
Similar Skills
mcp-builder
A development guide for building MCP (Model Context Protocol) servers that connect LLMs to external APIs and services.
skill-creator
A skill for building, testing, and refining other skills.
template
A starter scaffold for building new agent skills.
answers
Provides AI-generated answers grounded in live web search results through Brave's OpenAI-compatible chat completions endpoint.
