Hugging Face/hugging-face-vision-trainer — Agent Skills | officialskills.sh

hugging-face-vision-trainer

official · ai-tools

Trains and fine-tunes vision models on Hugging Face Jobs cloud GPUs, covering object detection (D-FINE, RT-DETR v2, DETR, YOLOS), image classification (timm models including MobileNetV3, ResNet, ViT), and SAM/SAM2 segmentation.

Setup & Installation

npx skills add https://github.com/huggingface/skills --skill hugging-face-vision-trainer
Or paste the link below and ask your coding assistant to install it:
https://github.com/huggingface/skills/tree/main/skills/hugging-face-vision-trainer

What This Skill Does

Beyond submitting training runs, the skill handles COCO-format dataset prep, Albumentations augmentation, mAP/accuracy evaluation, and automatic model persistence to the Hugging Face Hub.

It also takes care of bbox format detection, string category remapping, dataset validation, and Trackio monitoring in one workflow, eliminating the manual infrastructure work that typically precedes each training run.
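To make the bbox-format-detection and category-remapping steps concrete, here is a minimal sketch of how such preprocessing could look. This is an illustrative, hypothetical helper, not the skill's actual implementation: it distinguishes COCO `[x, y, w, h]` boxes from Pascal VOC `[x_min, y_min, x_max, y_max]` boxes heuristically, and maps string class names to the contiguous integer ids that detection models expect.

```python
def detect_bbox_format(boxes):
    """Heuristically classify a list of boxes as COCO xywh or Pascal VOC xyxy.

    If any box violates x_max > x_min or y_max > y_min, it cannot be a
    valid Pascal VOC box, so we assume COCO [x, y, w, h]. Boxes that are
    valid under both interpretations default to xyxy.
    """
    for x0, y0, a, b in boxes:
        if a <= x0 or b <= y0:
            return "coco_xywh"
    return "pascal_voc_xyxy"


def remap_categories(labels):
    """Map string category names to contiguous integer ids.

    Returns the remapped label list and the name-to-id mapping, which
    would be saved alongside the model as id2label/label2id config.
    """
    name_to_id = {name: i for i, name in enumerate(sorted(set(labels)))}
    return [name_to_id[label] for label in labels], name_to_id
```

In practice the ambiguity between the two formats means a real inspector would also check boxes against image dimensions; the sketch above only shows the core idea.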

When to use it

  • Training a custom object detector on COCO-format data using D-FINE or RT-DETR v2 on cloud GPUs
  • Fine-tuning a MobileNetV3 or ViT classifier on a Hub image dataset without a local GPU
  • Fine-tuning SAM2 for image matting or segmentation using bounding box or point prompts
  • Running the dataset inspector to catch column format mismatches before submitting a GPU training job
  • Saving fine-tuned detection or classification checkpoints directly to the Hugging Face Hub from ephemeral cloud jobs
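The pre-flight validation idea behind the dataset inspector can be sketched as a per-record check run before any GPU job is submitted. The function below is a hypothetical example (the column names `image_id`, `bbox`, and `category` are illustrative assumptions, not the skill's actual schema):

```python
def inspect_coco_record(record, required=("image_id", "bbox", "category")):
    """Return a list of problems found in one COCO-style annotation record.

    An empty list means the record passed; collecting problems per record
    lets a caller report all issues before a cloud job is submitted.
    """
    problems = []
    for key in required:
        if key not in record:
            problems.append(f"missing column: {key}")
    bbox = record.get("bbox")
    if bbox is not None:
        if len(bbox) != 4:
            problems.append(f"bbox has {len(bbox)} values, expected 4")
        elif bbox[2] <= 0 or bbox[3] <= 0:
            problems.append("non-positive bbox width/height (expected [x, y, w, h])")
    return problems
```

Catching a schema mismatch like this locally is far cheaper than discovering it minutes into a billed GPU run.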