Training Progress Report Summary | Zhiwei Li

Generated by AI

2026.06

Running great. Here’s the summary:

  • Model: 286M params (depth=12, dim=768)
  • Throughput: ~55,700 tok/sec (~1.17s per step)
  • Loss: 10.40 → 7.25 over the first 32 steps (warmup phase; dropping fast, as expected)
  • ETA: ~196 minutes (~3.3 hours) for 10K steps
  • Val bpb at step 0: 3.22
  • VRAM: well within 12GB (device_batch_size=8 is fine)
  • GPU utilization: 98%
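
As a quick sanity check, the ETA and the implied batch size can be recomputed from the throughput and step time quoted above. This is a minimal sketch: the three constants come from the summary, while `eta_minutes` and its `steps_done` parameter are hypothetical names used only for illustration, not part of the training script.

```python
# Figures quoted in the summary above.
tokens_per_sec = 55_700   # throughput
sec_per_step = 1.17       # wall-clock time per optimizer step
total_steps = 10_000      # planned run length


def eta_minutes(sec_per_step, total_steps, steps_done=0):
    """Remaining wall-clock time in minutes (hypothetical helper)."""
    return (total_steps - steps_done) * sec_per_step / 60


# Tokens processed per step, implied by throughput times step time.
tokens_per_step = tokens_per_sec * sec_per_step

eta_min = eta_minutes(sec_per_step, total_steps)

print(f"tokens per step: ~{tokens_per_step:,.0f}")
print(f"ETA: ~{eta_min:.0f} min (~{eta_min / 60:.1f} h)")
```

With the rounded inputs this comes out to roughly 195 minutes and about 65K tokens per step, which agrees with the reported ~196 minutes once rounding of the step time is taken into account.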

Everything looks healthy. Loss is decreasing smoothly during warmup. The rate of decrease should slow once warmup ends and the learning rate stabilizes, and the loss should then keep falling over the long main training phase.