GPU Profiling for LLM Fine-Tuning on DGX Spark: What the Traces Reveal
I fine-tuned a 1.5B-parameter model on a DGX Spark, and the training finished in 26 seconds. Good enough? I had no idea. The terminal showed 2.18 it/s and a loss curve that went down. But whether th...
