Transformer Training Visualized
Visualize GPT training with weights, gradients, and attention