ZClip: Adaptive Spike Mitigation for LLM Pre-Training Paper • 2504.02507 • Published Apr 3, 2025 • 88
ZClip: Adaptive Spike Mitigation for LLM Pre-Training Paper • 2504.02507 • Published Apr 3, 2025 • 88
Variance Control via Weight Rescaling in LLM Pre-training Paper • 2503.17500 • Published Mar 21, 2025 • 5
Variance Control via Weight Rescaling in LLM Pre-training Paper • 2503.17500 • Published Mar 21, 2025 • 5
Running 3.76k The Ultra-Scale Playbook 🌌 3.76k The ultimate guide to training LLM on large GPU Clusters
view article Article Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages +7 May 24, 2024 • 28