microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 14.9k • 1.3k
Explore the FineWeb dataset and its creation process
The ultimate guide to training LLMs on large GPU clusters
Estimate GPU memory usage for transformer training