RAE Collection Collection for Diffusion Transformers with Representation Autoencoders • 7 items • Updated 11 days ago • 11
Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular +4 Dec 18, 2025 • 121
Pre-training Dataset Samples Collection A collection of pre-training dataset samples at sizes of 10M, 100M, and 1B tokens. Ideal for quick experimentation and ablations. • 18 items • Updated 4 days ago • 18
Article A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons Feb 4, 2025 • 30
AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions Paper • 2509.13523 • Published Sep 16, 2025 • 7
Nemotron-Pre-Training-Datasets Collection Large-scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 3 days ago • 102