SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models Paper • 2602.07616 • Published 4 days ago
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models Paper • 2412.07210 • Published Dec 10, 2024