RetMask Collection: Trained checkpoints for the paper "From Interpretability to Performance: Optimizing Retrieval Heads for Long-Context Language Models" • 4 items • Updated 18 days ago
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5 Text Generation • 8B • Updated Jun 25, 2025 • 2.33k • 19
tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4 Text Generation • 71B • Updated Jul 1, 2025 • 224 • 13
tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3 Text Generation • 71B • Updated Apr 2, 2025 • 378 • 13
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 Text Generation • 8B • Updated Apr 2, 2025 • 2.98k • 24
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.2 Text Generation • 8B • Updated Apr 2, 2025 • 126 • 16