deepseek-ai/DeepSeek-R1-Distill-Qwen-32B Text Generation • 33B • Updated Feb 24, 2025 • 2.43M • 1.49k
Focused Transformer: Contrastive Training for Context Scaling Paper • 2307.03170 • Published Jul 6, 2023 • 11