- RelayGen: Intra-Generation Model Switching for Efficient Reasoning (arXiv:2602.06454, published 5 days ago, 11 upvotes)
- Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection (arXiv:2602.03216, published 8 days ago, 12 upvotes)
- Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning (arXiv:2505.13866, published May 20, 2025, 17 upvotes)
- FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration (arXiv:2502.01068, published Feb 3, 2025, 18 upvotes)