SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published 11 days ago • 42 • 5