Embarrassingly Simple Self-Distillation Improves Code Generation Paper • 2604.01193 • Published 10 days ago • 34
Austin362667/Qwen3-0.6B-MLX-bf16-python-18k-alpaca Text Generation • 0.6B • Updated 27 days ago • 652
Austin362667/Qwen3-0.6B-MLX-bf16-python-5k-alpaca-resampled-Qwen-4B Text Generation • 0.6B • Updated 27 days ago • 656
Austin362667/python_code_instructions_5k_alpaca_qwen3_4B_resampled Viewer • Updated 27 days ago • 5k • 13
Austin362667/python_code_instructions_5_alpaca_qwen3_4B_resampled Viewer • Updated 28 days ago • 5.01k • 16
Article Assisted Generation: a new direction toward low-latency text generation • May 11, 2023 • 78
Article A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes • Aug 17, 2022 • 129
Article OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve • May 20, 2025 • 64
Article Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective • Jan 27 • 70
Article KV Caching Explained: Optimizing Transformer Inference Efficiency • Jan 30, 2025 • 292