General
How to Synthesize Text Data without Model Collapse?
Paper
• 2412.14689
• Published
• 53
SepLLM: Accelerate Large Language Models by Compressing One Segment into
One Separator
Paper
• 2412.12094
• Published
• 11
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion
and Adversarial Training with Large Speech Language Models
Paper
• 2306.07691
• Published
• 13
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating
Inverse Short-Time Fourier Transform
Paper
• 2203.02395
• Published
• 1
Scaling Laws for Floating Point Quantization Training
Paper
• 2501.02423
• Published
• 26
Transformer^2: Self-adaptive LLMs
Paper
• 2501.06252
• Published
• 55
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
• 2501.08313
• Published
• 300
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
• 2501.05366
• Published
• 102
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
Paper
• 2501.06282
• Published
• 53
An Empirical Study of Autoregressive Pre-training from Videos
Paper
• 2501.05453
• Published
• 41
The Lessons of Developing Process Reward Models in Mathematical
Reasoning
Paper
• 2501.07301
• Published
• 100
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising
Steps
Paper
• 2501.09732
• Published
• 72
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with
Large Language Models
Paper
• 2501.09686
• Published
• 41
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video
Understanding
Paper
• 2501.13106
• Published
• 90
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model
Post-training
Paper
• 2501.17161
• Published
• 124
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute
in Linear Diffusion Transformer
Paper
• 2501.18427
• Published
• 24
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper
• 2502.01534
• Published
• 40
GuardReasoner: Towards Reasoning-based LLM Safeguards
Paper
• 2501.18492
• Published
• 88
Token Assorted: Mixing Latent and Text Tokens for Improved Language
Model Reasoning
Paper
• 2502.03275
• Published
• 18
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion
Transformer
Paper
• 2502.01105
• Published
• 21
Paper
• 2502.06049
• Published
• 31
Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
Paper
• 2502.07737
• Published
• 9
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on
a Single GPU
Paper
• 2502.08910
• Published
• 148
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth
Approach
Paper
• 2502.05171
• Published
• 152
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance
Software Engineering?
Paper
• 2502.12115
• Published
• 46
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM
Multi-Agent Systems
Paper
• 2502.11098
• Published
• 13
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning
in Diffusion Models
Paper
• 2502.10458
• Published
• 38
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising
Trajectory Sharpening
Paper
• 2502.12146
• Published
• 16
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement
Learning
Paper
• 2502.14768
• Published
• 47
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open
Software Evolution
Paper
• 2502.18449
• Published
• 75
Slamming: Training a Speech Language Model on One GPU in a Day
Paper
• 2502.15814
• Published
• 69
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via
Reinforcement Learning and Reasoning
Paper
• 2503.07608
• Published
• 23
Personalize Anything for Free with Diffusion Transformer
Paper
• 2503.12590
• Published
• 44
Being-0: A Humanoid Robotic Agent with Vision-Language Models and
Modular Skills
Paper
• 2503.12533
• Published
• 68
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large
Reasoning Models with Iterative Retrieval Augmented Generation
Paper
• 2503.21729
• Published
• 29
Exploring the Limits of Transfer Learning with a Unified Text-to-Text
Transformer
Paper
• 1910.10683
• Published
• 16
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Paper
• 2504.09454
• Published
• 11
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper
• 2504.11536
• Published
• 63
Efficient Generative Model Training via Embedded Representation Warmup
Paper
• 2504.10188
• Published
• 12
InstantCharacter: Personalize Any Characters with a Scalable Diffusion
Transformer Framework
Paper
• 2504.12395
• Published
• 16
Iterative Self-Training for Code Generation via Reinforced Re-Ranking
Paper
• 2504.09643
• Published
• 34
DMM: Building a Versatile Image Generation Model via Distillation-Based
Model Merging
Paper
• 2504.12364
• Published
• 22
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through
the Lens of Internal Representations
Paper
• 2504.13816
• Published
• 18
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
• Published
• 120
Kuwain 1.5B: An Arabic SLM via Language Injection
Paper
• 2504.15120
• Published
• 121
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal
Large Language Models
Paper
• 2504.15279
• Published
• 78
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World
Model-based LLM Agents
Paper
• 2504.15785
• Published
• 22
Token-Shuffle: Towards High-Resolution Image Generation with
Autoregressive Models
Paper
• 2504.17789
• Published
• 23
Step1X-Edit: A Practical Framework for General Image Editing
Paper
• 2504.17761
• Published
• 92
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image
Generation
Paper
• 2504.17502
• Published
• 55
Paper2Code: Automating Code Generation from Scientific Papers in Machine
Learning
Paper
• 2504.17192
• Published
• 123
Breaking the Modality Barrier: Universal Embedding Learning with
Multimodal LLMs
Paper
• 2504.17432
• Published
• 40
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery
Simulation
Paper
• 2504.17207
• Published
• 30
Can Large Language Models Help Multimodal Language Analysis? MMLA: A
Comprehensive Benchmark
Paper
• 2504.16427
• Published
• 18
BitNet v2: Native 4-bit Activations with Hadamard Transformation for
1-bit LLMs
Paper
• 2504.18415
• Published
• 49
DeepCritic: Deliberate Critique with Large Language Models
Paper
• 2505.00662
• Published
• 54
PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with
Auto-Regressive Transformer
Paper
• 2505.04622
• Published
• 27
Unified Continuous Generative Models
Paper
• 2505.07447
• Published
• 42
Learning Dynamics in Continual Pre-Training for Large Language Models
Paper
• 2505.07796
• Published
• 19
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture,
Training and Dataset
Paper
• 2505.09568
• Published
• 99
Thinkless: LLM Learns When to Think
Paper
• 2505.13379
• Published
• 50