Boundary-Guided Policy Optimization for Memory-Efficient RL of Diffusion Large Language Models
Papers
DeepPrune: Parallel Scaling without Inter-trace Redundancy
SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression
Scaling Iterative Reinforcement Learning with Interleaved Compression
OpenSAE checkpoints for LLaMA 3.1 8B base model
EMNLP2024 Main Conference: 《Aligning Large Language Models on Information Extraction》
Parallel Scaling without Inter-trace Redundancy
RL trained models and datasets for instruction-following
《Constraint Back-translation Improves Complex Instruction Following of Large Language Models》
- Constraint Back-translation Improves Complex Instruction Following of Large Language Models
  Paper • 2410.24175 • Published • 18
- THU-KEG/Mistral-Crab-SFT
  Text Generation • 7B • Updated • 8 • 5
- THU-KEG/Mistral-Crab-DPO
  Text Generation • 7B • Updated • 9 • 4
- THU-KEG/Llama3-Crab-SFT
  Text Generation • Updated • 6