Adaptive Reinforcement Learning Planning: Harnessing Large Language Models for Complex Information Extraction Paper • 2406.11455 • Published Jun 17, 2024 • 1
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles Paper • 2505.19914 • Published May 26 • 43
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models Paper • 2505.02735 • Published May 5 • 34
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs Paper • 2504.15415 • Published Apr 21 • 22
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published Apr 7 • 44
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 11 • 71
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models Paper • 2502.16614 • Published Feb 23 • 27
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation Paper • 2404.12753 • Published Apr 19, 2024 • 43
UI Agent Collection a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics • 437 items • Updated 1 day ago • 65
PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment Paper • 2410.13785 • Published Oct 17, 2024 • 19
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation Paper • 2404.12753 • Published Apr 19, 2024 • 43