Abstract
The ResearchMath-14k dataset and the ResearchMath-Reasoning trajectories are introduced to advance research-level mathematical reasoning in language models, demonstrating that filtered open-problem attempts provide useful supervision for model improvement.
The frontier of mathematics is defined by problems whose solutions are not yet known, yet it remains unclear whether language models can meaningfully engage with such problems without human intervention. A major obstacle is the lack of large-scale research-level math datasets. To this end, we introduce ResearchMath-14k, a set of 14,056 problems curated from academic sources via a multi-agent pipeline, making it the largest collection of research-level mathematical problems to date. We further generate ResearchMath-Reasoning, 220K teacher trajectories from two open models, where we observe recurring avoidance behaviors such as non-attempts and fabricated references. Interestingly, across eight open-weight models, newer generations produce 5.6× more references and 5.0× more fake references per trace. After agentic filtering of ResearchMath-Reasoning, fine-tuning Qwen3 models from 4B to 30B parameters improves over base models by 9.2 points on average. This shows that filtered open-problem attempts can provide useful supervision even without fully correct reasoning traces. We make ResearchMath-14k publicly available for future work on research-level mathematical reasoning.
Community
We release a collection of 14k research-level (mostly open) math problems.
Link to Dataset: https://huggingface.co/datasets/amphora/ResearchMath-14k
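For readers who want to inspect the release programmatically, here is a minimal sketch that uses the Hugging Face `datasets` library with the repo id from the link above. The split names and column names are not stated on this page, so the sketch reports whatever the repository exposes rather than assuming a schema. The helper names `load_researchmath` and `describe` are hypothetical and exist only for illustration.

```python
# Repo id taken from the dataset link above.
DATASET_ID = "amphora/ResearchMath-14k"


def load_researchmath():
    """Download the dataset from the Hugging Face Hub (needs network access).

    Assumption: the repository is in a format that `load_dataset` can read
    directly and is not gated. Neither is confirmed on this page.
    """
    # Imported lazily so the helpers below work without `datasets` installed.
    from datasets import load_dataset

    # With no `split` argument this returns a mapping of split name -> Dataset.
    return load_dataset(DATASET_ID)


def describe(dataset_dict):
    """Map each split name to its row count and its column names."""
    return {
        name: {"rows": len(split), "columns": list(split.column_names)}
        for name, split in dataset_dict.items()
    }
```

Calling `describe(load_researchmath())` should list the available splits with their sizes and columns, which is a reasonable first check before writing code against specific field names.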
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Re$^2$Math: Benchmarking Theorem Retrieval in Research-Level Mathematics (2026)
- MathDuels: Evaluating LLMs as Problem Posers and Solvers (2026)
- Do We Need Frontier Models to Verify Mathematical Proofs? (2026)
- Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs (2026)
- QED: An Open-Source Multi-Agent System for Generating Mathematical Proofs on Open Problems (2026)
- RMA: an Agentic System for Research-Level Mathematical Problems (2026)
- Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics (2026)
