---
title: Proactive Interactive Reasoning (PIR)
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
pinned: false
license: apache-2.0
short_description: Enables reasoning LLMs to ask clarification questions
---
# Reasoning While Asking: Transforming Reasoning LLMs into Proactive Inquirers (PIR)
[Paper (arXiv)](https://arxiv.org/abs/2601.22139) · [Code (GitHub)](https://github.com/SUAT-AIRI/Proactive-Interactive-R1)
This organization hosts the official models and datasets for the paper **"Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers"**.
## 💡 Motivation
Current reasoning LLMs (e.g., OpenAI o1, DeepSeek-R1) suffer from **blind self-thinking**: they perform extensive internal reasoning even when critical information is missing or user intent is ambiguous. This leads to overthinking, hallucinations, and misaligned conclusions.
**PIR (Proactive Interactive Reasoning)** is a new paradigm that transforms reasoning LLMs from passive solvers into **proactive inquirers**. Instead of guessing, PIR-enabled models detect uncertainty during reasoning and actively ask users for clarification before proceeding.
<img src="https://raw.githubusercontent.com/SUAT-AIRI/Proactive-Interactive-R1/refs/heads/main/image/paradigm.png" width="1000" alt="PIR Framework Overview">
### Key Features
- **User-Intent Alignment**: Optimizes interaction through US-GRPO with composite rewards balancing accuracy, efficiency, and helpfulness (a toy reward sketch follows this list).
- **Significant Improvements**: Up to **32.70% higher accuracy**, **22.90% higher pass rate**, and **41.36 BLEU improvement** over baselines.
- **Reduced Computation**: Nearly halves unnecessary reasoning tokens and interaction turns.
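To make the composite reward concrete, here is a toy Python sketch of how accuracy, efficiency, and helpfulness terms might be combined. The weights, term definitions, and function signature are illustrative assumptions, not the paper's exact US-GRPO formulation:

```python
# Toy sketch of a composite interaction reward in the spirit of US-GRPO.
# NOTE: the weights and term definitions are illustrative assumptions,
# not the paper's exact reward.

def composite_reward(
    is_correct: bool,          # final answer matches the reference
    reasoning_tokens: int,     # tokens spent on internal reasoning
    token_budget: int,         # soft budget for reasoning length
    clarification_turns: int,  # questions asked of the user
    max_turns: int,            # cap on allowed clarification turns
    questions_helpful: float,  # fraction of questions judged helpful (0..1)
) -> float:
    accuracy = 1.0 if is_correct else 0.0
    # Efficiency: penalize reasoning beyond the budget and excessive turns.
    efficiency = (
        max(0.0, 1.0 - reasoning_tokens / token_budget)
        * max(0.0, 1.0 - clarification_turns / max_turns)
    )
    # Helpfulness: reward questions that actually elicited missing information.
    helpfulness = questions_helpful
    # Hypothetical fixed weights; the actual training objective balances these
    # terms via GRPO-style group-relative advantages rather than a fixed sum.
    return 0.6 * accuracy + 0.2 * efficiency + 0.2 * helpfulness
```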
## 📦 Models
We provide the following models trained with the PIR paradigm:
| Model Name | Description | Link |
| :--- | :--- | :--- |
| **Proactive-Interactive-R1-Math-7B** | The core model optimized for mathematical reasoning with clarification capabilities. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B) |
| **Proactive-Interactive-R1-Math-7B-Pro** | An enhanced version of the Math-7B model. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B-Pro) |
| **Proactive-Interactive-R1-SFT-7B** | The base SFT model before Reinforcement Learning alignment. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-SFT-7B) |
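Below is a minimal inference sketch with 🤗 Transformers showing the ask-then-answer loop. The `<ask>` tag convention for detecting clarification questions is an assumption for illustration; check the model card for the exact chat template and the tags the model actually emits:

```python
# Minimal sketch: chat with a PIR model and answer its clarification questions.
# Assumes standard chat-template support; the tag the model uses to mark a
# clarification question is a hypothetical convention -- see the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "A train leaves at 3 pm. When does it arrive?"}]

for _ in range(4):  # allow a few clarification rounds
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=1024)
    reply = tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True)
    messages.append({"role": "assistant", "content": reply})
    # Hypothetical convention: the model wraps questions in <ask> ... </ask>.
    if "<ask>" not in reply:
        break  # no clarification needed; `reply` holds the final answer
    # Feed the user's clarification back into the conversation.
    messages.append({"role": "user", "content": input(f"{reply}\n> ")})

print(reply)
```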
## 📊 Datasets
The datasets used to train and evaluate PIR are available here:
- **[Reasoning-While-Asking-SFT-Dataset](https://huggingface.co/datasets/Proactive-Interactive-R1/Reasoning-While-Asking-SFT-Dataset)**: The dataset used for the initial Supervised Fine-Tuning (SFT) phase.
- **[DeepSeek-R1-Distill-Data-5k](https://huggingface.co/datasets/Proactive-Interactive-R1/DeepSeek-R1-Distill-Data-5k)**: Distilled data used for training.
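Both can be loaded with 🤗 Datasets. The split and column names below are assumptions, so inspect the dataset viewer for the actual schema:

```python
# Sketch: load the SFT dataset and peek at one example.
# The "train" split name is an assumption -- check the dataset card.
from datasets import load_dataset

ds = load_dataset(
    "Proactive-Interactive-R1/Reasoning-While-Asking-SFT-Dataset", split="train"
)
print(ds)     # features and number of rows
print(ds[0])  # first example
```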
## 📖 Citation
If you find this work useful, please cite our paper:
```bibtex
@misc{chen2026reasoningaskingtransformingreasoning,
      title={Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers},
      author={Xin Chen and Feng Jiang and Yiqian Zhang and Hardy Chen and Shuo Yan and Wenya Xie and Min Yang and Shujian Huang},
      year={2026},
      eprint={2601.22139},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.22139},
}
```