---
title: Proactive Interactive Reasoning (PIR)
emoji: πŸŒ–
colorFrom: blue
colorTo: indigo
sdk: gradio
pinned: false
license: apache-2.0
short_description: Enables reasoning LLMs to ask clarification questions
---

# Reasoning While Asking: Transforming Reasoning LLMs into Proactive Inquirers (PIR)

[![arXiv](https://img.shields.io/badge/arXiv-2601.22139-b31b1b.svg)](https://arxiv.org/abs/2601.22139)
[![GitHub](https://img.shields.io/badge/GitHub-Proactive--Interactive--R1-black?logo=github)](https://github.com/SUAT-AIRI/Proactive-Interactive-R1)

This organization hosts the official models and datasets for the paper **"Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers"**.

## πŸ’‘ Motivation

Current reasoning LLMs (e.g., OpenAI o1, DeepSeek-R1) suffer from **blind self-thinking**: they perform extensive internal reasoning even when critical information is missing or user intent is ambiguous. This leads to overthinking, hallucinations, and misaligned conclusions.

**PIR (Proactive Interactive Reasoning)** is a new paradigm that transforms reasoning LLMs from passive solvers into **proactive inquirers**. Instead of guessing, PIR-enabled models detect uncertainty during reasoning and actively ask users for clarification before proceeding.

<img src="https://raw.githubusercontent.com/SUAT-AIRI/Proactive-Interactive-R1/refs/heads/main/image/paradigm.png" width="1000" alt="PIR Framework Overview">

### Key Features

- **User-Intent Alignment**: Optimizes interaction through US-GRPO, whose composite rewards balance accuracy, efficiency, and helpfulness (a toy sketch follows this list).
- **Significant Improvements**: Up to **32.70% higher accuracy**, a **22.90% higher pass rate**, and a **41.36-point BLEU improvement** over baselines.
- **Reduced Computation**: Nearly halves unnecessary reasoning tokens and interaction turns.
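
To make the composite-reward idea concrete, here is a toy sketch of a weighted sum over an accuracy term, a token-efficiency term, and a clarification-helpfulness term. The `composite_reward` helper, its weights, and its terms are illustrative assumptions, not the paper's US-GRPO formulation; see the paper and GitHub repo for the real definition.

```python
# Toy sketch, NOT the paper's US-GRPO reward: an illustrative composite reward
# balancing accuracy, efficiency, and helpfulness. All weights and terms here
# are assumptions made for exposition.
def composite_reward(
    correct: bool,            # did the final answer match the reference?
    reasoning_tokens: int,    # tokens spent in the reasoning trace
    asked_when_needed: bool,  # did the model ask for clarification when warranted?
    w_acc: float = 1.0,
    w_eff: float = 0.2,
    w_help: float = 0.5,
    token_budget: int = 2048,
) -> float:
    r_acc = 1.0 if correct else 0.0
    r_eff = max(0.0, 1.0 - reasoning_tokens / token_budget)  # penalizes overthinking
    r_help = 1.0 if asked_when_needed else 0.0               # rewards useful clarification
    return w_acc * r_acc + w_eff * r_eff + w_help * r_help
```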

## πŸ“¦ Models

We provide the following models trained with the PIR paradigm (a minimal loading sketch follows the table):

| Model Name | Description | Link |
| :--- | :--- | :--- |
| **Proactive-Interactive-R1-Math-7B** | The core model optimized for mathematical reasoning with clarification capabilities. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B) |
| **Proactive-Interactive-R1-Math-7B-Pro** | An enhanced version of the Math-7B model. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B-Pro) |
| **Proactive-Interactive-R1-SFT-7B** | The base SFT model before Reinforcement Learning alignment. | [View Model](https://huggingface.co/Proactive-Interactive-R1/Proactive-Interactive-R1-SFT-7B) |
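
A minimal loading sketch, assuming the checkpoints expose the standard `transformers` causal-LM interface and ship a chat template; the underspecified prompt and generation settings below are illustrative assumptions, not an official demo.

```python
# Minimal sketch: load a PIR model with Hugging Face transformers.
# Assumes the checkpoint uses the standard causal-LM interface and a chat
# template; the prompt is deliberately underspecified to illustrate the
# clarification behavior.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Proactive-Interactive-R1/Proactive-Interactive-R1-Math-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Solve for x: a*x + 3 = 7"}]  # missing the value of `a`
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
# A PIR-trained model may respond with a clarification question
# (e.g., asking for the value of `a`) rather than guessing.
```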

## πŸ“š Datasets

The datasets used to train and evaluate PIR are listed below (a loading sketch follows the list):

- **[Reasoning-While-Asking-SFT-Dataset](https://huggingface.co/datasets/Proactive-Interactive-R1/Reasoning-While-Asking-SFT-Dataset)**: The dataset used for the initial Supervised Fine-Tuning (SFT) phase.
- **[DeepSeek-R1-Distill-Data-5k](https://huggingface.co/datasets/Proactive-Interactive-R1/DeepSeek-R1-Distill-Data-5k)**: Distilled data used for training.
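
A quick inspection sketch using the `datasets` library; the `split="train"` argument and field layout are assumptions, so check each dataset card for the actual splits and schema.

```python
# Sketch: inspect the PIR SFT dataset with the Hugging Face `datasets` library.
# The "train" split name is an assumption; consult the dataset card.
from datasets import load_dataset

sft = load_dataset("Proactive-Interactive-R1/Reasoning-While-Asking-SFT-Dataset", split="train")
print(sft)     # features and row count
print(sft[0])  # first example
```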

## πŸ“œ Citation

If you find this work useful, please cite our paper:

```bibtex
@misc{chen2026reasoningaskingtransformingreasoning,
      title={Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers}, 
      author={Xin Chen and Feng Jiang and Yiqian Zhang and Hardy Chen and Shuo Yan and Wenya Xie and Min Yang and Shujian Huang},
      year={2026},
      eprint={2601.22139},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.22139},
}
```