Papers
arxiv:2602.05014

DeepRead: Document Structure-Aware Reasoning to Enhance Agentic Search

Published on Feb 4

Abstract

DeepRead is a structure-aware document reasoning agent that uses hierarchical document organization to improve information retrieval and reading strategies in large language models.

AI-generated summary

With the rapid advancement of tool-use capabilities in Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) is shifting from static, one-shot retrieval toward autonomous, multi-turn evidence acquisition. However, existing agentic search frameworks typically treat long documents as flat collections of unstructured chunks, disregarding the native hierarchical organization and sequential logic essential for human comprehension. To bridge this gap, we introduce DeepRead, a structure-aware document reasoning agent designed to operationalize document-native structural priors into actionable reasoning capabilities. Leveraging the structural fidelity of modern OCR, DeepRead constructs a paragraph-level, coordinate-based navigation system and equips the LLM with two synergistic tools: Retrieve for scanning-aware localization, and ReadSection for contiguous, order-preserving reading within specific hierarchical scopes. This design elicits a human-like "locate-then-read" reasoning paradigm, effectively mitigating the context fragmentation inherent in traditional retrieval methods. Extensive evaluations across four benchmarks spanning diverse document types demonstrate that DeepRead outperforms Search-o1-style agentic search baselines by an average of 10.3%. Fine-grained behavioral analysis further confirms that DeepRead autonomously adopts human-aligned reading strategies, validating the critical role of structural awareness in achieving precise document reasoning. Our code is available at https://github.com/Zhanli-Li/DeepRead.

Community

The tool-use capabilities of large language models are driving the evolution of Retrieval-Augmented Generation (RAG) from static, one-shot retrieval to autonomous, multi-turn evidence gathering, establishing Agentic RAG as a core direction for addressing complex question-answering tasks. However, existing mainstream Agentic Search frameworks commonly suffer from a critical limitation: structure-blindness. They treat long documents as undifferentiated flat text chunks, overlooking the inherent hierarchical organization (e.g., sections, paragraphs) and sequential logic of documents. This leads to fragmented retrieval, missed evidence, and redundant operations.

To address this fundamental challenge, we propose DeepRead, a structure-aware reasoning agent that transforms the inherent structural priors of documents into actionable reasoning capabilities, enabling the agent to "locate-then-read" much like a human reader. The paper and code have been open-sourced, and we welcome discussions and exchanges!
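To make the "locate-then-read" paradigm concrete, here is a minimal illustrative sketch of the two-tool interface described above. The tool names Retrieve and ReadSection come from the paper, but the data model, signatures, and keyword-matching logic below are assumptions for illustration, not the authors' implementation (which uses OCR-derived, coordinate-based paragraph navigation).

```python
from dataclasses import dataclass

# Hypothetical paragraph-level index. In DeepRead, paragraphs carry OCR-derived
# coordinates and hierarchical section scopes; here we model only the section
# path and reading order, which is what the two tools rely on.
@dataclass
class Paragraph:
    section: str  # hierarchical scope, e.g. "2.Method"
    index: int    # position within the section (preserves sequential logic)
    text: str

DOC = [
    Paragraph("1.Intro", 0, "Agentic RAG moves beyond one-shot retrieval."),
    Paragraph("2.Method", 0, "We build a paragraph-level navigation system."),
    Paragraph("2.Method", 1, "Retrieve localizes evidence; ReadSection reads in order."),
]

def retrieve(query: str) -> list[Paragraph]:
    """Scanning-aware localization: return paragraphs matching the query.
    (A stand-in for real retrieval; DeepRead's scoring is not specified here.)"""
    q = query.lower()
    return [p for p in DOC if q in p.text.lower()]

def read_section(section: str) -> str:
    """Contiguous, order-preserving reading within one hierarchical scope."""
    paras = sorted((p for p in DOC if p.section == section), key=lambda p: p.index)
    return "\n".join(p.text for p in paras)

# Locate-then-read: first pinpoint where the evidence lives, then read the
# whole enclosing section so context is not fragmented into isolated chunks.
hits = retrieve("navigation")
context = read_section(hits[0].section)
```

The point of the second step is what distinguishes this from chunk-based RAG: instead of returning the matched fragment alone, the agent reads the full enclosing section in document order, recovering the surrounding context a flat retriever would discard.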

