DeepRead: Document Structure-Aware Reasoning to Enhance Agentic Search
Abstract
DeepRead is a structure-aware document reasoning agent that uses hierarchical document organization to improve information retrieval and reading strategies in large language models.
With the rapid advancement of tool-use capabilities in Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) is shifting from static, one-shot retrieval toward autonomous, multi-turn evidence acquisition. However, existing agentic search frameworks typically treat long documents as flat collections of unstructured chunks, disregarding the native hierarchical organization and sequential logic essential for human comprehension. To bridge this gap, we introduce DeepRead, a structure-aware document reasoning agent designed to operationalize document-native structural priors into actionable reasoning capabilities. Leveraging the structural fidelity of modern OCR, DeepRead constructs a paragraph-level, coordinate-based navigation system and equips the LLM with two synergistic tools: Retrieve for scanning-aware localization, and ReadSection for contiguous, order-preserving reading within specific hierarchical scopes. This design elicits a human-like "locate-then-read" reasoning paradigm, effectively mitigating the context fragmentation inherent in traditional retrieval methods. Extensive evaluations across four benchmarks spanning diverse document types demonstrate that DeepRead outperforms Search-o1-style agentic search baselines by an average of 10.3%. Fine-grained behavioral analysis further confirms that DeepRead autonomously adopts human-aligned reading strategies, validating the critical role of structural awareness in achieving precise document reasoning. Our code is available at https://github.com/Zhanli-Li/DeepRead.
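The navigation system and the two tools can be illustrated with a minimal sketch. This is not the released DeepRead implementation. It assumes the OCR output has already been parsed into (heading, paragraphs) pairs and keeps only one level of hierarchy for brevity. It also assumes (section, paragraph) integer pairs as coordinates and substitutes a simple term-overlap score for whatever retriever DeepRead actually uses. The names `build_index`, `retrieve`, `read_section`, and `Paragraph` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Paragraph:
    section: int   # coordinate part 1: index of the enclosing section (assumed scheme)
    index: int     # coordinate part 2: position of the paragraph inside that section
    heading: str   # section heading, kept so hits can be reported with their scope
    text: str


def tokens(text: str) -> set:
    """Lower-cased alphanumeric terms; a stand-in for a real retriever's analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(doc: List[Tuple[str, List[str]]]) -> List[Paragraph]:
    """Tag every paragraph with (section, paragraph) coordinates, in reading order."""
    return [
        Paragraph(s, p, heading, text)
        for s, (heading, paras) in enumerate(doc)
        for p, text in enumerate(paras)
    ]


def retrieve(nav: List[Paragraph], query: str, k: int = 3) -> List[Tuple[int, int, str]]:
    """Hypothetical Retrieve: coordinates of the k paragraphs sharing the most query terms."""
    q = tokens(query)
    scored = [(len(q & tokens(para.text)), para) for para in nav]
    hits = [(score, para) for score, para in scored if score > 0]
    # Highest overlap first; ties broken by document order.
    hits.sort(key=lambda item: (-item[0], item[1].section, item[1].index))
    return [(para.section, para.index, para.heading) for _, para in hits[:k]]


def read_section(nav: List[Paragraph], section: int,
                 start: int = 0, end: Optional[int] = None) -> List[str]:
    """Hypothetical ReadSection: contiguous paragraphs of one section, order preserved."""
    paras = sorted((para for para in nav if para.section == section),
                   key=lambda para: para.index)
    return [para.text for para in paras[start:end]]


doc = [
    ("1 Introduction", ["Agents read long reports.",
                        "Flat chunking loses structure."]),
    ("2 Method", ["The budget limit is 40 million.",
                  "It applies to fiscal year 2024.",
                  "Exceptions need board approval."]),
]
nav = build_index(doc)

# Locate: the query only matches the first paragraph of "2 Method".
hits = retrieve(nav, "budget limit", k=1)
print(hits)  # [(1, 0, '2 Method')]

# Read: expand from the hit through the rest of its section, in order.
section, para, _ = hits[0]
print(read_section(nav, section, start=para))  # all three paragraphs of section 1
```

In this toy example the query matches only one paragraph. Reading its section from that coordinate also surfaces the fiscal-year condition and the exception rule, which a single retrieved chunk would miss. This illustrates the kind of context fragmentation the locate-then-read pattern is meant to avoid.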
Community
The tool-using capabilities of large language models are driving the evolution of Retrieval-Augmented Generation (RAG) from static, one-time retrieval to autonomous, multi-turn evidence gathering, establishing Agentic RAG as a core direction for complex question-answering tasks. However, mainstream agentic search frameworks commonly suffer from a critical limitation, structure-blindness: they treat long documents as undifferentiated flat text chunks, overlooking the documents' inherent hierarchical organization (e.g., sections and paragraphs) and sequential logic. This leads to fragmented retrieval, missed evidence, and redundant operations.
To address this fundamental challenge, we propose DeepRead, a structure-aware reasoning agent that transforms the inherent structural priors of documents into actionable reasoning capabilities, enabling the agent to "locate-then-read" much like a human reader. The paper and code are open source, and we welcome discussion and feedback!