DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
Abstract
DAComp is a benchmark of 210 tasks that evaluates agents on real-world data engineering and data analysis workflows, revealing significant deficiencies in both areas.
Real-world enterprise data intelligence workflows encompass data engineering, which turns raw sources into analysis-ready tables, and data analysis, which converts those tables into decision-oriented insights. We introduce DAComp, a benchmark of 210 tasks that mirrors these complex workflows. Data engineering (DE) tasks require repository-level engineering on industrial schemas, including designing and building multi-stage SQL pipelines from scratch and evolving existing systems to meet changing requirements. Data analysis (DA) tasks pose open-ended business problems that demand strategic planning, exploratory analysis through iterative coding, interpretation of intermediate results, and the synthesis of actionable recommendations. Engineering tasks are scored through execution-based, multi-metric evaluation. Open-ended tasks are assessed by a reliable, experimentally validated LLM judge guided by hierarchical, meticulously crafted rubrics. Our experiments reveal that even state-of-the-art agents falter on DAComp. Performance on DE tasks is particularly low, with success rates under 20%, exposing a critical bottleneck in holistic pipeline orchestration, not merely code generation. Scores on DA tasks also average below 40%, highlighting profound deficiencies in open-ended reasoning and demonstrating that engineering and analysis are distinct capabilities. By clearly diagnosing these limitations, DAComp provides a rigorous and realistic testbed to drive the development of truly capable autonomous data agents for enterprise settings. Our data and code are available at https://da-comp.github.io
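For intuition, here is a minimal, hypothetical sketch of the two evaluation modes described above: execution-based, multi-metric scoring for DE tasks and rubric-guided LLM judging for DA tasks. The names score_de_task, RubricItem, and score_da_task are illustrative placeholders and not part of the official DAComp harness; see https://da-comp.github.io for the released code.

```python
# Toy sketch of the two DAComp-style evaluation modes (illustrative only).
from dataclasses import dataclass
from typing import Callable
import sqlite3


def score_de_task(agent_sql: str, checks: list[tuple[str, list[tuple]]]) -> float:
    """Execution-based, multi-metric scoring: run the agent's pipeline in a fresh
    database and compare each check query's result against its expected rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(agent_sql)          # build the pipeline's output tables
        passed = sum(
            con.execute(query).fetchall() == expected
            for query, expected in checks
        )
        return passed / len(checks)
    except sqlite3.Error:
        return 0.0                            # a non-executable pipeline scores zero
    finally:
        con.close()


@dataclass
class RubricItem:
    criterion: str    # e.g. "recommendations follow from the computed evidence"
    weight: float


def score_da_task(report: str, rubric: list[RubricItem],
                  judge: Callable[[str, str], float]) -> float:
    """Rubric-guided judging: `judge` (an LLM call in practice) returns a score in
    [0, 1] per criterion; scores are aggregated by the rubric weights."""
    total = sum(item.weight for item in rubric)
    return sum(item.weight * judge(report, item.criterion) for item in rubric) / total
```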
Community
Today, we are thrilled to introduce DAComp—a comprehensive benchmark designed to evaluate the performance of LLM-based data agents across the entire enterprise data intelligence lifecycle.
Why do we need DAComp? Current benchmarks are often limited to isolated SQL generation or simple Q&A. However, in real-world enterprise environments, data work is a complex, closed loop. DAComp fills this gap by covering two core domains:
🔹 Data Engineering (DE): Going beyond single-point code generation to challenge repository-level engineering capabilities. Agents must handle architectural design, implementation, and system evolution in response to changing requirements (see the sketch after this list for a toy pipeline of this kind).
🔹 Data Analysis (DA): We introduce open-ended analysis tasks, requiring agents to demonstrate deep reasoning, strategic planning, and comprehensive synthesis capabilities.
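To make the DE setting concrete, the toy sketch below builds a hypothetical three-stage pipeline (raw landing table, cleaned staging table, analysis-ready mart) in an in-memory SQLite database. The schema and task are invented for illustration; actual DAComp tasks involve much larger industrial schemas and multi-stage SQL pipelines.

```python
# Hypothetical flavor of a DE task (not an actual DAComp instance): turn raw
# event rows into an analysis-ready table via a staged SQL pipeline.
import sqlite3

PIPELINE = """
-- stage 1: raw landing table (normally loaded from source files)
CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, amount REAL, status TEXT);
INSERT INTO raw_orders VALUES
    (1, 10, 25.0, 'paid'),
    (2, 10, 40.0, 'refunded'),
    (3, 11, 15.0, 'paid');

-- stage 2: cleaned intermediate table
CREATE TABLE stg_orders AS
SELECT order_id, customer_id, amount
FROM raw_orders
WHERE status = 'paid';

-- stage 3: analysis-ready mart keyed by customer
CREATE TABLE mart_customer_revenue AS
SELECT customer_id, SUM(amount) AS revenue, COUNT(*) AS paid_orders
FROM stg_orders
GROUP BY customer_id;
"""

con = sqlite3.connect(":memory:")
con.executescript(PIPELINE)
print(con.execute(
    "SELECT * FROM mart_customer_revenue ORDER BY customer_id"
).fetchall())
# -> [(10, 25.0, 1), (11, 15.0, 1)]
con.close()
```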
Watch our video for a closer look at these specific task categories!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PublicAgent: Multi-Agent Design Principles From an LLM-Based Open Data Analysis Framework (2025)
- UniDataBench: Evaluating Data Analytics Agents Across Structured and Unstructured Data (2025)
- Benchmarking and Studying the LLM-based Agent System in End-to-End Software Development (2025)
- ConDABench: Interactive Evaluation of Language Models for Data Analysis (2025)
- UI-CUBE: Enterprise-Grade Computer Use Agent Benchmarking Beyond Task Accuracy to Operational Reliability (2025)
- MLE-Smith: Scaling MLE Tasks with Automated Multi-Agent Pipeline (2025)
- CLIMATEAGENT: Multi-Agent Orchestration for Complex Climate Data Science Workflows (2025)