arxiv:2510.07231

EconCausal: A Context-Aware Causal Reasoning Benchmark for Large Language Models in Social Science

Published on Oct 8, 2025

Authors:

Abstract

Large language models demonstrate significant limitations in contextual causal reasoning, particularly when dealing with complex socio-economic environments where causal relationships vary across different institutional and market conditions.

AI-generated summary

Socio-economic causal effects depend heavily on their specific institutional and environmental context. A single intervention can produce opposite results depending on regulatory or market factors, contexts that are often complex and only partially observed. This poses a significant challenge for large language models (LLMs) in decision-support roles: can they distinguish structural causal mechanisms from surface-level correlations when the context changes? To address this, we introduce EconCausal, a large-scale benchmark comprising 10,490 context-annotated causal triplets extracted from 2,595 high-quality empirical studies published in top-tier economics and finance journals. Through a rigorous four-stage pipeline combining multi-run consensus, context refinement, and multi-critic filtering, we ensure each claim is grounded in peer-reviewed research with explicit identification strategies. Our evaluation reveals critical limitations in current LLMs' context-dependent reasoning. While top models achieve approximately 88 percent accuracy in fixed, explicit contexts, performance drops sharply under context shifts, with a 32.6 percentage point decline, and falls to 37 percent when misinformation is introduced. Furthermore, models exhibit severe over-commitment in ambiguous cases and struggle to recognize null effects, achieving only 9.5 percent accuracy, exposing a fundamental gap between pattern matching and genuine causal reasoning. These findings underscore substantial risks for high-stakes economic decision-making, where the cost of misinterpreting causality is high. The dataset and benchmark are publicly available at https://github.com/econaikaist/econcausal-benchmark.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2510.07231 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2510.07231 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.