This paper introduces VeriGraph, a neuro-symbolic reasoning framework designed to make the outputs of LLM-based agents verifiable in data-analytic tasks. During execution, the agent constructs a directed acyclic graph (DAG) that explicitly connects raw data, interpreter variables, computed results, and natural-language claims, so that each conclusion can be traced back to its sources and checked for semantic support. On four benchmarks, VeriGraph-8B achieves the highest overall score among all baselines and an 87.61% Grounding Rate under the paper's claim-level evidence support evaluation.
By building explicit evidence graphs, VeriGraph reaches an 87.61% Grounding Rate, making AI-generated analytical conclusions easier to audit.
LLM-based agents have demonstrated strong capabilities in data-intensive analytical tasks, yet their outputs are rarely verifiable: a reliance on linear text trajectories makes their reasoning difficult to audit. In particular, deterministic computations over raw data and semantic deductions over natural-language claims are often entangled in an unstructured stream, leaving numerical conclusions hard to reproduce and qualitative judgments hard to inspect. To address this, we propose VeriGraph, a traceable neuro-symbolic reasoning framework that enables agents to construct an explicit heterogeneous evidence directed acyclic graph (DAG) during execution. VeriGraph introduces three evidence-expansion primitives, namely computational, grounding, and derivational expansion, to connect raw data, interpreter variables, computed results, and natural-language claims in a unified graph. Under this formulation, structural traceability is reduced to graph reachability from raw data sources to terminal claims, while semantic support is measured by claim-level evidence evaluation. To improve graph construction, we further design a graph-based policy optimization strategy with a composite reward that jointly supervises answer correctness, computational integrity, and derivational coherence. Experiments on four benchmarks show that VeriGraph-8B achieves the highest overall score among all baselines. More importantly, VeriGraph produces auditable evidence graphs with substantially stronger claim grounding, achieving an 87.61% Grounding Rate under our claim-level evidence support evaluation. These results suggest that explicit evidence-graph construction is a promising path toward verifiable data-analytic agents. Our code is available at https://github.com/ignorejjj/VeriGraph.
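
The abstract's reduction of structural traceability to graph reachability can be illustrated with a minimal sketch. This is not the VeriGraph implementation: the node kinds, node names, edge list, and the functions `reachable_from` and `traceable_claims` are all hypothetical, chosen only to show the idea that a terminal claim counts as traceable when some directed path leads to it from a raw data source. The sketch covers structural traceability only; it does not compute the Grounding Rate, which the abstract says is measured by a separate claim-level evidence evaluation.

```python
from collections import deque


def reachable_from(sources, edges):
    """Return every node reachable from any source along directed edges (BFS)."""
    adjacency = {}
    for parent, child in edges:
        adjacency.setdefault(parent, []).append(child)

    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for child in adjacency.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def traceable_claims(node_kinds, edges, terminal_claims):
    """Map each terminal claim to True if a raw-data node can reach it."""
    sources = [name for name, kind in node_kinds.items() if kind == "raw_data"]
    reached = reachable_from(sources, edges)
    return {claim: claim in reached for claim in terminal_claims}


# Hypothetical toy evidence graph: one raw file, one interpreter variable,
# one computed result, and two claims (only one of which is connected).
node_kinds = {
    "sales.csv": "raw_data",
    "df": "variable",
    "mean_q3": "result",
    "claim_a": "claim",
    "claim_b": "claim",
}
edges = [
    ("sales.csv", "df"),    # data loaded into an interpreter variable
    ("df", "mean_q3"),      # result computed from the variable
    ("mean_q3", "claim_a"), # claim that cites the computed result
]

result = traceable_claims(node_kinds, edges, ["claim_a", "claim_b"])
print(result)
```

In this toy graph, `claim_a` is reachable from `sales.csv` and `claim_b` has no incoming path, so an auditor would flag `claim_b` as structurally unsupported before any semantic check is attempted.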