The paper introduces Agentic DAG-Orchestrated Transformer (A.DOT) Planner, a framework for answering complex, multi-hop questions over hybrid data lakes by compiling natural-language (NL) queries into directed acyclic graph (DAG) execution plans that span structured and unstructured data sources. A.DOT decomposes queries into parallelizable sub-queries, incorporates schema-aware reasoning, and applies structural and semantic validation before execution. It then coordinates retrieval and merges results according to the DAG plan. Experiments on a benchmark dataset show that A.DOT achieves absolute gains of 14.8% in correctness and 10.7% in completeness over baseline methods.
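The sketch below illustrates DAG-plan coordination in general. It is a minimal Python illustration under stated assumptions and is not A.DOT's implementation. It assumes each sub-query is a callable that receives its parents' outputs, and the names `SubQuery`, `validate_plan`, and `execute_plan` are hypothetical. "Structural validation" here covers only two checks: every dependency exists, and the graph is acyclic (via the standard-library `graphlib`). The paper's semantic validation is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, List


@dataclass
class SubQuery:
    """Hypothetical DAG node: one retrieval step plus the nodes it depends on."""
    name: str
    run: Callable[[Dict[str, object]], object]  # receives outputs of its dependencies
    deps: List[str] = field(default_factory=list)


def validate_plan(plan: Dict[str, SubQuery]) -> List[str]:
    """Structural check: all dependencies exist and the graph has no cycle.

    Returns one valid topological order; raises ValueError on a malformed plan.
    """
    for node in plan.values():
        missing = [d for d in node.deps if d not in plan]
        if missing:
            raise ValueError(f"{node.name} depends on unknown nodes {missing}")
    try:
        graph = {name: q.deps for name, q in plan.items()}
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes forming the detected cycle
        raise ValueError(f"plan contains a cycle: {exc.args[1]}") from exc


def execute_plan(plan: Dict[str, SubQuery], max_workers: int = 4) -> Dict[str, object]:
    """Run sub-queries whose dependencies are satisfied concurrently, batch by batch."""
    validate_plan(plan)
    sorter = TopologicalSorter({name: q.deps for name, q in plan.items()})
    sorter.prepare()
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # every node whose parents have finished
            futures = {
                # route each parent's output to the dependent sub-query
                name: pool.submit(plan[name].run, {d: results[d] for d in plan[name].deps})
                for name in ready
            }
            for name, fut in futures.items():
                results[name] = fut.result()
                sorter.done(name)
    return results


# Toy usage: two independent lookups feed a merge node.
plan = {
    # stand-in for a structured (SQL-style) lookup
    "sql_suppliers": SubQuery("sql_suppliers", lambda _: ["Acme", "Globex"]),
    # stand-in for unstructured document retrieval
    "doc_complaints": SubQuery("doc_complaints", lambda _: {"Globex": "late delivery"}),
    "merge": SubQuery(
        "merge",
        lambda r: [s for s in r["sql_suppliers"] if s in r["doc_complaints"]],
        deps=["sql_suppliers", "doc_complaints"],
    ),
}
print(execute_plan(plan)["merge"])  # ['Globex']
```

In this sketch, the two independent lookups are scheduled in the same batch, which is what makes the plan parallelizable. The executor waits for a whole batch to finish before scheduling the next one. This keeps the code simple, but it can leave workers idle when branches of the DAG have uneven latency.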
Stop brute-forcing question answering over hybrid data lakes: A.DOT Planner compiles NL queries into DAGs for efficient multi-hop reasoning across structured and unstructured data, raising correctness by an absolute 14.8%.
Enterprises increasingly need natural language (NL) question answering over hybrid data lakes that combine structured tables and unstructured documents. Currently deployed solutions, including RAG-based systems, typically rely on brute-force retrieval from each store followed by post-hoc merging. Such approaches are inefficient and leaky. More critically, they lack explicit support for multi-hop reasoning, where a query is decomposed into successive steps (hops) that may traverse back and forth between structured and unstructured sources. We present Agentic DAG-Orchestrated Transformer (A.DOT) Planner, a framework for multi-modal, multi-hop question answering that compiles user NL queries into directed acyclic graph (DAG) execution plans spanning both structured and unstructured stores. The system decomposes queries into parallelizable sub-queries, incorporates schema-aware reasoning, and applies both structural and semantic validation before execution. The execution engine follows the generated DAG plan to coordinate concurrent retrieval across heterogeneous sources, route intermediate outputs to dependent sub-queries, and merge final results in strict accordance with the plan's logical dependencies. Caching mechanisms with paraphrase-aware template matching let the system detect equivalent queries and reuse prior DAG execution plans for rapid re-execution, while the DataOps System handles validation feedback and execution errors. Beyond improving accuracy and reducing latency, the framework produces explicit evidence trails that let users verify retrieved content and trace data lineage, fostering trust in the system's outputs. On a benchmark dataset, A.DOT achieves absolute gains of 14.8% in correctness and 10.7% in completeness over baselines.
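The plan-reuse idea can be illustrated with a small sketch built on assumptions. The abstract does not say how paraphrases are detected, so the hypothetical `PlanCache` below uses only lexical canonicalization as a stand-in. It lowercases the query, masks numeric and single-quoted literals, drops a few stopwords, maps words through a hand-written synonym table, and ignores word order. A real system would more plausibly use semantic similarity. `SYNONYMS`, `STOPWORDS`, and `to_template` are illustrative names, not part of A.DOT.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical synonym table and stopword list; embedding similarity would
# likely replace these in a production system.
SYNONYMS = {"vendors": "suppliers", "vendor": "supplier", "show": "list", "display": "list"}
STOPWORDS = {"the", "a", "an", "all", "me", "please", "of", "for", "in"}
# Literal values (numbers or single-quoted strings) become plan parameters.
LITERAL = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")


def to_template(query: str) -> Tuple[str, List[str]]:
    """Mask literals and canonicalize wording; return (template key, literals)."""
    literals = LITERAL.findall(query)  # kept in order of appearance
    masked = LITERAL.sub(" <lit> ", query.lower())
    tokens = re.findall(r"<lit>|[a-z0-9]+", masked)
    # Sorting makes the key insensitive to word order (a deliberate simplification).
    canon = sorted(SYNONYMS.get(t, t) for t in tokens if t not in STOPWORDS)
    return " ".join(canon), literals


class PlanCache:
    """Maps canonical query templates to previously compiled DAG plans."""

    def __init__(self) -> None:
        self._plans: Dict[str, object] = {}

    def get(self, query: str) -> Optional[Tuple[object, List[str]]]:
        key, literals = to_template(query)
        plan = self._plans.get(key)
        # On a hit, return the stored plan plus this query's literal values so the
        # caller can re-bind parameters instead of re-planning from scratch.
        return (plan, literals) if plan is not None else None

    def put(self, query: str, plan: object) -> None:
        key, _ = to_template(query)
        self._plans[key] = plan


cache = PlanCache()
cache.put("Show all vendors with revenue above 500", "plan-v1")
print(cache.get("List suppliers with revenue above 750"))  # ('plan-v1', ['750'])
```

Literal values are masked out of the key, so two queries that differ only in a threshold share one cached plan. The caller then re-binds the new literal. Ignoring word order keeps the key robust to simple rephrasings, but it can conflate queries whose meaning depends on word order.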