The paper introduces PANGAEA-GPT, a hierarchical multi-agent system for autonomous data discovery and analysis within Earth science data archives. This system addresses the underutilization of data in repositories like PANGAEA by implementing a Supervisor-Worker topology with data-type-aware routing, sandboxed code execution, and self-correction mechanisms. Experiments in oceanography and ecology demonstrate the system's ability to execute complex workflows with minimal human intervention, facilitating querying and analysis of heterogeneous data.
A hierarchical multi-agent system can autonomously navigate and analyze vast, underutilized Earth science datasets, going beyond simple LLM wrappers by diagnosing runtime errors from execution feedback and coordinating multi-step workflows.
The rapid accumulation of Earth science data has created a significant scalability challenge; while repositories like PANGAEA host vast collections of datasets, citation metrics indicate that a substantial portion remains underutilized, limiting data reusability. Here we present PANGAEA-GPT, a hierarchical multi-agent framework designed for autonomous data discovery and analysis. Unlike standard Large Language Model (LLM) wrappers, our architecture implements a centralized Supervisor-Worker topology with strict data-type-aware routing, sandboxed deterministic code execution, and self-correction via execution feedback, enabling agents to diagnose and resolve runtime errors. Through use-case scenarios spanning physical oceanography and ecology, we demonstrate the system's capacity to execute complex, multi-step workflows with minimal human intervention. This framework provides a methodology for querying and analyzing heterogeneous repository data through coordinated agent workflows.
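The abstract names three mechanisms: data-type-aware routing by a supervisor, isolated code execution, and self-correction from execution feedback. The sketch below is one minimal way these could fit together; it is not the PANGAEA-GPT implementation. The routing table, the worker names, `run_code`, `solve_with_feedback`, and the stub generator are all hypothetical, and an in-process `exec` stands in for a real sandbox (it provides no actual isolation).

```python
import contextlib
import io
import traceback
from typing import Callable, Dict, Tuple

# Hypothetical routing table: data type -> worker name (not taken from the paper).
ROUTES: Dict[str, str] = {
    "netcdf": "oceanography_worker",
    "tabular": "ecology_worker",
}


def route(data_type: str) -> str:
    """Supervisor step: strict routing; unknown data types are rejected."""
    if data_type not in ROUTES:
        raise ValueError(f"no worker registered for data type {data_type!r}")
    return ROUTES[data_type]


def run_code(code: str) -> Tuple[bool, str]:
    """Run code in a fresh namespace and capture stdout.

    Stand-in for a sandbox: returns (True, stdout) on success,
    or (False, traceback text) if the code raises.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {"__name__": "__sandbox__"})
    except Exception:
        return False, traceback.format_exc()
    return True, buffer.getvalue()


def solve_with_feedback(
    generate: Callable[[str, str], str], task: str, max_attempts: int = 3
) -> Tuple[str, int]:
    """Self-correction loop: feed the traceback back into the generator."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        ok, output = run_code(code)
        if ok:
            return output, attempt
        feedback = output  # the error text informs the next attempt
    raise RuntimeError(f"gave up after {max_attempts} attempts:\n{feedback}")


def stub_generate(task: str, feedback: str) -> str:
    """Hypothetical stand-in for an LLM call: fixes its code after a NameError."""
    if "NameError" in feedback:
        return "values = [1, 2, 3]\nprint(sum(values) / len(values))"
    return "print(sum(values) / len(values))"  # bug: 'values' is undefined


worker = route("netcdf")
result, attempts = solve_with_feedback(stub_generate, "mean of values")
```

Here the first generated snippet fails with a `NameError`, the traceback is passed back to the generator, and the second attempt succeeds. A production system would replace `exec` with a process- or container-level sandbox and the stub with a model call.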