This paper introduces HiRAS, a hierarchical multi-agent framework that automates end-to-end experiment reproduction in computational research by coordinating specialized agents. The authors address the weak global coordination of existing fixed sequential agent pipelines by adding supervisory manager agents, yielding a relative performance gain of over 10% beyond the previous state of the art when using open-source backbone models. They also propose the Paper2Code-Extra (P2C-Ex) evaluation protocol, which incorporates repository-level information, aligns better with the original reference-based metric, and significantly reduces hallucination in evaluation.
A relative performance gain of over 10% in experiment reproduction shows the value of hierarchical coordination in multi-agent systems for computational research.
Recent advances in large language models have highlighted their potential to automate computational research, particularly reproducing experimental results. However, existing approaches still use fixed sequential agent pipelines with weak global coordination, which limits their robustness and overall performance. In this work, we propose the Hierarchical Research Agent System (HiRAS), a hierarchical multi-agent framework for end-to-end experiment reproduction that employs supervisory manager agents to coordinate specialised agents across fine-grained stages. We also identify limitations in the reference-free evaluation of the Paper2Code benchmark and introduce Paper2Code-Extra (P2C-Ex), a refined protocol that incorporates repository-level information and better aligns with the original reference-based metric. We conduct extensive evaluations that validate the effectiveness and robustness of our proposed methods, observing improvements that include a >10% relative performance gain beyond the previous state of the art using open-source backbone models and significantly reduced hallucination in evaluation. Our work is available on GitHub: https://github.com/KOU-199024/HiRAS.
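
To make the contrast with a fixed sequential pipeline concrete, the following is a minimal sketch of supervisory coordination, not the HiRAS implementation. It assumes a manager that runs specialised workers stage by stage and retries a stage whose result fails a check, rather than passing a bad result downstream. All names here (`Manager`, `StageResult`, `planner`, `make_flaky_coder`) are hypothetical and do not come from the paper or its repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical types for illustration only; none of these names come from HiRAS.
@dataclass
class StageResult:
    ok: bool
    output: str


Worker = Callable[[str], StageResult]


@dataclass
class Manager:
    """Supervises specialised workers and retries a stage that reports failure."""
    workers: Dict[str, Worker]
    max_retries: int = 2
    log: List[str] = field(default_factory=list)

    def run(self, stages: List[str], task: str) -> StageResult:
        context = task
        for stage in stages:
            for attempt in range(self.max_retries + 1):
                result = self.workers[stage](context)
                self.log.append(f"{stage}#{attempt}:{'ok' if result.ok else 'fail'}")
                if result.ok:
                    break
            else:
                # Every attempt failed: stop instead of forwarding a bad result.
                return StageResult(False, f"{stage} failed")
            context = result.output
        return StageResult(True, context)


def planner(ctx: str) -> StageResult:
    # Toy worker that always succeeds.
    return StageResult(True, ctx + " -> plan")


def make_flaky_coder() -> Worker:
    # Toy worker that fails on its first call and succeeds afterwards.
    calls = {"n": 0}

    def coder(ctx: str) -> StageResult:
        calls["n"] += 1
        if calls["n"] == 1:
            return StageResult(False, "")
        return StageResult(True, ctx + " -> code")

    return coder


manager = Manager({"plan": planner, "code": make_flaky_coder()})
final = manager.run(["plan", "code"], "paper")
print(final.ok, final.output)
print(manager.log)
```

In a purely sequential pipeline, the first failed coding attempt would either abort the run or be handed to the next stage unchanged. In this sketch the supervising layer observes each stage's outcome and decides whether to retry or stop, which is one simple form the global coordination described in the abstract could take.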