Search papers, labs, and topics across Lattice.
This paper benchmarks single-agent LLMs against two multi-agent system (MAS) architectures – subagent (parallel exploration) and agent team (expert handoffs) – for automated machine learning optimization. They find a trade-off between operational stability (subagents excel in broad, shallow searches) and theoretical deliberation (agent teams enable complex refactoring with more compute). The study uses a controlled execution-based testbed with Git worktree isolation and global memory to rigorously compare the systems under fixed time budgets.
Multi-agent systems for automated research face a fundamental trade-off: parallel exploration offers speed and stability, while expert teams unlock deeper reasoning at the cost of increased fragility.
As AI agents evolve, the community is rapidly shifting from single Large Language Models (LLMs) to Multi-Agent Systems (MAS) to overcome cognitive bottlenecks in automated research. However, the optimal multi-agent coordination framework for these autonomous agents remains largely unexplored. In this paper, we present a systematic empirical study investigating the comparative efficacy of distinct multi-agent structures for automated machine learning optimization. Utilizing a rigorously controlled, execution-based testbed equipped with Git worktree isolation and explicit global memory, we benchmark a single-agent baseline against two multi-agent paradigms: a subagent architecture (parallel exploration with post-hoc consolidation) and an agent team architecture (experts with pre-execution handoffs). By evaluating these systems under strictly fixed computational time budgets, our findings reveal a fundamental trade-off between operational stability and theoretical deliberation. The subagent mode functions as a highly resilient, high-throughput search engine optimal for broad, shallow optimizations under strict time constraints. Conversely, the agent team topology exhibits higher operational fragility due to multi-author code generation but achieves the deep theoretical alignment necessary for complex architectural refactoring given extended compute budgets. These empirical insights provide actionable guidelines for designing future autoresearch systems, advocating for dynamically routed architectures that adapt their collaborative structures to real-time task complexity.