TREX, a multi-agent system, automates the LLM training lifecycle by orchestrating a Researcher agent for strategy formulation and an Executor agent for training and evaluation. It models the experimental process as a search tree to enable efficient exploration, reuse of past results, and distillation of insights. Evaluated on FT-Bench, a new benchmark of 10 real-world fine-tuning tasks, TREX consistently optimizes model performance.
Automating LLM fine-tuning is now possible: a multi-agent system, TREX, matches or exceeds human performance on a diverse set of real-world tasks.
While Large Language Models (LLMs) have empowered AI research agents to perform isolated scientific tasks, automating complex, real-world workflows such as LLM training remains a significant challenge. In this paper, we introduce TREX, a multi-agent system that automates the entire LLM training lifecycle. By orchestrating collaboration between two core modules, the Researcher and the Executor, the system performs requirement analysis, open-domain literature and data research, formulation of training strategies, preparation of data recipes, and model training and evaluation. The multi-round experimental process is modeled as a search tree, enabling the system to efficiently plan exploration paths, reuse historical results, and distill high-level insights from iterative trials. To evaluate the capability of automated LLM training, we construct FT-Bench, a benchmark comprising 10 tasks derived from real-world scenarios, ranging from optimizing fundamental model capabilities to enhancing performance on domain-specific tasks. Experimental results demonstrate that the TREX agent consistently optimizes model performance on target tasks.
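To make the search-tree framing concrete, here is a minimal Python sketch of one way a multi-round experimental process could be stored as a tree. It is an illustration under stated assumptions, not TREX's implementation: the names `ExperimentNode`, `ExperimentTree`, `expand`, `evaluate`, and `best`, the path-keyed cache, and the rule of picking the highest-scoring node are all hypothetical, and the abstract does not specify how TREX selects or scores nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(eq=False)  # compare nodes by identity, not by recursive field equality
class ExperimentNode:
    """One training attempt; each child refines its parent's strategy (hypothetical)."""
    strategy: str
    parent: Optional[ExperimentNode] = None
    score: Optional[float] = None
    children: List[ExperimentNode] = field(default_factory=list)

    def path(self) -> List[str]:
        """Return the strategies from the root down to this node."""
        steps: List[str] = []
        node: Optional[ExperimentNode] = self
        while node is not None:
            steps.append(node.strategy)
            node = node.parent
        return steps[::-1]


class ExperimentTree:
    """Tree of experiments with a cache so identical paths are never re-run."""

    def __init__(self, root_strategy: str) -> None:
        self.root = ExperimentNode(root_strategy)
        self._cache: Dict[Tuple[str, ...], float] = {}

    def expand(self, parent: ExperimentNode, strategy: str) -> ExperimentNode:
        """Attach a new candidate strategy under an existing node."""
        child = ExperimentNode(strategy, parent=parent)
        parent.children.append(child)
        return child

    def evaluate(self, node: ExperimentNode,
                 run_fn: Callable[[List[str]], float]) -> float:
        """Score a node, reusing a cached result when its path was already run."""
        key = tuple(node.path())
        if key not in self._cache:
            self._cache[key] = run_fn(node.path())
        node.score = self._cache[key]
        return node.score

    def best(self) -> Optional[ExperimentNode]:
        """Return the highest-scoring evaluated node, or None if none is scored."""
        best_node: Optional[ExperimentNode] = None
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node.score is not None and (
                best_node is None or node.score > best_node.score
            ):
                best_node = node
            stack.extend(node.children)
        return best_node


if __name__ == "__main__":
    # Stand-in for a real train-and-evaluate run: score grows with path depth.
    def toy_run(path: List[str]) -> float:
        return float(len(path))

    tree = ExperimentTree("baseline fine-tune")
    child = tree.expand(tree.root, "add curated domain data")
    tree.evaluate(tree.root, toy_run)
    tree.evaluate(child, toy_run)
    print(tree.best().path())
```

In this sketch the cache plays the role of "reusing historical results", and `best` is a stand-in for whatever exploration policy an agent would actually use to decide which branch to extend next.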