The paper introduces Uno-Orchestra, a reinforcement-learning-based approach to task decomposition and agent routing in LLM multi-agent systems. It learns a single policy that decides when to decompose a task and which (model, primitive) pair should handle each subtask, optimizing accuracy and cost together. On a 13-benchmark suite with 22 baselines, Uno-Orchestra reaches 77.0% macro pass@1, roughly 16% above the strongest workflow baseline, while cutting per-query cost by roughly an order of magnitude.
LLM multi-agent systems can achieve significantly higher accuracy at a fraction of the cost by learning to selectively delegate tasks instead of relying on rigid orchestration.
Large language model (LLM) multi-agent systems typically rely on rigid orchestration, committing either to flat per-query routing or to hand-engineered task decomposition, so decomposition depth, worker choice, and inference budget are not jointly optimized under one objective. We introduce Uno-Orchestra, a unified orchestration policy that selectively decomposes a task and dispatches each subtask to an admissible (model, primitive) pair, with both decisions learned together from curated RL trajectories grounded in real worker interactions. Against 22 baselines on a 13-benchmark suite spanning math, code, knowledge, long-context, and agentic tool-use, Uno-Orchestra reaches 77.0% macro pass@1, roughly 16% above the strongest workflow baseline, at roughly an order of magnitude lower per-query cost, advancing the accuracy-efficiency frontier of selective delegation.
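The idea of selective delegation in the abstract can be made concrete with a small sketch. This is not the paper's method: Uno-Orchestra learns its policy from RL trajectories, whereas the code below swaps in a hand-written scoring rule purely for illustration. The `Worker` fields, the `cost_weight` and `threshold` parameters, the model and primitive names, and all numbers are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    model: str       # hypothetical model name
    primitive: str   # hypothetical primitive, e.g. "direct" or "code_exec"
    accuracy: float  # assumed estimate of the chance this worker succeeds
    cost: float      # assumed per-call cost, arbitrary units


def best_worker(workers, cost_weight):
    # Fold accuracy and cost into one scalar objective (an assumption,
    # standing in for the learned routing decision).
    return max(workers, key=lambda w: w.accuracy - cost_weight * w.cost)


def orchestrate(task, subtasks, admissible, cost_weight=0.5, threshold=0.7):
    # Decompose only when the best direct worker looks too weak
    # (a hand-written stand-in for the learned decomposition decision).
    direct = best_worker(admissible[task], cost_weight)
    if direct.accuracy >= threshold or not subtasks:
        return [(task, direct)]
    return [(s, best_worker(admissible[s], cost_weight)) for s in subtasks]


# Made-up admissible (model, primitive) pairs per task type.
admissible = {
    "word_problem": [Worker("small", "direct", 0.55, 0.1),
                     Worker("large", "direct", 0.65, 1.0)],
    "parse": [Worker("small", "direct", 0.90, 0.1),
              Worker("large", "direct", 0.95, 1.0)],
    "compute": [Worker("small", "direct", 0.60, 0.1),
                Worker("small", "code_exec", 0.90, 0.2)],
    "lookup": [Worker("small", "direct", 0.85, 0.1),
               Worker("large", "direct", 0.90, 1.0)],
}

hard_plan = orchestrate("word_problem", ["parse", "compute"], admissible)
easy_plan = orchestrate("lookup", [], admissible)
```

With these made-up numbers, the word problem is split and each subtask goes to the cheap model, with a code-execution primitive for the computation, while the lookup query is answered directly without decomposition. A learned policy would replace both the threshold test and the scoring rule with decisions trained against a joint accuracy and cost objective.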