This paper introduces Adversarially-Aligned Jacobian Regularization (AAJR), which improves the robustness of LLM-based agents in multi-agent systems by controlling policy sensitivity only along adversarial ascent directions. Global Jacobian bounds, the standard remedy, are overly conservative and degrade nominal performance. The theoretical analysis shows that AAJR admits a strictly larger policy class than global constraints under mild conditions and gives step-size conditions for inner-loop stability, decoupling minimax stability from global expressivity restrictions.
Forget global constraints: trajectory-aligned Jacobian regularization unlocks robustness in LLM agents without sacrificing expressivity.
As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce extreme local curvature in the inner maximization. Standard remedies that enforce global Jacobian bounds are overly conservative, suppressing sensitivity in all directions and inducing a large Price of Robustness. We introduce Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity strictly along adversarial ascent directions. We prove that AAJR yields a strictly larger admissible policy class than global constraints under mild conditions, implying a weakly smaller approximation gap and reduced nominal performance degradation. Furthermore, we derive step-size conditions under which AAJR controls effective smoothness along optimization trajectories and ensures inner-loop stability. These results provide a structural theory for agentic robustness that decouples minimax stability from global expressivity restrictions.
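The contrast between a global Jacobian bound and a trajectory-aligned one can be illustrated with a toy example. The sketch below is not the paper's formulation, since the abstract does not state the regularizer. It assumes one plausible reading: the aligned penalty is the squared norm of the Jacobian-vector product along the normalized gradient of the adversary's objective, and the global penalty is the squared Frobenius norm of the full Jacobian. The policy, the adversary loss, and every function name are hypothetical, and finite differences stand in for automatic differentiation.

```python
import math

def policy(x):
    # Hypothetical toy policy: mildly sensitive to x[0], very sensitive to x[1].
    return [0.1 * x[0], math.tanh(5.0 * x[1])]

def adversary_loss(x, target):
    # Hypothetical inner-maximization objective: squared deviation from target.
    y = policy(x)
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))

def grad_fd(fn, x, eps=1e-5):
    # Central finite-difference gradient of a scalar function.
    g = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((fn(xp) - fn(xm)) / (2 * eps))
    return g

def jvp_fd(fn, x, v, eps=1e-5):
    # Central finite-difference Jacobian-vector product J(x) v.
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    xm = [xi - eps * vi for xi, vi in zip(x, v)]
    return [(a - b) / (2 * eps) for a, b in zip(fn(xp), fn(xm))]

def unit(v):
    # Normalize to unit length; a zero vector is returned unchanged.
    n = math.sqrt(sum(vi * vi for vi in v))
    return [vi / n for vi in v] if n > 0 else list(v)

def aligned_penalty(x, target):
    # Assumed aligned penalty: ||J(x) v||^2 with v the adversary's ascent direction.
    v = unit(grad_fd(lambda z: adversary_loss(z, target), x))
    return sum(c * c for c in jvp_fd(policy, x, v))

def global_penalty(x):
    # Squared Frobenius norm of J(x), accumulated over the coordinate directions.
    total = 0.0
    for i in range(len(x)):
        e = [1.0 if j == i else 0.0 for j in range(len(x))]
        total += sum(c * c for c in jvp_fd(policy, x, e))
    return total

x = [1.0, 0.0]
target = [0.0, 0.0]
print("aligned:", aligned_penalty(x, target))
print("global: ", global_penalty(x))
```

At this point the adversary's ascent direction lies along the first coordinate, so the aligned penalty ignores the steep `tanh` response in the second coordinate, while the global penalty is dominated by it. Because the direction is a unit vector, the aligned penalty in this sketch can never exceed the global one, which loosely mirrors the abstract's claim that a directional constraint admits more policies than a global one.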