Shanghai Institute of AI for Education | May 28, 2026 | arXiv:2605.30227

Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization

Wenwu Li, Yu Song, Min Zhao, Bo Jin, Wenhao Li

AI Summary

This paper tackles the challenge of optimizing LLM-powered multi-agent systems (MAS) for complex reasoning, where discrete interactions and sparse feedback hinder effective training. The authors introduce temporal and structural credit assignment, which decomposes the optimization objective by identifying critical interaction rounds and by isolating individual agent contributions through stationary role policies. The method then uses a discrete block coordinate descent algorithm with LLM-generated proxy gradients to iteratively refine role prompts and aggregation protocols, targeting only the identified weak links.

Key Contribution

LLM-based multi-agent systems can be optimized with substantially fewer queries by decomposing credit assignment along temporal and structural axes and refining only the weak links this decomposition identifies.

Abstract

While Multi-Agent Systems (MAS) empower Large Language Models to tackle complex reasoning tasks through collaborative interaction, optimizing their dynamics remains a formidable challenge due to the discrete, non-differentiable nature of the computation graph and the sparsity of global supervisory signals. Existing black-box optimizers struggle to attribute trajectory-level failure to specific local components, resulting in inefficient, high-variance exploration. We argue that tractable MAS optimization needs structural inductive biases to disentangle error signals. We propose temporal and structural credit assignment, which decomposes the objective along two axes: (i) temporal credit, using state-space bottlenecks to identify critical rounds, and (ii) structural credit, using stationary role policies to isolate agent contributions. Leveraging these decomposed signals, we introduce a discrete, verbalized block coordinate descent algorithm for iterative refinement. Rather than indiscriminate global updates, it alternates between optimizing role prompts and aggregation protocols, using LLM-generated "proxy gradients" to target only the identified weak links. Across diverse reasoning benchmarks, our approach substantially reduces query complexity while improving performance, providing a principled and interpretable path toward self-improving MAS.
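The alternating, weak-link-targeted update described in the abstract can be illustrated with a small toy sketch. This is not the authors' algorithm: the function `refine_weakest_block`, the block names, and the `QUALITY` lookup table are all hypothetical, and the LLM-generated "proxy gradients" and credit signals are replaced by simple table lookups. It only shows the general shape of a discrete block coordinate descent that edits one block at a time, starting from the block with the lowest assigned credit.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]  # block name -> currently selected variant

def refine_weakest_block(
    config: Config,
    credit: Callable[[Config], Dict[str, float]],
    propose: Callable[[str, str], List[str]],
    score: Callable[[Config], float],
    max_steps: int = 10,
) -> Tuple[Config, float]:
    """Discrete block coordinate descent that visits blocks weakest-first.

    Each step changes at most one block (a role prompt or the aggregation
    protocol) while every other block is held fixed.
    """
    current = dict(config)
    current_score = score(current)
    for _ in range(max_steps):
        per_block = credit(current)  # stand-in for structural credit
        improved = False
        # Try the block with the lowest credit first (the "weak link").
        for block in sorted(per_block, key=per_block.get):
            # Stand-in for LLM-proposed edits to this block.
            for candidate in propose(block, current[block]):
                trial = {**current, block: candidate}
                trial_score = score(trial)
                if trial_score > current_score:
                    current, current_score = trial, trial_score
                    improved = True
                    break
            if improved:
                break
        if not improved:
            break  # no single-block edit helps: a coordinate-wise optimum
    return current, current_score

# Hypothetical per-variant quality table (assumption, for illustration only).
QUALITY = {
    "solver_prompt": {"v0": 0.2, "v1": 0.6, "v2": 0.9},
    "critic_prompt": {"v0": 0.7, "v1": 0.8},
    "aggregation": {"majority": 0.5, "weighted": 0.8},
}

def toy_credit(cfg: Config) -> Dict[str, float]:
    return {block: QUALITY[block][variant] for block, variant in cfg.items()}

def toy_score(cfg: Config) -> float:
    return sum(toy_credit(cfg).values())

def toy_propose(block: str, current_variant: str) -> List[str]:
    return [v for v in QUALITY[block] if v != current_variant]

start = {"solver_prompt": "v0", "critic_prompt": "v0", "aggregation": "majority"}
best, best_score = refine_weakest_block(start, toy_credit, toy_propose, toy_score)
```

In this toy setting the score is additive across blocks, so single-block edits are enough to reach the best configuration; in a real MAS the blocks interact, and the quality of the credit signal determines how many evaluations are spent on blocks that were not actually at fault.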

Reasoning & Chain-of-Thought | Tool Use & Agents | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 31

Year: 2026

Venue: N/A
