Search papers, labs, and topics across Lattice.
1
0
3
6
By normalizing rewards across groups of sampled communication graphs, Graph-GRPO stabilizes multi-agent topology learning and uncovers critical communication pathways obscured by noisy, absolute rewards.