CPPO redefines token-level trust regions in LLM reinforcement learning, leading to substantial gains in reasoning accuracy and training stability.
Latent communication can beat language for collaborative driving, provided you disentangle agent identities and distill structured knowledge.