MLLMs can be significantly boosted by curriculum learning that focuses on reward design rather than data selection, dynamically weighting generalized rubrics based on the model's evolving competence.
Forget static agent communication graphs: AgentConductor uses RL to dynamically rewire agent interactions based on task difficulty, slashing token costs by up to 68% while boosting code generation accuracy.