Context inconsistency in stepwise group-based RL can severely bias advantage estimation; a hierarchical grouping strategy corrects this bias without extra compute.
Overcome simplicity bias in RL agents with PA-MoE, a mixture-of-experts architecture that learns task phases directly from the RL objective, yielding better expert specialization.