Context inconsistency in stepwise group-based RL can severely bias advantage estimation, but a hierarchical grouping strategy can fix it without extra compute.
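To make the bias concrete, here is a minimal NumPy sketch (not the paper's implementation; `context_ids`, the per-context normalization, and the toy rewards are all assumptions) contrasting a flat group-relative baseline with a context-aware hierarchical one:

```python
import numpy as np

def flat_group_advantages(rewards):
    """Flat group-relative advantages: normalize each reward against
    the statistics of the entire rollout group, ignoring context."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def hierarchical_group_advantages(rewards, context_ids):
    """Hypothetical hierarchical variant: baseline each rollout only
    against rollouts sharing its context, so rollouts generated from
    different intermediate states are never compared directly."""
    r = np.asarray(rewards, dtype=float)
    ctx = np.asarray(context_ids)
    adv = np.empty_like(r)
    for c in np.unique(ctx):
        mask = ctx == c
        sub = r[mask]
        adv[mask] = (sub - sub.mean()) / (sub.std() + 1e-8)
    return adv

# Toy example: rollouts 0-2 share context A, rollouts 3-5 share context B,
# and context B simply yields higher rewards regardless of policy quality.
rewards = [1.0, 0.5, 0.0, 10.0, 9.5, 9.0]
contexts = ["A", "A", "A", "B", "B", "B"]
print(flat_group_advantages(rewards))                     # all A rollouts penalized
print(hierarchical_group_advantages(rewards, contexts))   # fair within each context
```

In the flat case every context-A rollout receives a negative advantage purely because context B pays more, which is the kind of estimation bias a context-consistent grouping removes; it also costs nothing extra, since the same rewards are just normalized within subgroups.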
Overcome simplicity bias in RL agents with PA-MoE, a mixture-of-experts architecture that learns task phases directly from the RL objective, leading to better expert specialization.
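PA-MoE's exact architecture isn't given here; the following is a minimal PyTorch sketch of the general idea, where `PhaseMoEPolicy`, the soft observation-conditioned gate, and all dimensions are assumptions rather than the paper's design:

```python
import torch
import torch.nn as nn

class PhaseMoEPolicy(nn.Module):
    """Hypothetical phase-aware mixture-of-experts policy head.

    A gating network softly assigns each observation to experts. Because
    the gate is trained end-to-end through the RL objective (no phase
    labels), experts are free to specialize to distinct task phases.
    """

    def __init__(self, obs_dim, act_dim, num_experts=4, hidden=64):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, act_dim),
            )
            for _ in range(num_experts)
        ])
        self.gate = nn.Linear(obs_dim, num_experts)  # phase inferred from obs

    def forward(self, obs):
        weights = torch.softmax(self.gate(obs), dim=-1)              # (B, E)
        logits = torch.stack([e(obs) for e in self.experts], dim=1)  # (B, E, A)
        return (weights.unsqueeze(-1) * logits).sum(dim=1)           # (B, A)

policy = PhaseMoEPolicy(obs_dim=8, act_dim=3)
action_logits = policy(torch.randn(16, 8))  # feed into any policy-gradient loss
```

Since the gate receives gradients only from the RL loss, any phase structure it discovers is whatever best improves return, rather than a hand-specified task decomposition.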
Multi-expert systems can suffer from *worse* performance than single-expert systems due to an inherent underfitting problem that arises from the difficulty of identifying the correct expert to defer to.