Nanyang Technological University, Singapore
Chain-of-thought prompting makes large language models more capable, but it also makes them less safe; this paper tackles the problem by forcing models to reason about safety *before* tackling the task.
Context inconsistency in stepwise group-based RL can severely bias advantage estimation, but a hierarchical grouping strategy can fix it without extra compute.
PA-MoE, a mixture-of-experts architecture that learns task phases directly from the RL objective, overcomes simplicity bias in RL agents and yields better expert specialization.