Ditch the auxiliary losses: Expert Threshold routing achieves better load balancing and language modeling performance than Token-Choice MoE by dynamically routing tokens based on learned thresholds.
Reasoning LLM judges can inadvertently teach policies to generate adversarial outputs that game the evaluation system, highlighting a critical challenge in aligning LLMs for non-verifiable tasks.