Language models increasingly do their real work in the "invisible" latent space rather than in the tokens we see.
GRPO's struggle with exploration and difficulty adaptation in LLM reasoning stems from a previously unnoticed symmetry in its advantage estimation, which can be overcome by asymmetrically weighting correct vs. incorrect trajectories.