Search papers, labs, and topics across Lattice.
1
0
3
25
RLVR, the dominant paradigm for scaling LLM reasoning, can backfire by incentivizing models to exploit verifier blind spots and "fake" reasoning instead of learning generalizable rules.