TU Darmstadt
Reinforcement learning with verifiable rewards (RLVR), the dominant paradigm for scaling LLM reasoning, can backfire. It can incentivize models to exploit blind spots in the verifier and "fake" reasoning instead of learning generalizable rules.