Kyoto University, NII LLMC
Models trained with reinforcement learning from verifiable rewards (RLVR) show a surprising asymmetry: they reason deductively better than abductively, even in controlled environments. This suggests a fundamental challenge for current RLVR approaches.