Kyoto University
Models trained with reinforcement learning from verifiable rewards (RLVR) exhibit a surprising reasoning asymmetry that favors deduction over abduction, even in controlled environments, suggesting a fundamental challenge in current RLVR approaches.
Fine-grained control over reward signals unlocks significant gains in multi-trait essay scoring, outperforming standard policy optimization techniques.
Animating 4D shapes just got easier: GaussiAnimate's "Skelebones" can reanimate unseen poses with 17% better PSNR than standard methods.
By warm-starting a dynamic priority queue with seen and generated unseen visual prototypes, this compositional zero-shot learning (CZSL) method significantly mitigates distribution shift at test time, outperforming existing approaches.