Australian National University
Reinforcement learning (RL) boosts the reasoning of vision-language models (VLMs), but this success hides a critical flaw: it sharply reduces the model's ability to explore diverse solutions, which leads to premature convergence and limits scalability.