MASPO, a new RLVR (reinforcement learning with verifiable rewards) method, targets LLM reasoning by balancing gradient use, probability mass, and signal reliability for faster, more robust learning.
Current multimodal agents struggle with web browsing, reaching only 36% accuracy on a new benchmark designed to test deep multimodal reasoning across web pages.