Search papers, labs, and topics across Lattice.
The Chinese University of Hong Kong Shenzhen
3
0
6
Coding agents struggle to create complete and engaging games, with top performers barely reaching 41.46% success in end-to-end game generation.
LLMs struggle to accurately evaluate real human reasoning in mathematics, revealing a critical evaluation gap that challenges current assessment methods.
LLMs can learn new tasks without forgetting old ones, thanks to a memory-aware replay strategy that selectively rehearses important examples.