Zhejiang University
Even frontier models like Claude Sonnet 4.6 stumble when asked to infer user preferences and proactively assist in mobile tasks, succeeding less than 50% of the time despite excelling at explicit task execution.
LLMs can escape the trap of confidently wrong reasoning by co-evolving a generator and a verifier from a single model, with the two bootstrapping each other to break free from flawed consensus.