Search papers, labs, and topics across Lattice.
1
0
2
3
Even frontier models like Claude Sonnet 4.6 stumble when asked to infer user preferences and proactively assist in mobile tasks, achieving less than 50% success despite excelling at explicit task execution.