Even the best search agents struggle to exceed 35% accuracy on a benchmark designed to push the limits of long-horizon reasoning.
Current multimodal LLMs struggle with UI-based reasoning, but the new UI-UX model achieves 0.7963 accuracy on UXBench, setting a new standard.