Open-weight Omni models struggle with binding accuracy, achieving only 41.55% on a new counterfactual benchmark, highlighting a critical gap in long-video comprehension.
Current research agents still struggle with retrieval robustness and hallucination control, even when evaluated in a static, verifiable research environment.