Current multimodal LLMs struggle to count objects and ground evidence in videos longer than 30 minutes, reaching only ~25% accuracy on a new benchmark, well short of human performance.