KAIST AI
Video-LLMs struggle to ground queries in hour-long videos, and 85% of their failures stem from search errors rather than recognition errors.
Models can reach similar accuracy while exhibiting starkly different reasoning failures, a difference that aggregate metrics overlook.
Leading LLMs falter on Korean web-browsing tasks, reaching less than half the accuracy reported on previous benchmarks.
MERIT improves benchmark performance by 5.7 points by partitioning and merging model weights, challenging the need for centralized training.