Current omnimodal LLMs that ace offline benchmarks still fumble basic real-time interactions, highlighting a critical gap in their ability to handle streaming audio-visual data.
Leaderboard-topping video models are still surprisingly brittle, failing on basic video reasoning tasks unless given the right textual cues.