Gaoling School of Artificial Intelligence
Current multimodal models are stuck in bi-modal interactions, but OmniGAIA and OmniAtlas offer a path towards truly omni-modal AI assistants capable of reasoning and tool use across video, audio, and images.
Current multimodal retrieval systems fall flat when faced with realistic visual streams where context is distributed across time, motivating a new agentic paradigm for context-aware image retrieval.