Even the best multimodal LLMs are surprisingly bad at understanding and remembering the "self" in egocentric video, lagging human performance by 40-50% on personalized question answering.
Current research-agent benchmarks miss critical flaws: MiroEval shows that process quality is a reliable predictor of research outcomes, and that multimodal tasks expose weaknesses invisible to output-level metrics.
MiroFlow surpasses existing LLM agent frameworks with its agent-graph architecture, delivering state-of-the-art performance and robust execution across a diverse range of benchmarks.