Stop wasting compute on fine-tuning datasets with hidden capability gaps: GoalCover lets you diagnose and fix them *before* training.
The findings position LLMs as effective late-fusion mechanisms for multimodal learning analytics and show that LLM-as-a-Judge is viable for scalable, human-in-the-loop evaluation.
Even the best search-augmented agents, such as Gemini Deep Research, are easily distracted by noisy web content, which leads to surprisingly poor performance (40% accuracy) on a new multimodal reasoning benchmark.
Even with theory-of-mind (ToM) prompting, today's LLMs are easily fooled in simple privacy games, but RL-trained "double agents" learn to mislead attackers effectively by modeling their beliefs.
LLMs can learn to solve previously intractable reasoning problems by training on adaptively-reformulated, cognitively simpler versions of the same tasks.
DRA outputs are surprisingly variable, with inference and early-stage decisions being the biggest culprits, but structured outputs and ensemble querying can significantly reduce this stochasticity.
Even state-of-the-art multimodal LLMs struggle to accurately cite their sources when reasoning across video, audio, and text, often hallucinating citations despite generating correct answers.