Even state-of-the-art vision-language models frequently lie and hallucinate when playing social deduction games, raising serious questions about their reliability in real-world applications requiring grounded reasoning.
Today's best AI agents can solve only 55% of real-world academic tasks that university students find challenging, revealing a significant gap between current AI capabilities and the demands of academic workflows.
LLMs can leapfrog state-of-the-art scientific algorithms and human-designed solutions, but only if you scale the evaluation loop, not just the model.
CGRA (coarse-grained reconfigurable array) performance jumps by 2.7x thanks to NEURA, a compilation framework that transforms control flow into dataflow.
LLMs struggle with conflicting medical evidence, but a clever two-stage agentic approach can reconcile discordant signals while preserving patient privacy.
LLMs can't even reproduce published physics papers end-to-end, with the best model scoring only 34% on a new benchmark designed for this purpose.