Today's best AI agents still fail more than half the time on real-world tasks combining vision, search, and coding, revealing critical gaps in reasoning and tool use.
LLM agents can appear to reason well (high entropy) while completely ignoring the input, and mutual information is a far better metric for catching this failure.
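To make the entropy versus mutual information contrast concrete, here is a minimal sketch on toy numbers. It assumes the mutual information is measured between the agent's input and its chosen action, which the summary does not state. The two joint distributions and all function names are hypothetical illustrations, not taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def action_marginal(joint):
    """Marginal distribution over actions (columns) of a joint table P(input, action)."""
    return [sum(col) for col in zip(*joint)]

def mutual_information(joint):
    """Mutual information in bits between input (rows) and action (columns)."""
    px = [sum(row) for row in joint]
    py = action_marginal(joint)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

# Hypothetical agent A: picks one of four actions uniformly, whatever the input is.
ignoring = [[0.125, 0.125, 0.125, 0.125],
            [0.125, 0.125, 0.125, 0.125]]

# Hypothetical agent B: input 0 leads to actions 0-1, input 1 leads to actions 2-3.
attentive = [[0.25, 0.25, 0.0, 0.0],
             [0.0, 0.0, 0.25, 0.25]]

for name, joint in [("ignoring", ignoring), ("attentive", attentive)]:
    h = entropy(action_marginal(joint))
    mi = mutual_information(joint)
    print(f"{name}: action entropy = {h:.2f} bits, mutual information = {mi:.2f} bits")
```

Both toy agents have the same action entropy, so entropy alone cannot tell them apart. Only the mutual information drops to zero for the agent whose actions are independent of its input.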