Agents collaborating on EinsteinArena achieved breakthroughs that surpassed previous human and AI solutions, showcasing the power of collective intelligence in scientific discovery.
Over a quarter of tasks in popular AI benchmarks contain critical flaws that distort model evaluations, and an automated auditing framework can catch them.
Despite impressive headline accuracy, today's AI chatbots exhibit alarming regional biases, near-total dependence on retrieval quality, and surprising vulnerability to subtle falsehoods in user queries when used as news intermediaries.