A global consensus on AI safety risks and capabilities has emerged from a panel of 100+ independent experts, representing a landmark effort in international collaboration.
Despite progress in AI agent capabilities, reliability across crucial dimensions like consistency and robustness remains stubbornly low, revealing a critical gap in current evaluation practices.
Chatbot Arena, the go-to LLM leaderboard, is systematically gamed through undisclosed private testing and unequal data access, which biases its rankings and encourages overfitting to the leaderboard.