AI agents are shockingly easy to manipulate into leaking API keys, deleting user data, and initiating unauthorized transactions across a wide range of real-world applications.
Turns out, coding agents in the wild write useful code only 44% of the time and introduce more security vulnerabilities than human developers.
Current Large Audio Language Models (LALMs) struggle with basic audio understanding tasks like noise localization and cross-lingual speech, with some performing worse than random chance, despite excelling at speech recognition.
Uncover hidden incentives in your reward model: Obj-Disco automatically decomposes alignment rewards into human-interpretable objectives, revealing potential misalignments you might have missed.
Chatbot Arena, the go-to LLM leaderboard, is systematically gamed by undisclosed private testing and data access advantages, leading to biased rankings and overfitting.