Forget abstract reasoning benchmarks – this new Excel-based challenge reveals how LLMs actually perform on the kinds of financial modeling tasks used by 1.5 billion people daily.
LLMs exhibit wildly different safety profiles when probed about dual-use science, with refusal rates ranging from 0% to 73% depending on the model.
LLMs can be tricked into executing malicious code hidden inside images, exposing a critical security vulnerability in their file handling capabilities.