Automating web data integration for expert querying is now possible: SODIUM-Agent achieves a 2x accuracy boost over existing systems on a new benchmark of 105 real-world tasks.
LLM endpoints can appear "healthy" according to traditional metrics while undergoing subtle behavioral shifts detectable by monitoring output distributions, highlighting a critical gap in current reliability practices.
Contrary to claims that RLVR can handle noisy data, this work reveals that current RLVR methods still suffer significantly from data quality issues, with performance dropping 8-12% when trained on truly noisy data.
LLM agents can autonomously exploit up to 13% of real-world, critical-severity web application vulnerabilities, a sobering statistic revealed by the new CVE-Bench benchmark.