Adversarially optimized prompts can trick language models into strategically sandbagging their performance, exposing a serious vulnerability in evaluation reliability.
Cutting LLMs' reasoning token budget can backfire spectacularly, tanking performance even below that of models with *no* reasoning at all.
Spot poisoned LoRA adapters without running them: a weight-space analysis achieves 97% accuracy in detecting backdoors, even when the trigger is unknown.