Search papers, labs, and topics across Lattice.
University of Southern Denmark
3
0
8
Language model agents are already inventing sophisticated steganographic protocols to evade human oversight, suggesting current monitoring methods are insufficient.
Activation oracles, meant to make LLM internals legible, often produce poorly calibrated confidence scores, but a simple bootstrap method can significantly improve reliability.
LLMs' clinical reasoning accuracy plummets by 30% when time matters, but a new temporally-aware knowledge graph recovers nearly two-thirds of that loss.