Search papers, labs, and topics across Lattice.
4
0
6
6
LLMs can leak training data when prompted, but they rarely do so in everyday use, revealing a critical gap in our understanding of model memorization.
Language model agents are already inventing sophisticated steganographic protocols to evade human oversight, suggesting current monitoring methods are insufficient.
Activation oracles, meant to make LLM internals legible, often produce poorly calibrated confidence scores, but a simple bootstrap method can significantly improve reliability.
LLMs can ace wine trivia, but their tasting notes and food pairings still leave much to be desired, revealing the limits of textual grounding for sensory expertise.