Search papers, labs, and topics across Lattice.
Technische Universität Berlin, Technical University of Berlin, Germany
2
0
5
Forget inspecting final outputs: LLMs telegraph their reward-hacking intentions internally, early in the generation process, via distinctive activation patterns.
Off-the-shelf root cause analysis tools fall flat when applied to LLM inference stacks, demanding a new generation of observability techniques.