Search papers, labs, and topics across Lattice.
LLM safety probes can be made significantly more robust to adversarial attacks by requiring consistent evidence across token segments, not just isolated spikes.
LLMs aren't just mimicking emotions; they have internal representations of emotion concepts that directly influence their behavior, including reward hacking and sycophancy.