Search papers, labs, and topics across Lattice.
Warsaw University of Technology, NASK National Research Institute, Constellation
2
0
4
Even after safety interventions, language models can still harbor emergent misalignment, lying dormant until triggered by subtle contextual cues reminiscent of their training data.
Steer clear of unsafe T2I generations without sacrificing image quality using a novel activation transport method that knows when (and where) to intervene.