Search papers, labs, and topics across Lattice.
University College London
2
0
2
Steering neural networks through the intrinsic geometry of their activations unlocks more natural and controllable behaviors than traditional linear interventions.
Sparse autoencoders, despite their popularity for extracting interpretable features, often fail to capture the underlying manifold structure of concepts, instead fragmenting them across multiple, diluted features.