This paper introduces "conceptors," soft projection matrices estimated from LLM activations, as a geometrically principled alternative to single-direction activation steering. Conceptors capture the full multidimensional subspace of a concept, outperforming single-vector baselines and enabling a closed-form Boolean algebra for concept composition. The "conceptor quota" is also introduced as a parameter-free diagnostic for concept separability, achieving high correlations (up to r=0.96) with observed performance.
Steering LLMs with conceptors (soft projection matrices capturing the full semantic subspace) yields more robust control and enables Boolean logic for composing concepts, moving beyond the limitations of single-vector steering.
Activation-based steering provides control of LLM behavior at inference time, but the dominant paradigm reduces each concept to a single direction whose geometry is left largely unexamined. Rather than selecting a single steering direction, we use conceptors: soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept's full multidimensional subspace. A geometric analysis shows the bipolar subspace strictly subsumes the single-vector baseline. We further show that the conceptor quota provides a parameter-free layer-selection diagnostic, predicting concept separability with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions. Beyond selection, conceptors admit a closed-form Boolean algebra (AND, OR, NOT): we evaluate conceptor compositionality on thematically related sub-concepts. Across a systematic five-axis design-space evaluation, conceptors match or outperform additive baselines at layers where concept subspaces are multi-dimensional while producing substantially fewer degenerate outputs. Conceptor steering is a geometrically principled, compositional, and practically safer alternative to single-direction steering from a limited number of contrastive pairs.
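To make the objects named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard definitions from the conceptor literature: a conceptor C = R (R + α⁻² I)⁻¹ built from the activation correlation matrix R with aperture α, the quota as trace(C) divided by the dimension, and the Boolean operations NOT C = I − C, C₁ AND C₂ = (C₁⁻¹ + C₂⁻¹ − I)⁻¹, with OR obtained via De Morgan. The aperture value, the synthetic "activations", the function names, and the steering rule h' = β·C·h are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def conceptor(X, aperture=10.0):
    """Soft projection matrix from activations X of shape (n_samples, dim).

    Assumed standard form: C = R (R + aperture^-2 I)^-1, with R = X^T X / n.
    """
    n, d = X.shape
    R = X.T @ X / n
    return R @ np.linalg.inv(R + (1.0 / aperture**2) * np.eye(d))


def quota(C):
    """Mean eigenvalue of C: the fraction of the space the conceptor occupies."""
    return np.trace(C) / C.shape[0]


def NOT(C):
    return np.eye(C.shape[0]) - C


def AND(C1, C2):
    # Closed form valid when both conceptors are invertible
    # (true here because the sample correlation matrices are full rank).
    I = np.eye(C1.shape[0])
    return np.linalg.inv(np.linalg.inv(C1) + np.linalg.inv(C2) - I)


def OR(C1, C2):
    # De Morgan: C1 OR C2 = NOT(NOT C1 AND NOT C2)
    return NOT(AND(NOT(C1), NOT(C2)))


def steer(h, C, beta=1.0):
    """Hypothetical steering rule: softly project a hidden state h onto the concept subspace."""
    return beta * (C @ h)


# Synthetic stand-ins for activations at the two poles of a bipolar concept.
rng = np.random.default_rng(0)
scales = np.array([3.0, 1.0, 0.3, 0.1])
pos = rng.normal(size=(200, 4)) * scales + 0.5
neg = rng.normal(size=(200, 4)) * scales - 0.5

# Bipolar conceptor: estimated from activations pooled across both poles.
C_bipolar = conceptor(np.vstack([pos, neg]))

# Per-pole conceptors and their Boolean composition.
C1, C2 = conceptor(pos), conceptor(neg)
C_or = OR(C1, C2)

h_steered = steer(rng.normal(size=4), C_bipolar)
print("quota:", quota(C_bipolar))
```

Because the quota depends only on the conceptor's spectrum, it can be computed for every layer without any held-out evaluation, which is one way to read the abstract's description of it as a layer-selection diagnostic. In a real setting, `pos` and `neg` would be replaced by hidden states collected from contrastive prompts at a chosen layer.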