The paper introduces Feature Activation Coverage (FAC), a metric for quantifying data diversity in the feature space of LLMs, and FAC Synthesis, a framework that uses sparse autoencoders to identify and synthesize data reflecting missing features in a seed dataset. By optimizing for feature diversity rather than text-based diversity, the approach improves downstream performance across instruction following, toxicity detection, reward modeling, and behavior steering tasks. The authors demonstrate the existence of a shared, interpretable feature space across LLaMA, Mistral, and Qwen, enabling cross-model knowledge transfer.
Forget chasing massive datasets: synthesizing data to fill gaps in your LLM's feature space can boost performance across diverse tasks and even transfer knowledge between model families.
The diversity of post-training data is critical for effective downstream performance in large language models (LLMs). Many existing approaches to constructing post-training data quantify diversity using text-based metrics that capture linguistic variation, but such metrics provide only weak signals for the task-relevant features that determine downstream performance. In this work, we introduce Feature Activation Coverage (FAC), which measures data diversity in an interpretable feature space. Building upon this metric, we further propose a diversity-driven data synthesis framework, named FAC Synthesis, that first uses a sparse autoencoder to identify missing features from a seed dataset, and then generates synthetic samples that explicitly reflect these features. Experiments show that our approach consistently improves both data diversity and downstream performance on various tasks, including instruction following, toxicity detection, reward modeling, and behavior steering. Interestingly, we identify a shared, interpretable feature space across model families (namely LLaMA, Mistral, and Qwen), enabling cross-model knowledge transfer. Our work provides a solid and practical methodology for exploring data-centric optimization of LLMs.
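To make the idea of coverage in a feature space concrete, here is a minimal sketch. The abstract does not give the exact FAC formula, so the code assumes one plausible reading: coverage is the fraction of sparse-autoencoder features that fire above a threshold on at least one sample, and "missing" features are the reference features the seed data never activates. The function names, the threshold, and the toy activation values are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Sequence, Set


def active_features(
    activations: Sequence[Sequence[float]], threshold: float = 0.0
) -> Set[int]:
    """Indices of features that exceed `threshold` on at least one sample.

    `activations` holds one row per sample and one column per feature.
    In practice these would come from a sparse autoencoder; here they
    are plain lists so the sketch stays self-contained.
    """
    active: Set[int] = set()
    for row in activations:
        for idx, value in enumerate(row):
            if value > threshold:
                active.add(idx)
    return active


def feature_activation_coverage(
    activations: Sequence[Sequence[float]],
    num_features: int,
    threshold: float = 0.0,
) -> float:
    """Assumed definition: share of all features activated by the dataset."""
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    return len(active_features(activations, threshold)) / num_features


def missing_features(
    seed_activations: Sequence[Sequence[float]],
    reference_features: Iterable[int],
    threshold: float = 0.0,
) -> List[int]:
    """Reference features that the seed dataset never activates."""
    covered = active_features(seed_activations, threshold)
    return sorted(set(reference_features) - covered)


# Toy seed dataset: 2 samples, 4 features (values are made up).
seed = [
    [0.0, 1.2, 0.0, 0.0],
    [0.3, 0.0, 0.0, 0.0],
]
coverage = feature_activation_coverage(seed, num_features=4)
gaps = missing_features(seed, reference_features=range(4))
print(coverage, gaps)
```

Under this reading, the features returned by `missing_features` would be the targets for synthesis: new samples are generated to activate them, and coverage is recomputed on the enlarged dataset.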