The paper introduces MOSAIC, a multi-objective framework for optimizing supervised fine-tuning data mixtures under a fixed budget, balancing safety alignment, over-refusal, and instruction following. MOSAIC curates data iteratively: it analyzes slice-level failure profiles and translates them into concrete data actions such as dataset-level mixture ratios, bucket-level weights, and focus criteria. Under a 1M-token budget and five rounds of fine-tuning, MOSAIC raises the internal XGuard safety score from 2.76 to 4.67 while holding OrBench (over-refusal) at 4.41 and IFEval (instruction following) at 3.65. Its final Pareto solution also generalizes better than a random static LoRA baseline on independent attack, over-refusal, and capability tests.
Forget random data mixing: MOSAIC uses failure analysis to curate training data, improving safety while holding over-refusal and instruction-following scores steady.
We study how to allocate a fixed supervised fine-tuning budget when three objectives must be balanced at once: multi-turn safety alignment, low over-refusal on benign boundary queries, and instruction following under verifiable constraints. We propose MOSAIC (Multi-Objective Slice-Aware Iterative Curation for Alignment), a multi-objective framework for closed-loop data mixture search built on a unified L1-L3 evaluation interface. MOSAIC turns slice-level failure profiles into executable data actions, including dataset-level mixture ratios, bucket-level weights, and focus criteria. Under a fixed 1M-token budget and five rounds of independent fine-tuning from the same base model, MOSAIC improves internal XGuard from 2.76 to 4.67 while keeping OrBench at 4.41 and IFEval at 3.65. The final Pareto solution also generalizes better than a random static LoRA baseline on independent attack, over-refusal, and capability tests, suggesting that structured failure diagnosis can serve as a practical control signal for budgeted data construction. Code is available at https://github.com/douyipu/mosaic.
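To make the closed-loop idea concrete, the following is a minimal sketch of one curation round, not the authors' implementation. It assumes a simple rule in which each slice receives a share of the fixed token budget proportional to its observed failure rate, with a small floor so no objective is starved, and it assumes candidate runs are compared by Pareto dominance over their objective scores. The function names `allocate_budget` and `pareto_front`, the `floor` parameter, and all numbers in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple


def allocate_budget(
    failure_rates: Dict[str, float],
    budget_tokens: int,
    floor: float = 0.05,
) -> Dict[str, int]:
    """Split a fixed token budget across slices by failure rate.

    Assumption (not from the paper): weights are proportional to failure
    rate, and every slice keeps at least `floor` of the budget.
    """
    n = len(failure_rates)
    if n == 0:
        raise ValueError("need at least one slice")
    if floor * n > 1.0:
        raise ValueError("floor too large for the number of slices")

    total = sum(failure_rates.values())
    free = 1.0 - floor * n  # budget fraction left after the floors
    weights = {}
    for name, rate in failure_rates.items():
        # Fall back to a uniform split when no failures were observed.
        share = rate / total if total > 0 else 1.0 / n
        weights[name] = floor + free * share

    tokens = {name: int(w * budget_tokens) for name, w in weights.items()}
    # Hand any rounding remainder to the slice with the largest weight.
    worst = max(weights, key=weights.get)
    tokens[worst] += budget_tokens - sum(tokens.values())
    return tokens


def pareto_front(
    candidates: List[Tuple[float, ...]],
) -> List[Tuple[float, ...]]:
    """Keep score tuples that no other candidate dominates (higher is better)."""

    def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
        no_worse = all(x >= y for x, y in zip(a, b))
        strictly_better = any(x > y for x, y in zip(a, b))
        return no_worse and strictly_better

    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


if __name__ == "__main__":
    # Toy failure profile for the three objectives (illustrative values only).
    profile = {"safety": 0.6, "over_refusal": 0.3, "instruction": 0.1}
    print(allocate_budget(profile, budget_tokens=1_000_000))

    # Toy (safety, over-refusal, instruction) scores for three candidate runs.
    runs = [(1.0, 1.0, 1.0), (2.0, 2.0, 2.0), (3.0, 1.0, 1.0)]
    print(pareto_front(runs))
```

In a full loop, each round would fine-tune from the same base model on the allocated mixture, re-evaluate the slices, and feed the new failure profile back into the allocation step. The proportional rule here stands in for whatever richer data actions (bucket-level weights, focus criteria) the actual system derives.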