This paper empirically validates the Creative Quality metric from Calibrated Surprise by fine-tuning a small LLM on approximately 100 expert Chain-of-Thought annotations generated using the BC Protocol. The study reveals that fine-tuning on this small, high-quality dataset effectively transfers tacit expert knowledge, particularly in areas where existing alignment datasets are weak, such as audience modeling and reality-logic coverage. The authors attribute this transfer efficiency to an architectural duality within LLMs, where calibrating the "appreciation" side of the model automatically improves its generative capabilities.
Forget needing massive datasets: roughly 100 expert-annotated Chain-of-Thought examples can improve a small LLM's creative quality, an efficiency the authors attribute to architectural duality.
This paper provides an empirical implementation of the creative quality metric proposed in Calibrated Surprise (Zou & Xu, 2026a). The question it addresses is: does the mathematical claim behind that metric hold at the engineering level? To make the answer as general as possible, we deliberately choose the strictest engineering conditions: low data cost and a small base model. The training data consists of approximately 100 expert chain-of-thought (CoT) annotations produced with the BC Protocol (Zou & Xu, 2026b). We also identify a data bias: most publicly available alignment datasets are skewed toward craft-related knowledge, while audience modeling and reality-logic coverage are systematically weak. We use the term Creative Quality Alignment (CQA) to describe this class of engineering methods. Finally, we offer a supporting theoretical observation: in an LLM built on a single conditional distribution, calibrating the appreciation side automatically transfers to the generation side via architectural duality. This is the structural reason why ~100 CoT examples are sufficient, in contrast to the purely empirical observation reported in LIMA (Zhou et al., 2023).
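The duality observation can be illustrated with a deliberately tiny toy, which is not the authors' method or model. It assumes a single categorical distribution over four hypothetical outputs, and all names (`score`, `calibrate`, `preferred`) are invented for illustration. Because one distribution serves both to score an output ("appreciation") and to sample one ("generation"), raising the score of an expert-preferred output necessarily raises the chance of generating it.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def score(logits, token):
    """Appreciation side: log-probability the model assigns to an output."""
    return math.log(softmax(logits)[token])

def calibrate(logits, preferred, lr=0.5, steps=20):
    """Gradient ascent on log p(preferred).

    For a softmax, d/dz_i log p(preferred) = 1[i == preferred] - p_i.
    """
    logits = list(logits)
    for _ in range(steps):
        p = softmax(logits)
        for i in range(len(logits)):
            grad = (1.0 if i == preferred else 0.0) - p[i]
            logits[i] += lr * grad
    return logits

# Hypothetical setup: four candidate outputs, uniform before calibration.
before = [0.0, 0.0, 0.0, 0.0]
after = calibrate(before, preferred=2)

# Generation side: the same distribution is what sampling would draw from.
p_before = softmax(before)
p_after = softmax(after)
print("score before/after:", score(before, 2), score(after, 2))
print("sampling prob before/after:", p_before[2], p_after[2])
```

In this sketch the scoring update and the sampling distribution share the same parameters, so there is no separate generation step to train. Whether the same argument carries over to a full LLM fine-tuned on CoT annotations is the claim the paper sets out to test.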