The paper introduces the BC Protocol, a structured dual-expert dialogue method, to address the bottleneck of high-quality chain-of-thought (CoT) data in LLM post-training. This protocol pairs a domain expert with a knowledge engineer to externalize the expert's implicit judgments as natural language reasoning chains, overcoming limitations of crowdsourcing, solo expert writing, and RLHF. Experimental results in the narrative fiction domain demonstrate that CoT produced by the BC Protocol significantly outperforms independently written CoT by domain experts in terms of "naturalness of reasoning process," as judged by GPT-4o, Claude Opus 4.5, and Gemini 2.5 Pro.
Domain experts' unaided explanations of their own reasoning read as unnatural, but pairing them with a knowledge engineer yields far more natural chain-of-thought data for LLM post-training.
High-quality expert chain-of-thought (CoT) data is one of the core bottlenecks in large language model (LLM) post-training. Existing data production methods each have structural limitations: crowdsourced annotation lacks deep reasoning paths; solo expert writing is constrained by the "expert blind spot", in which experts systematically skip reasoning steps they consider obvious; and RLHF produces only preference signals rather than reasoning chains. This paper proposes the BC Protocol, a structured dual-expert elicitation method for LLM post-training data production. The method pairs a domain expert (crystallized intelligence) with a knowledge engineer (fluid intelligence) and systematically externalizes the expert's implicit judgments as natural language reasoning chains. We introduce the Participant Aptitude Model, which defines six participant characteristic dimensions that affect elicitation quality. "Calibrated Ignorance" is an original concept proposed in this paper. We further propose "Selection-over-Prescription" as a methodological principle: for implicit knowledge elicitation tasks, investing quality-control resources in personnel selection yields a higher return than investing the same resources in process design. In a controlled experiment in the narrative fiction domain, we directly compared CoT produced by BC Protocol dual-expert dialogue (Group A, \(n=20\)) against CoT written independently by the same domain expert (Group B, \(n=20\)). Three cross-vendor judge models (GPT-4o, Claude Opus 4.5, and Gemini 2.5 Pro) conducted blind evaluation across five dimensions (600 ratings total: 40 samples × 3 judges × 5 dimensions). Results show that the BC Protocol achieves an overwhelming advantage in "naturalness of reasoning process" (Group A mean 4.80 vs. Group B mean 1.30, \(p=2.4\times10^{-8}\), Cliff's \(\delta=1.0\)).
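To make the reported effect size concrete, the following is a minimal sketch of how Cliff's delta is computed from two groups of ratings. The function name `cliffs_delta` and the toy scores are illustrative assumptions, not the paper's code or data. A value of 1.0 means that every score in the first group exceeds every score in the second, that is, the two distributions do not overlap at all.

```python
from itertools import product


def cliffs_delta(a, b):
    """Cliff's delta: (#pairs with x > y  -  #pairs with x < y) / (len(a) * len(b)).

    Ranges from -1.0 (every a below every b) to 1.0 (every a above every b).
    Tied pairs count toward neither side.
    """
    if not a or not b:
        raise ValueError("both groups must be non-empty")
    greater = sum(1 for x, y in product(a, b) if x > y)
    less = sum(1 for x, y in product(a, b) if x < y)
    return (greater - less) / (len(a) * len(b))


# Hypothetical toy ratings on a 1-5 scale (NOT the paper's data).
group_a = [5, 5, 4, 5]  # stands in for dual-dialogue CoT scores
group_b = [1, 2, 1, 1]  # stands in for solo-written CoT scores

print(cliffs_delta(group_a, group_b))  # 1.0: complete separation
```

Because the statistic depends only on pairwise ordering, it suits ordinal rating scales like the one used here, where differences between scale points need not be equal.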