May 25, 2026 · arXiv:2605.25549

BC Protocol: Structured Dual-Expert Dialogue for Eliciting High-Quality Chain-of-Thought Post-Training Data

AI Summary

The paper introduces the BC Protocol, a structured dual-expert dialogue method, to address the scarcity of high-quality chain-of-thought (CoT) data for LLM post-training. The protocol pairs a domain expert with a knowledge engineer to externalize the expert's implicit judgments as natural language reasoning chains, overcoming limitations of crowdsourcing, solo expert writing, and RLHF. Experiments in the narrative fiction domain show that CoT produced with the BC Protocol significantly outperforms CoT written independently by the same domain expert on "naturalness of reasoning process," as judged by GPT-4o, Claude Opus 4.5, and Gemini 2.5 Pro.

Key Contribution

Domain experts writing alone produce reasoning explanations that skip steps and read unnaturally. Pairing them with a knowledge engineer yields chain-of-thought data for LLM post-training that is judged far more natural.

Abstract

High-quality expert chain-of-thought (CoT) data is one of the core bottlenecks in large language model (LLM) post-training. Existing data production methods each have structural limitations. Crowdsourced annotation lacks deep reasoning paths. Expert solo writing is constrained by the "expert blind spot," because experts structurally skip reasoning steps they consider obvious. RLHF produces only preference signals, not reasoning chains. This paper proposes the BC Protocol, a structured dual-expert elicitation method for producing LLM post-training data. The method pairs a domain expert (crystallized intelligence) with a knowledge engineer (fluid intelligence) and systematically externalizes the expert's implicit judgments as natural language reasoning chains. We introduce the Participant Aptitude Model, which defines six participant characteristic dimensions that affect elicitation quality. "Calibrated Ignorance" is an original concept introduced in this paper. We further propose "Selection-over-Prescription" as a methodological principle: for implicit knowledge elicitation tasks, investing quality-control resources in personnel selection yields a higher return than investing the same resources in process design. In a controlled experiment in the narrative fiction domain, we directly compared CoT produced through BC Protocol dual dialogue (Group A, n = 20) against CoT written independently by the same domain expert (Group B, n = 20). Three cross-vendor judge models (GPT-4o, Claude Opus 4.5, and Gemini 2.5 Pro) conducted blind evaluations across five dimensions, for 600 ratings in total. Results show that the BC Protocol achieves an overwhelming advantage in "naturalness of reasoning process" (Group A mean 4.80 vs. Group B mean 1.30, p = 2.4 × 10⁻⁸, Cliff's δ = 1.0).
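The reported effect size can be made concrete. Cliff's δ compares every Group A rating with every Group B rating and equals P(A > B) − P(A < B). A value of δ = 1.0 therefore means that every Group A rating exceeded every Group B rating on that dimension. The Python sketch below is a minimal illustration under stated assumptions. The ratings are made up and are not the paper's data. The helper names `cliffs_delta` and `total_ratings` are hypothetical. The abstract does not say whether δ was computed over individual judge ratings or averaged scores. The sketch also checks that the rating count is consistent with the design: 40 CoT samples × 3 judges × 5 dimensions = 600.

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys)).

    Ties count toward neither side, so the result lies in [-1.0, 1.0].
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


def total_ratings(n_items, n_judges, n_dimensions):
    """Number of ratings when every judge scores every item on every dimension."""
    return n_items * n_judges * n_dimensions


# HYPOTHETICAL ratings, NOT the paper's data. They are chosen so that every
# Group A score is higher than every Group B score.
group_a = [5, 5, 4, 5, 5]
group_b = [1, 2, 1, 1, 2]

print(cliffs_delta(group_a, group_b))  # 1.0: complete separation of the groups
print(cliffs_delta([3, 4], [3, 2]))    # 0.75: one tied pair out of four lowers delta
print(total_ratings(20 + 20, 3, 5))    # 600
```

Pairwise counting keeps the definition visible. For larger samples, the same value can be obtained from the Mann-Whitney U statistic as δ = 2U / (n₁n₂) − 1.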

Data Curation & Synthetic Data · Reasoning & Chain-of-Thought · RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
