This paper investigates how different prompt architecture layers affect the performance of Claude 3.5 Sonnet on the "car wash problem," a benchmark requiring implicit physical constraint inference. Through a variable isolation study, the authors show that the STAR (Situation-Task-Action-Result) reasoning framework alone raises accuracy from 0% to 85%. Adding user profile context via vector database retrieval and then RAG context yields further gains of 10 and 5 percentage points, reaching 100% accuracy. The results indicate that structured reasoning scaffolds matter more than context injection for this task.
Forget fancy RAG pipelines: forcing LLMs to articulate the goal before reasoning about constraints is the real secret to solving the "car wash problem."
Large language models consistently fail the "car wash problem," a viral reasoning benchmark requiring implicit physical constraint inference. We present a variable isolation study (n=20 per condition, 6 conditions, 120 total trials) examining which prompt architecture layers in a production system enable correct reasoning. Using Claude 3.5 Sonnet with controlled hyperparameters (temperature 0.7, top_p 1.0), we find that the STAR (Situation-Task-Action-Result) reasoning framework alone raises accuracy from 0% to 85% (p=0.001, Fisher's exact test, odds ratio 13.22). Adding user profile context via vector database retrieval provides a further 10 percentage point gain, while RAG context contributes an additional 5 percentage points, achieving 100% accuracy in the full-stack condition. These results suggest that structured reasoning scaffolds, specifically forced goal articulation before inference, matter substantially more than context injection for implicit constraint reasoning tasks.
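The arithmetic behind the reported accuracies can be checked with a short sketch. It assumes the percentage-point gains are cumulative, as the abstract's wording suggests, and converts each accuracy into a count of correct trials out of 20. The condition labels (`baseline`, `star_only`, and so on) are hypothetical names chosen for illustration. Only four of the six conditions are listed because the abstract does not describe the other two.

```python
# Sketch: convert the accuracies reported in the abstract into trial counts.
# Condition labels are hypothetical; the abstract does not name them.
TRIALS_PER_CONDITION = 20
NUM_CONDITIONS = 6

# Cumulative accuracy in percent, built from the reported gains.
reported_accuracy = {
    "baseline": 0,
    "star_only": 85,
    "star_plus_profile": 85 + 10,   # further 10 percentage points
    "full_stack": 85 + 10 + 5,      # additional 5 percentage points
}


def correct_count(accuracy_pct: int, n: int = TRIALS_PER_CONDITION) -> int:
    """Number of correct trials implied by an accuracy percentage."""
    return round(accuracy_pct * n / 100)


total_trials = TRIALS_PER_CONDITION * NUM_CONDITIONS
counts = {name: correct_count(pct) for name, pct in reported_accuracy.items()}

if __name__ == "__main__":
    print(f"total trials: {total_trials}")
    for name, k in counts.items():
        print(f"{name}: {k}/{TRIALS_PER_CONDITION} correct")
```

With n=20 per condition, each trial is worth 5 percentage points, so the final RAG gain corresponds to a single additional correct trial.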