This paper introduces an automated thematic analysis (TA) framework that uses LLMs to iteratively refine codebooks while maintaining full provenance tracking. Evaluated on five diverse corpora (clinical interviews, social media, and public transcripts), the framework achieved the highest composite quality score on four of the five datasets when compared against six baseline methods. Iterative refinement significantly improved code reusability and distributional consistency, and on two pediatric cardiology corpora the generated themes aligned with expert-annotated themes.
LLMs can automate and improve thematic analysis of qualitative data, producing themes that align with expert annotations on clinical corpora through iterative codebook refinement.
Thematic analysis (TA) is widely used in health research to extract patterns from patient interviews, yet manual TA faces challenges in scalability and reproducibility. LLM-based automation can help, but existing approaches produce codebooks with limited generalizability and lack analytic auditability. We present an automated TA framework combining iterative codebook refinement with full provenance tracking. Evaluated on five corpora spanning clinical interviews, social media, and public transcripts, the framework achieves the highest composite quality score on four of five datasets compared to six baselines. Iterative refinement yields statistically significant improvements on four datasets with large effect sizes, driven by gains in code reusability and distributional consistency while preserving descriptive quality. On two clinical corpora (pediatric cardiology), generated themes align with expert-annotated themes.
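The abstract does not specify how refinement or provenance are implemented, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each code keeps segment-level provenance (document ID and segment index) plus an audit trail of refinement operations, and it approximates "code reusability" as the fraction of codes applied to at least two distinct segments. `propose_labels` is a hypothetical keyword stand-in for an LLM coding call, and the merge decisions passed to `refine` are supplied by hand rather than by a model. None of these names come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Code:
    label: str
    # Provenance (assumed format): every (document_id, segment_index) the code was assigned to.
    evidence: list[tuple[str, int]] = field(default_factory=list)
    # Audit trail of operations that created or modified this code.
    history: list[str] = field(default_factory=list)


def propose_labels(segment: str) -> list[str]:
    """Hypothetical stand-in for an LLM call: map keywords to candidate code labels."""
    keywords = {"worry": "Parental worry", "anxious": "Parental anxiety",
                "doctor": "Clinician communication", "nurse": "Clinician communication"}
    return sorted({label for word, label in keywords.items() if word in segment.lower()})


def initial_coding(corpus: dict[str, list[str]]) -> dict[str, Code]:
    """Build a first-pass codebook, recording where each code was applied."""
    codebook: dict[str, Code] = {}
    for doc_id, segments in corpus.items():
        for i, seg in enumerate(segments):
            for label in propose_labels(seg):
                code = codebook.get(label)
                if code is None:
                    code = codebook[label] = Code(label, history=[f"created from {doc_id}[{i}]"])
                code.evidence.append((doc_id, i))
    return codebook


def refine(codebook: dict[str, Code], merges: dict[str, str]) -> dict[str, Code]:
    """One refinement pass: fold each source code into its target, keeping all provenance."""
    # Copy so the earlier codebook version stays intact for auditing.
    refined = {k: Code(c.label, list(c.evidence), list(c.history)) for k, c in codebook.items()}
    for source, target in merges.items():
        if source in refined and target in refined and source != target:
            src = refined.pop(source)
            refined[target].evidence.extend(src.evidence)
            refined[target].history.append(f"merged '{source}' ({len(src.evidence)} segment(s))")
    return refined


def code_reusability(codebook: dict[str, Code]) -> float:
    """Assumed proxy metric: share of codes applied to at least two distinct segments."""
    if not codebook:
        return 0.0
    reused = sum(1 for c in codebook.values() if len(set(c.evidence)) >= 2)
    return reused / len(codebook)


corpus = {
    "int01": ["I worry about the surgery.", "The doctor explained it well."],
    "int02": ["I felt anxious before the visit.", "The nurse was kind."],
}
before = initial_coding(corpus)
after = refine(before, {"Parental anxiety": "Parental worry"})
print(f"{code_reusability(before):.2f}")  # 0.33
print(f"{code_reusability(after):.2f}")   # 1.00
```

Merging near-duplicate codes here raises the reusability proxy because single-use codes are consolidated, while the copied `evidence` and `history` lists keep every assignment traceable to its source segment across codebook versions.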