This paper investigates the application of reinforcement learning (RL) to radiology report generation (R2G), focusing on improving data efficiency and optimization effectiveness. The authors find that data quality matters more than quantity in medical RL and propose a diagnostic diversity-based data sampling strategy. To keep clinically critical tokens from being overlooked, they introduce Diagnostic Token-weighted Policy Optimization (DiTPO), which optimizes for clinical accuracy by weighting tokens according to their diagnostic importance. The proposed framework achieves state-of-the-art performance on MIMIC-CXR, IU-Xray, and CheXpert Plus while using substantially fewer RL training samples (for example, only 20% of them on MIMIC-CXR).
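The summary does not say how diagnostic diversity is measured. As a minimal sketch, assuming each training report has already been reduced to a set of finding labels and that "diversity" means covering as many distinct finding combinations as possible, a round-robin sampler could look like this. The function name `diversity_sample` and the grouping criterion are hypothetical illustrations, not the authors' implementation.

```python
import random
from collections import defaultdict


def diversity_sample(labels, budget, seed=0):
    """Select up to `budget` sample indices, spreading picks across distinct
    diagnostic label combinations (a hypothetical diversity criterion).

    labels: one iterable of finding names (strings) per training report.
    """
    rng = random.Random(seed)
    # Bucket samples by the exact set of findings their report mentions.
    groups = defaultdict(list)
    for idx, findings in enumerate(labels):
        groups[frozenset(findings)].append(idx)
    for members in groups.values():
        rng.shuffle(members)
    # Deterministic group order so results are reproducible for a given seed.
    keys = sorted(groups, key=lambda k: sorted(k))
    budget = min(budget, len(labels))
    selected = []
    # Round-robin: take one sample per combination per pass, so rare
    # combinations are not crowded out by common (e.g. normal) reports.
    while len(selected) < budget:
        for key in keys:
            if groups[key] and len(selected) < budget:
                selected.append(groups[key].pop())
    return selected
```

For example, with three normal reports and one edema report, a budget of 2 takes one normal and one edema report rather than two normal ones. Random subsampling, by contrast, would most often return only normal reports.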
RL for radiology report generation can achieve state-of-the-art results with significantly less data by prioritizing data quality and weighting tokens based on their diagnostic importance.
Radiologists strongly desire fully automated AI for radiology report generation (R2G), yet existing approaches fall short in clinical utility. Reinforcement learning (RL) holds potential to address these shortcomings, but its adoption in this task remains underexplored. In this paper, we revisit RL in terms of data efficiency and optimization effectiveness for R2G tasks. First, we explore the impact of data quantity and quality on the performance of RL in medical contexts, revealing that data quality plays a more critical role than quantity. To this end, we propose a diagnostic diversity-based data sampling strategy that achieves comparable performance with fewer samples. Second, we observe that most tokens in radiology reports are template-like and diagnostically uninformative, whereas clinically critical tokens are infrequent and therefore at greater risk of being overlooked during optimization. To tackle this, we introduce Diagnostic Token-weighted Policy Optimization (DiTPO), which directly optimizes for clinical accuracy by using a diagnostic F1 score as the reward signal. Unlike standard RL approaches that treat all tokens equally, DiTPO explicitly models the varying importance of different tokens through rule- or gradient-based mechanisms to prioritize clinically relevant content. Extensive experiments on the MIMIC-CXR, IU-Xray, and CheXpert Plus datasets demonstrate that our framework achieves state-of-the-art (SOTA) performance while requiring substantially fewer RL training samples. Notably, on MIMIC-CXR, our framework attains an F1 score of 0.516 using only 20% of the RL training samples.
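The abstract describes DiTPO only at a high level. The sketch below illustrates a rule-based variant under several assumptions that the source does not confirm. Findings are taken as already extracted from generated and reference reports (the labeler is not modeled). The diagnostic F1 reward is computed as a set-level F1 over those findings. Diagnostic tokens are matched against a hypothetical vocabulary `diagnostic_terms` and up-weighted by a hypothetical factor `alpha`. Advantages are normalized within a group of reports sampled for the same image, a GRPO-style choice borrowed here for illustration. All function names are hypothetical, and the gradient-based weighting mechanism is not shown.

```python
import math


def diagnostic_f1(pred_findings, ref_findings):
    """Set-level F1 between predicted and reference findings (assumed reward)."""
    pred, ref = set(pred_findings), set(ref_findings)
    if not pred and not ref:
        return 1.0  # both reports state no findings
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)


def token_weights(tokens, diagnostic_terms, alpha=2.0):
    """Rule-based weights: hypothetical diagnostic terms (lowercase) get `alpha`."""
    return [alpha if t.lower() in diagnostic_terms else 1.0 for t in tokens]


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: reward normalized within one sampled group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def weighted_pg_loss(samples, diagnostic_terms, alpha=2.0):
    """Token-weighted policy-gradient loss for one group of sampled reports.

    samples: list of (tokens, token_logprobs, pred_findings, ref_findings).
    """
    rewards = [diagnostic_f1(p, r) for _, _, p, r in samples]
    advantages = group_advantages(rewards)
    total = 0.0
    for (tokens, logps, _, _), adv in zip(samples, advantages):
        w = token_weights(tokens, diagnostic_terms, alpha)
        # Weighted mean log-probability: diagnostic tokens dominate the update.
        weighted_logp = sum(wi * lp for wi, lp in zip(w, logps)) / sum(w)
        total += -adv * weighted_logp
    return total / len(samples)
```

Dividing by the total token weight keeps each report's term on the scale of an average log-probability, so the weights shift which tokens drive the update rather than inflating the loss. In a real trainer the log-probabilities would be autograd tensors (for example in PyTorch) so the loss could be backpropagated; plain floats keep this example self-contained.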