This paper addresses rapid plume shine dose estimation for nuclear safety by developing an interpolation-assisted machine learning framework trained on pyDOSEIA-generated dose data covering multiple radionuclides and atmospheric conditions. The authors augmented the sparse, discrete datasets with shape-preserving interpolation to create high-resolution training data, then compared XGBoost, Random Forest, and TabNet. XGBoost consistently achieved the highest prediction accuracy; interpretability analysis showed that the tree-based models rely mainly on geometry-dispersion features (release height, stability category, and downwind distance). The authors also built a web-based GUI for interactive dose assessment.
Densifying sparse physics simulation data with shape-preserving interpolation improves the accuracy of ML surrogates for radiation dose estimation, with XGBoost outperforming both Random Forest and the deep learning model TabNet.
Despite the success of machine learning (ML) in surrogate modeling, its use in radiation dose assessment is limited by safety-critical constraints, scarce training-ready data, and challenges in selecting suitable architectures for physics-dominated systems. Within this context, rapid and accurate plume shine dose estimation serves as a practical test case, as it is critical for nuclear facility safety assessment and radiological emergency response, while conventional photon-transport-based calculations remain computationally expensive. In this work, an interpolation-assisted ML framework was developed using discrete dose datasets generated with the pyDOSEIA suite for 17 gamma-emitting radionuclides across varying downwind distances, release heights, and atmospheric stability categories. The datasets were augmented using shape-preserving interpolation to construct dense, high-resolution training data. Two tree-based ML models (Random Forest and XGBoost) and one deep learning (DL) model (TabNet) were evaluated to examine predictive performance and sensitivity to dataset resolution. All models showed higher prediction accuracy with the interpolated high-resolution dataset than with the discrete data; however, XGBoost consistently achieved the highest accuracy. Interpretability analysis using permutation importance (tree-based models) and attention-based feature attribution (TabNet) revealed that performance differences stem from how the models utilize input features. Tree-based models focus mainly on dominant geometry-dispersion features (release height, stability category, and downwind distance), treating radionuclide identity as a secondary input, whereas TabNet distributes attention more broadly across multiple variables. For practical deployment, a web-based GUI was developed for interactive scenario evaluation and transparent comparison with photon-transport reference calculations.
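The augmentation step described above can be illustrated with a minimal sketch. The abstract says only that the interpolation is shape-preserving; the monotone piecewise-cubic Hermite (PCHIP-style) scheme below is an assumed choice, and the distances and dose values are made-up placeholders, not pyDOSEIA output. The helper names `pchip_slopes` and `pchip_eval` are hypothetical. In practice a library routine such as `scipy.interpolate.PchipInterpolator` would be a natural substitute.

```python
from bisect import bisect_right


def pchip_slopes(x, y):
    """Node slopes for a monotone (shape-preserving) cubic Hermite interpolant."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    delta = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]
    m = [0.0] * n
    m[0], m[-1] = delta[0], delta[-1]  # simple one-sided slopes at the ends
    for i in range(1, n - 1):
        # Weighted harmonic mean of neighbouring secant slopes when they
        # share a sign; otherwise keep the slope at 0 so that no spurious
        # overshoot is introduced at a local extremum or flat segment.
        if delta[i - 1] * delta[i] > 0:
            w1 = 2 * h[i] + h[i - 1]
            w2 = h[i] + 2 * h[i - 1]
            m[i] = (w1 + w2) / (w1 / delta[i - 1] + w2 / delta[i])
    return m


def pchip_eval(x, y, m, xq):
    """Evaluate the cubic Hermite interpolant at a single query point xq."""
    i = min(max(bisect_right(x, xq) - 1, 0), len(x) - 2)
    h = x[i + 1] - x[i]
    t = (xq - x[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * y[i] + h10 * h * m[i] + h01 * y[i + 1] + h11 * h * m[i + 1]


# Hypothetical sparse curve: dose vs. downwind distance for one fixed
# radionuclide, release height, and stability category (arbitrary units).
distances_km = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
dose = [9.0, 4.0, 2.2, 1.0, 0.3, 0.1]

slopes = pchip_slopes(distances_km, dose)

# Densify the six discrete points into 100 training samples.
step = (distances_km[-1] - distances_km[0]) / 99
dense_x = [distances_km[0] + k * step for k in range(100)]
dense_y = [pchip_eval(distances_km, dose, slopes, xq) for xq in dense_x]
```

The point of a shape-preserving scheme here is that the dense curve keeps the monotone trend of the sparse data rather than oscillating between nodes, so the added training samples do not encode interpolation artifacts. The dense samples would then be combined with the remaining input features and passed to the regression models.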