This paper addresses automated classification of echocardiographic views by introducing the Echocardiographic Videos of Nine Views (EV9V) dataset, which comprises 5,138 videos and 910,579 frames across 9 standard views and is, to the authors' knowledge, the largest publicly available echocardiography video dataset. The authors benchmark representative video classification architectures (CNNs, RNNs, and Transformers) and propose a Spatio-Temporal Fusion Model (STFM), a dual-stream CNN-LSTM framework that captures both spatial anatomical structure and temporal cardiac dynamics while remaining robust to variations in frame quality. In the reported experiments STFM achieves competitive performance, supporting the use of uncertainty-aware spatio-temporal learning for echocardiographic view classification.
The EV9V dataset and STFM aim to advance echocardiographic view classification by pairing a large public benchmark with uncertainty-aware spatio-temporal learning that achieves competitive performance.
Automated classification of standard echocardiographic views is crucial for efficient clinical workflow but faces three main challenges. First, publicly available datasets are scarce and limited in scale and view coverage. Second, the performance of some modern video-level architectures for echocardiographic view classification remains underexplored. Third, some view categories exhibit highly similar spatial appearances, making single-frame features insufficient for discrimination, while heterogeneous frame quality complicates robust temporal information fusion. To address these challenges, we release the Echocardiographic Videos of Nine Views (EV9V) dataset, comprising 5,138 videos, 910,579 frames, and 9 standard views, which is, to the best of our knowledge, the largest publicly available echocardiography video dataset. Using EV9V, we systematically benchmark representative video classification architectures, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers. Furthermore, we propose a Spatio-Temporal Fusion Model (STFM), an efficient dual-stream CNN-LSTM (Long Short-Term Memory) framework that jointly captures spatial anatomical structures and temporal cardiac dynamics. The proposed framework leverages uncertainty-aware learning to preferentially sample representative video segments during training and evidence-based fusion during inference, improving robustness to variations in frame quality across echocardiographic videos. Extensive experiments demonstrate that our method achieves competitive performance across diverse video classification models, validating the effectiveness of uncertainty-aware spatio-temporal learning for echocardiographic view classification. The code is available at https://github.com/bgx666/stfm.
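The abstract does not spell out how evidence-based fusion is computed, so the following is only a minimal sketch of one common way such a scheme can work, not the authors' implementation. It assumes each video segment yields a non-negative evidence vector (one score per view), maps it to a Dirichlet distribution using the standard subjective-logic rule (alpha_k = e_k + 1, uncertainty u = K / sum(alpha)), and weights each segment's prediction by 1 - u so that low-quality segments contribute less. The function names and the weighting rule are hypothetical, and the example uses 3 classes instead of 9 for brevity.

```python
# Hypothetical sketch of uncertainty-weighted fusion over video segments.
# Assumption: each segment produces non-negative per-class "evidence".
# This is NOT the STFM code; see the authors' repository for that.

def dirichlet_summary(evidence):
    """Return (expected class probabilities, uncertainty) for one segment.

    Standard subjective-logic mapping:
    alpha_k = e_k + 1, S = sum(alpha), p_k = alpha_k / S, u = K / S.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    uncertainty = num_classes / strength
    return probs, uncertainty


def fuse_segments(segment_evidence):
    """Average segment probabilities, weighting each by (1 - uncertainty)."""
    num_classes = len(segment_evidence[0])
    fused = [0.0] * num_classes
    total_weight = 0.0
    for evidence in segment_evidence:
        probs, uncertainty = dirichlet_summary(evidence)
        weight = 1.0 - uncertainty
        total_weight += weight
        for i, p in enumerate(probs):
            fused[i] += weight * p
    if total_weight == 0.0:
        # No segment carried any evidence: fall back to a uniform prediction.
        return [1.0 / num_classes] * num_classes
    return [f / total_weight for f in fused]


def predict_view(segment_evidence):
    """Return the index of the view with the highest fused probability."""
    fused = fuse_segments(segment_evidence)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # One confident segment favoring class 0, one noisy segment leaning to class 1.
    segments = [[9.0, 0.0, 0.0], [0.0, 0.3, 0.0]]
    print(fuse_segments(segments), predict_view(segments))
```

In this sketch the confident segment dominates the fused prediction because its uncertainty (0.25) is much lower than that of the noisy segment (about 0.91). Other fusion rules, such as Dempster-Shafer combination, would be equally plausible readings of the abstract.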