This paper investigates the impact of training data size, illumination differences, and spatial shifts on the robustness of deep learning models for detecting raveling (aggregate loss) in asphalt pavement. The authors introduce RavelingArena, a benchmark built by augmenting an existing dataset with controlled illumination and spatial-shift variations, enabling variation-controlled experiments. Results show that increasing both the quantity and diversity of training data significantly improves model accuracy and year-to-year consistency in real-world deployments.
Simply throwing more data at a pavement distress detection model isn't enough; carefully controlling for illumination and spatial variations during training can boost accuracy by at least 9.2% and improve long-term consistency.
Raveling, the loss of aggregate from the pavement surface, is a major form of asphalt pavement distress, especially on highways. While research has shown that machine learning and deep learning-based methods yield promising results for raveling detection by classification on range images, their performance often degrades in large-scale deployments, where more diverse inference data may originate from different runs, sensors, and environmental conditions. This degradation highlights the need for a more generalizable and robust solution for real-world implementation. Thus, the objectives of this study are to 1) identify and assess potential variations that impact model robustness, such as the quantity of training data, illumination differences, and spatial shifts; and 2) leverage these findings to enhance model robustness under real-world conditions. To this end, we propose RavelingArena, a benchmark designed to evaluate model robustness to variations in raveling detection. Instead of collecting extensive new data, it is built by augmenting an existing dataset with diverse, controlled variations, thereby enabling variation-controlled experiments that quantify the impact of each variation. Results demonstrate that both the quantity and diversity of training data are critical to model accuracy, yielding at least a 9.2% gain in accuracy under the most diverse experimental conditions. Additionally, a case study applying these findings to a multi-year test section in Georgia, U.S., shows significant improvements in year-to-year consistency, laying the foundation for future studies on temporal deterioration modeling. These insights provide guidance for more reliable model deployment in raveling detection and other real-world tasks that require adaptability to diverse conditions.
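To make the augmentation strategy concrete, the two controlled variations described above (illumination differences and spatial shifts) can be sketched as simple image transforms. This is a minimal illustration only: the function names, parameters, and padding behavior are assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of the two controlled augmentations: illumination
# differences (global intensity scaling) and spatial shifts (pixel
# translation). Names and parameters are illustrative, not from the paper.
import numpy as np

def adjust_illumination(img: np.ndarray, gain: float) -> np.ndarray:
    """Scale pixel intensities to simulate an illumination difference."""
    return np.clip(img.astype(np.float32) * gain, 0, 255).astype(np.uint8)

def spatial_shift(img: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Translate the image by (dx, dy) pixels, zero-padding exposed areas."""
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    # Compute the overlapping source and destination windows.
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

# Example: generate two controlled variants of one range-image patch.
patch = np.full((64, 64), 128, dtype=np.uint8)
bright = adjust_illumination(patch, 1.3)        # brighter variant
shifted = spatial_shift(patch, dx=5, dy=-3)     # shifted variant
```

Applying such transforms to an existing labeled dataset yields augmented copies whose only difference from the originals is the controlled variation, which is what permits attributing accuracy changes to that single factor.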