This paper explores the use of Tabular Variational Autoencoder (TVAE) and Gaussian Copula (GC) models for generating synthetic flight data to address data scarcity and confidentiality issues in aviation research. The authors evaluate the generated data with a four-stage assessment framework covering statistical similarity, fidelity, diversity, and predictive utility. They find that GC achieves higher statistical similarity and fidelity but is computationally costly on large datasets, whereas TVAE scales efficiently. Both models can produce synthetic data that trains flight delay prediction models to an accuracy comparable to that of models trained on real data.
Synthetic flight data produced by generative models can train flight delay prediction models to an accuracy comparable to that of models trained on real data, unlocking new possibilities for predictive modeling in air transportation.
The increasing adoption of synthetic data in aviation research offers a promising solution to data scarcity and confidentiality challenges. This study investigates the potential of generative models to produce realistic synthetic flight data and evaluates their quality through a comprehensive four-stage assessment framework. The need for synthetic flight data arises from their potential to serve as an alternative to confidential real-world records and to augment rare events in historical datasets. These enhanced datasets can then be used to train machine learning models that predict critical events, such as flight delays, cancellations, diversions, and turnaround times. Two generative models, Tabular Variational Autoencoder (TVAE) and Gaussian Copula (GC), are adapted to generate synthetic flight information and compared based on their ability to preserve statistical similarity, fidelity, diversity, and predictive utility. Results indicate that while GC achieves higher statistical similarity and fidelity, its computational cost hinders its applicability to large datasets. In contrast, TVAE efficiently handles large datasets and enables scalable synthetic data generation. The findings demonstrate that models trained on synthetic data can predict flight delays with accuracy comparable to that of models trained on real data. These results pave the way for leveraging synthetic flight data to enhance predictive modeling in air transportation.
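To make the Gaussian Copula idea concrete, the following is a minimal sketch of how such a model can generate synthetic rows for two numeric columns. It is not the authors' implementation: the column names (departure and arrival delay), the toy "real" data, and all function names are assumptions introduced for illustration, and ties and categorical columns are ignored. The sketch maps each column to normal scores through its empirical ranks, estimates the correlation of those scores, samples correlated normals, and maps them back through each column's empirical quantiles.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf / inverse cdf


def to_normal_scores(values):
    """Map a column to standard-normal scores via its empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1)
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF: the observed value at probability u."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def sample_gaussian_copula(col_a, col_b, n_samples, seed=0):
    """Draw synthetic (a, b) pairs that mimic the columns' dependence."""
    rng = random.Random(seed)
    rho = pearson(to_normal_scores(col_a), to_normal_scores(col_b))
    # Guard against tiny negative values caused by floating-point rounding
    residual_scale = max(0.0, 1.0 - rho * rho) ** 0.5
    sorted_a, sorted_b = sorted(col_a), sorted(col_b)
    synthetic_rows = []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + residual_scale * rng.gauss(0.0, 1.0)
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
        synthetic_rows.append(
            (empirical_quantile(sorted_a, u1), empirical_quantile(sorted_b, u2))
        )
    return synthetic_rows


# Toy stand-in for real flight records (hypothetical, not data from the paper):
# departure delay in minutes, and an arrival delay that tracks it with noise.
data_rng = random.Random(42)
real_dep = [data_rng.expovariate(1 / 15) for _ in range(300)]
real_arr = [d + data_rng.gauss(0.0, 5.0) for d in real_dep]

synthetic = sample_gaussian_copula(real_dep, real_arr, n_samples=500)
```

Because the inverse transform here returns observed values only, this sketch resamples each column's real values while reproducing their dependence. A production model would typically fit smooth marginal distributions instead, which also matters for the confidentiality motivation above.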