The paper addresses limitations in synthetic population generation for agent-based models by proposing a Wasserstein Generative Adversarial Network (WGAN) with gradient penalty that jointly integrates and synthesizes data from multiple sources. A novel regularization term, an inverse gradient penalty, is added to the generator loss function. It improves the diversity and feasibility of the synthetic data by addressing sampling zeros and structural zeros. Experiments show that the joint approach outperforms a sequential baseline, increasing recall by 7% and precision by 15%. The regularization term further improves diversity and feasibility, adding 10% in recall and 1% in precision.
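As a point of reference, the standard WGAN with gradient penalty (WGAN-GP) objective that the method builds on can be written as follows. This is a sketch in conventional WGAN-GP notation, not notation taken from this work. $D$ is the critic, $G$ the generator, $z$ the input noise, $\mathbb{P}_r$ the distribution of the real (here multi-source) records, $\mathbb{P}_g$ the distribution of generated records $\tilde{x} = G(z)$, and $\lambda$ the penalty weight. The abstract does not define the inverse gradient penalty, so it appears only as a placeholder term $R_{\text{inv}}(G)$ with a hypothetical weight $\mu$.

```latex
\begin{aligned}
\mathcal{L}_D &= \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\big[D(\tilde{x})\big]
  - \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big]
  + \lambda\, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
  \qquad \hat{x} = \epsilon x + (1-\epsilon)\tilde{x},\quad \epsilon \sim \mathcal{U}[0,1] \\
\mathcal{L}_G &= -\,\mathbb{E}_{z \sim p(z)}\big[D(G(z))\big] + \mu\, R_{\text{inv}}(G)
\end{aligned}
```

The penalty in $\mathcal{L}_D$ pushes the critic's gradient norm toward 1 at points interpolated between real and generated records, which softly enforces the 1-Lipschitz constraint the Wasserstein formulation requires. The paper's contribution sits in the generator term, which the placeholder $R_{\text{inv}}$ stands in for.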
Forget sequential data fusion: a new WGAN-based approach jointly learns from multiple datasets to synthesize more diverse and feasible synthetic populations, outperforming a sequential baseline by 15% in precision.
Generating realistic synthetic populations is essential for agent-based models (ABM) in transportation and urban planning. Current methods face two major limitations. First, many rely on a single dataset or follow a sequential data fusion and generation process, and therefore fail to capture the complex interplay between features. Second, these approaches struggle with sampling zeros (valid but unobserved attribute combinations) and structural zeros (infeasible combinations due to logical constraints), which reduce the diversity and feasibility of the generated data. This study proposes a novel method to simultaneously integrate and synthesize multi-source datasets using a Wasserstein Generative Adversarial Network (WGAN) with gradient penalty. This joint learning method improves both the diversity and feasibility of synthetic data by adding a regularization term (an inverse gradient penalty) to the generator loss function. For evaluation, we implement a unified similarity metric and place special emphasis on measuring diversity and feasibility through recall, precision, and the F1 score. Results show that the proposed joint approach outperforms the sequential baseline, with recall increasing by 7% and precision by 15%. Additionally, the regularization term further improves diversity and feasibility, reflected in a 10% increase in recall and a 1% increase in precision. We assess distributional similarity using a composite score built from five metrics. The joint approach performs better overall, reaching a score of 88.1 compared with 84.6 for the sequential method. Since synthetic populations serve as a key input for ABM, this multi-source generative approach has the potential to significantly enhance the accuracy and reliability of ABM.
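A common way to read recall and precision as diversity and feasibility for synthetic populations is to compare the sets of distinct attribute combinations. The sketch below assumes that definition, which the abstract does not spell out. Precision is the share of distinct generated combinations that also occur in a reference population (feasibility, so structural zeros lower it). Recall is the share of distinct reference combinations the generator reproduces (diversity). The helper name `population_precision_recall` and the toy attributes are hypothetical.

```python
from typing import Hashable, Iterable, Sequence


def population_precision_recall(
    generated: Iterable[Sequence[Hashable]],
    reference: Iterable[Sequence[Hashable]],
) -> dict[str, float]:
    """Set-based precision, recall and F1 over distinct attribute combinations.

    Hypothetical helper: this definition is an assumption for illustration,
    not necessarily the exact metric used in the paper.
    """
    gen = {tuple(row) for row in generated}  # distinct generated combinations
    ref = {tuple(row) for row in reference}  # distinct reference combinations
    if not gen or not ref:
        raise ValueError("both populations must contain at least one record")

    matched = gen & ref
    precision = len(matched) / len(gen)  # feasibility: generated combos found in the reference
    recall = len(matched) / len(ref)     # diversity: reference combos the generator reproduces
    # F1 is the harmonic mean; define it as 0 when nothing matches.
    f1 = 0.0 if not matched else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy (age group, activity, main mode) records; purely illustrative.
    reference = [
        ("child", "student", "none"),
        ("adult", "worker", "car"),
        ("adult", "worker", "transit"),
        ("senior", "retired", "car"),
    ]
    generated = [
        ("adult", "worker", "car"),
        ("adult", "worker", "car"),    # duplicate: counted once
        ("child", "worker", "car"),    # structural zero: infeasible combination
        ("senior", "retired", "car"),
    ]
    print(population_precision_recall(generated, reference))
```

In the toy example, two of the three distinct generated combinations appear in the reference (precision 2/3) and two of the four reference combinations are reproduced (recall 1/2), giving F1 = 4/7 ≈ 0.571. One caveat of this set-based definition is that a valid combination absent from the reference (a sampling zero) is scored as infeasible, so the reference population should be as complete as possible.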