This paper introduces a Transformer-based architecture for multi-modal vehicle trajectory prediction that avoids reliance on explicit graph structures or intention labeling. The model uses two parallel tracks: one for trajectory prediction and another for intention likelihood prediction based on neighboring vehicles. Results show that this separation of spatial and trajectory generation modules improves performance, and that predicting residual offsets among K trajectories allows the model to learn an ordered group of trajectories.
Ditching hand-engineered graphs and intention labels doesn't hurt -- and may even help -- multi-modal trajectory prediction with a new Transformer architecture.
Predicting vehicle trajectories plays an important role in autonomous driving and intelligent transportation system (ITS) applications. Although many deep learning algorithms have been devised to predict vehicle trajectories, their reliance on a specific graph structure (e.g., Graph Neural Network) or on explicit intention labeling limits their flexibility. In this study, we propose a pure Transformer-based multi-modal network that considers neighboring vehicles. Two separate tracks are employed: one predicts the trajectories, while the other predicts the likelihood of each intention given the neighboring vehicles. We find that the two-track design improves performance by separating the spatial module from the trajectory-generating module. We also find that the model can learn an ordered group of trajectories by predicting residual offsets among K trajectories.
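The abstract does not say how the residual offsets are composed, so the following is only a minimal sketch of one possible reading: a base trajectory is predicted first, and each later trajectory is the previous one plus a predicted offset, which gives the K outputs a natural order. The function names `compose_trajectories` and `mode_probabilities`, the chaining rule, and the softmax used for the likelihood track are all illustrative assumptions, not the paper's implementation.

```python
import math


def compose_trajectories(base, offsets):
    """Build K trajectories from one base trajectory and K-1 residual offsets.

    base:    list of (x, y) waypoints of length T.
    offsets: list of K-1 offset sequences, each a list of (dx, dy) of length T.
    Assumed chaining rule: trajectory k = trajectory k-1 + offsets[k-1].
    """
    trajectories = [list(base)]
    for delta in offsets:
        prev = trajectories[-1]
        trajectories.append(
            [(x + dx, y + dy) for (x, y), (dx, dy) in zip(prev, delta)]
        )
    return trajectories


def mode_probabilities(scores):
    """Softmax over K raw scores, standing in for the likelihood track."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical inputs: a straight base path and two lateral offsets.
base = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
offsets = [
    [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)],
    [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)],
]
trajs = compose_trajectories(base, offsets)
probs = mode_probabilities([2.0, 1.0, 0.0])
```

Under this assumption, each trajectory drifts further laterally than the one before it, so the index k itself carries meaning, and the second track's scores rank the K candidates independently of how they were generated.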