The paper introduces softdtw-cuda-torch, a PyTorch library for GPU-accelerated Soft Dynamic Time Warping (SoftDTW) that overcomes limitations of existing implementations. It achieves this by using tiled anti-diagonal kernel execution to remove sequence length constraints, a log-space backward pass for numerical stability, and a fused distance computation mode to reduce memory consumption. The library demonstrates up to 98% memory reduction compared to prior work while supporting arbitrary sequence lengths and full PyTorch autograd integration.
SoftDTW on GPUs just became far more memory-efficient: a memory reduction of up to 98% opens the door to analysis of much longer sequences.
We present softdtw-cuda-torch, an open-source PyTorch library for computing Soft Dynamic Time Warping (SoftDTW) on GPUs. Our implementation addresses three key limitations of existing GPU implementations of SoftDTW: a hard sequence-length cap of 1024, numerical instability in the backward pass for small smoothing parameters, and excessive GPU memory consumption from materializing pairwise distance tensors. We introduce (1) tiled anti-diagonal kernel execution that removes the sequence-length constraint, (2) a log-space backward pass that prevents floating-point overflow, and (3) a fused distance-computation mode that eliminates the O(BNM) intermediate distance tensor, achieving up to 98% memory reduction compared to prior work. The library supports arbitrary sequence lengths, full PyTorch autograd integration, and Soft-DTW Barycenter computation. Code is available at https://github.com/BGU-CS-VIL/sdtw-cuda-torch.
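To make the anti-diagonal and fused-distance ideas concrete, here is a minimal CPU sketch of the SoftDTW forward recursion in pure Python. It is not the library's API or its CUDA kernels: the function names `softmin` and `soft_dtw_antidiagonal` are hypothetical, and scalar sequences with a squared-difference cost are assumed purely for illustration. The sketch visits cells one anti-diagonal at a time (all cells on a diagonal are mutually independent, which is what a GPU kernel could exploit) and computes each pairwise cost on the fly instead of storing a full distance matrix.

```python
import math

def softmin(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), evaluated stably."""
    finite = [v for v in values if v != math.inf]
    if not finite:
        return math.inf
    m = min(finite)
    # Subtracting the minimum keeps every exponent <= 0, so exp() cannot overflow.
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in finite))

def soft_dtw_antidiagonal(x, y, gamma=1.0):
    """SoftDTW value for two scalar sequences (hypothetical helper, CPU only)."""
    n, m = len(x), len(y)
    # R has a border of +inf so that paths must start at cell (1, 1).
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    # Cells with i + j == k depend only on diagonals k - 1 and k - 2,
    # so every cell of one diagonal could be processed in parallel.
    for k in range(2, n + m + 1):
        for i in range(max(1, k - m), min(n, k - 1) + 1):
            j = k - i
            # "Fused" cost: computed where it is needed, never materialized
            # as an n-by-m matrix (assumed cost: squared difference).
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i][j] = cost + softmin(
                (R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]), gamma
            )
    return R[n][m]
```

As the smoothing parameter `gamma` shrinks, the result approaches the classical DTW cost from below, since the smoothed minimum never exceeds the true minimum. Small `gamma` is also the regime in which the abstract reports instability in the backward pass, which is why stable log-space evaluation matters.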