Ben-Gurion University of the Negev · Feb 19, 2026 · arXiv:2602.17206

SoftDTW-CUDA-Torch: Memory-Efficient GPU-Accelerated Soft Dynamic Time Warping for PyTorch

AI Summary

The paper introduces softdtw-cuda-torch, a PyTorch library for GPU-accelerated Soft Dynamic Time Warping (SoftDTW) that overcomes limitations of existing implementations. It achieves this by using tiled anti-diagonal kernel execution to remove sequence length constraints, a log-space backward pass for numerical stability, and a fused distance computation mode to reduce memory consumption. The library demonstrates up to 98% memory reduction compared to prior work while supporting arbitrary sequence lengths and full PyTorch autograd integration.
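The anti-diagonal idea can be illustrated with a small pure-Python sketch. It is not the library's CUDA kernel or API. In the SoftDTW recursion, cell (i, j) depends only on (i-1, j-1), (i-1, j) and (i, j-1). That means every cell on one anti-diagonal d = i + j can be computed independently once diagonals d-1 and d-2 are finished. The sketch makes two assumptions. First, "tiling" is modeled as walking each diagonal in fixed-size chunks (the hypothetical `tile` parameter, standing in for a fixed thread-block size), so sequence length is not bounded by that size. Second, the "fused" mode is modeled as computing the squared Euclidean cost of each (i, j) pair inside the recursion instead of reading it from a precomputed N x M matrix. `softmin`, `sq_dist` and `soft_dtw_antidiagonal` are hypothetical names.

```python
import math


def softmin(a: float, b: float, c: float, gamma: float) -> float:
    """Smoothed minimum: -gamma * log(exp(-a/gamma) + exp(-b/gamma) + exp(-c/gamma))."""
    lo = min(a, b, c)
    if math.isinf(lo):  # all three predecessors are unreachable (+inf)
        return lo
    s = sum(math.exp(-(v - lo) / gamma) for v in (a, b, c))
    return lo - gamma * math.log(s)


def sq_dist(u, v) -> float:
    """Squared Euclidean cost of one (i, j) pair, computed on demand."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


def soft_dtw_antidiagonal(x, y, gamma: float = 1.0, tile: int = 4) -> float:
    """SoftDTW value between sequences x (length N) and y (length M) of feature vectors.

    Cells are visited anti-diagonal by anti-diagonal (d = i + j). Every cell on
    diagonal d depends only on diagonals d - 1 and d - 2, so all cells of one
    diagonal could run in parallel. `tile` (hypothetical) bounds how many cells
    are handled per chunk, standing in for a fixed thread-block size.
    """
    n, m = len(x), len(y)
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for d in range(2, n + m + 1):
        i_lo, i_hi = max(1, d - m), min(n, d - 1)
        for start in range(i_lo, i_hi + 1, tile):  # one chunk per "tile"
            for i in range(start, min(start + tile, i_hi + 1)):  # parallel on a GPU
                j = d - i
                cost = sq_dist(x[i - 1], y[j - 1])  # fused: no N x M cost matrix kept
                R[i][j] = cost + softmin(R[i - 1][j - 1], R[i - 1][j], R[i][j - 1], gamma)
    return R[n][m]


x = [[0.0], [1.0], [2.0]]
y = [[0.0], [2.0]]
print(soft_dtw_antidiagonal(x, y, gamma=0.01))  # ≈ 0.993; the hard DTW cost is 1.0
```

The result does not depend on the chunk size, because each cell sees exactly the same inputs in any valid order. The sketch still stores the full (N+1) x (M+1) table R. Only the pairwise cost tensor is avoided.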

Key Contribution

SoftDTW on GPUs becomes far more memory-efficient, using up to 98% less memory than prior work, and is no longer capped at sequences of length 1024. This makes much longer sequences practical to analyze.

Abstract

We present softdtw-cuda-torch, an open-source PyTorch library for computing Soft Dynamic Time Warping (SoftDTW) on GPUs. Our implementation addresses three key limitations of existing GPU implementations of SoftDTW: a hard sequence-length cap of 1024, numerical instability in the backward pass for small smoothing parameters, and excessive GPU memory consumption from materializing pairwise distance tensors. We introduce (1) tiled anti-diagonal kernel execution that removes the sequence-length constraint, (2) a log-space backward pass that prevents floating-point overflow, and (3) a fused distance-computation mode that eliminates the O(BNM) intermediate distance tensor (for a batch of B sequence pairs of lengths N and M), achieving up to 98% memory reduction compared to prior work. The library supports arbitrary sequence lengths, full PyTorch autograd integration, and SoftDTW barycenter computation. Code is available at https://github.com/BGU-CS-VIL/sdtw-cuda-torch.
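One way to realize a log-space backward pass is sketched below. It is an assumption about the general technique, not the library's kernel. The SoftDTW gradient recursion from Cuturi and Blondel (2017) propagates an expected-alignment matrix E backward from the bottom-right corner. Each E[i][j] is a sum of three neighbors weighted by exp((R[next] - R[i][j] - D[next]) / gamma). For small gamma those exponentials are evaluated with very large arguments in magnitude. The sketch therefore stores log E and combines the three incoming terms with a stable log-sum-exp, so the weights a, b, c are never formed explicitly. For readability it materializes D and iterates in plain reverse row-major order. A GPU version would sweep anti-diagonals in reverse. `_lse`, `soft_dtw_forward` and `soft_dtw_grad_logspace` are hypothetical names.

```python
import math


def _lse(vals):
    """Stable log(sum(exp(v))); returns -inf when every input is -inf."""
    top = max(vals)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in vals))


def soft_dtw_forward(D, gamma):
    """(n+2) x (m+2) SoftDTW table R for an n x m cost matrix D; R[n][m] is the loss."""
    n, m = len(D), len(D[0])
    R = [[math.inf] * (m + 2) for _ in range(n + 2)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = (R[i - 1][j - 1], R[i - 1][j], R[i][j - 1])
            # softmin_gamma(prev) = -gamma * logsumexp(-prev / gamma)
            R[i][j] = D[i - 1][j - 1] - gamma * _lse([-p / gamma for p in prev])
    return R


def soft_dtw_grad_logspace(D, R, gamma):
    """dR[n][m]/dD, running the E recursion of Cuturi & Blondel (2017) on log E."""
    n, m = len(D), len(D[0])
    ninf = -math.inf
    # Zero-padded cost matrix and -inf-bordered R, as in the standard backward pass.
    Dp = [[0.0] * (m + 2) for _ in range(n + 2)]
    for i in range(n):
        for j in range(m):
            Dp[i + 1][j + 1] = D[i][j]
    Rp = [row[:] for row in R]
    for i in range(1, n + 1):
        Rp[i][m + 1] = ninf
    for j in range(1, m + 1):
        Rp[n + 1][j] = ninf
    Rp[n + 1][m + 1] = R[n][m]
    logE = [[ninf] * (m + 2) for _ in range(n + 2)]
    logE[n + 1][m + 1] = 0.0  # E = 1 at the padded corner
    for i in range(n, 0, -1):
        for j in range(m, 0, -1):
            # log of E[i+1][j]*a + E[i][j+1]*b + E[i+1][j+1]*c, never forming a, b, c
            logE[i][j] = _lse([
                logE[i + 1][j] + (Rp[i + 1][j] - Rp[i][j] - Dp[i + 1][j]) / gamma,
                logE[i][j + 1] + (Rp[i][j + 1] - Rp[i][j] - Dp[i][j + 1]) / gamma,
                logE[i + 1][j + 1] + (Rp[i + 1][j + 1] - Rp[i][j] - Dp[i + 1][j + 1]) / gamma,
            ])
    return [[math.exp(logE[i][j]) for j in range(1, m + 1)] for i in range(1, n + 1)]


D = [[0.0, 4.0], [1.0, 1.0], [4.0, 0.0]]
gamma = 1e-3
G = soft_dtw_grad_logspace(D, soft_dtw_forward(D, gamma), gamma)
print([[round(g, 3) for g in row] for row in G])  # [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
```

With a small gamma, the gradient approaches the hard-DTW alignment. Here two optimal warping paths tie, so the middle row splits its mass 0.5/0.5 between them.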

Topics: Distributed Systems & Hardware · Open-Source Models & Weights · Training Efficiency & Optimization

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
