Temple · University of Austin · Apr 28, 2026 · arXiv:2604.25209

DiRe-RAPIDS: Topology-faithful dimensionality reduction at scale

AI Summary

The paper identifies a key weakness in standard dimensionality reduction techniques such as UMAP and t-SNE: their local-neighborhood objectives can preserve sampling noise while distorting global topology. To address this, the authors introduce a topology-faithfulness benchmark based on noisy manifolds with known homology and tune their DiRe algorithm against it. The resulting Pareto-optimal configurations match or beat GPU-accelerated UMAP on classification tasks, recover exact first Betti numbers on stress tests, and preserve 3-4 times more topological structure than UMAP on 723K arXiv paper embeddings at comparable wall-clock time.

Key Contribution

Popular dimensionality reduction techniques like UMAP can *invent* topological structure not present in the original data, but DiRe avoids this pitfall while matching UMAP's speed and classification performance.

Abstract

Dimensionality reduction methods such as UMAP and t-SNE are central tools for visualising high-dimensional data, but their local-neighborhood objectives can preserve sampling noise while distorting global topology. We show that standard local metrics reward this noise memorisation: top-performing embeddings invent cycles and disconnected islands absent from the data. We introduce a topology-faithfulness benchmark based on noisy manifolds with known homology, tune DiRe against it, and find Pareto-optimal configurations that match or beat GPU-accelerated UMAP on classification while recovering exact first Betti numbers on stress tests. On 723K arXiv paper embeddings, DiRe preserves 3-4 times more topological structure than UMAP at comparable wall-clock.
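To make "known homology" concrete, here is a minimal sketch of how one could check Betti numbers on a toy point cloud. It is not the paper's benchmark or DiRe's code: the function names (`noisy_circle`, `gf2_rank`, `betti_numbers`), the noise model, and the scale `eps` are all assumptions chosen for illustration, and it uses a Vietoris-Rips complex at a single fixed scale rather than persistent homology. A noisy circle should give b0 = 1 and b1 = 1; an embedding that tears the loop would drop b1 to 0, and one that splits the data into islands would raise b0.

```python
import math
import random
from itertools import combinations


def noisy_circle(n=40, noise=0.01, center=(0.0, 0.0), seed=0):
    """Sample n points on a unit circle with bounded radial noise (hypothetical toy data)."""
    rng = random.Random(seed)
    pts = []
    for i in range(n):
        theta = 2 * math.pi * i / n
        r = 1.0 + rng.uniform(-noise, noise)
        pts.append((center[0] + r * math.cos(theta), center[1] + r * math.sin(theta)))
    return pts


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are Python ints used as bitsets."""
    pivots = {}  # highest set bit -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # eliminate the leading bit and keep reducing
    return len(pivots)


def betti_numbers(points, eps):
    """Return (b0, b1) of the Vietoris-Rips complex at scale eps, with GF(2) coefficients."""
    n = len(points)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps]
    edge_index = {e: k for k, e in enumerate(edges)}

    # Boundary of an edge: its two endpoints.
    d1 = [(1 << i) | (1 << j) for i, j in edges]

    # Boundary of a triangle: its three edges. A triangle is present
    # whenever all three of its edges are present (Rips condition).
    d2 = []
    for i, j, k in combinations(range(n), 3):
        if (i, j) in edge_index and (i, k) in edge_index and (j, k) in edge_index:
            d2.append((1 << edge_index[(i, j)])
                      | (1 << edge_index[(i, k)])
                      | (1 << edge_index[(j, k)]))

    rank_d1 = gf2_rank(d1)
    rank_d2 = gf2_rank(d2)
    b0 = n - rank_d1                   # connected components
    b1 = len(edges) - rank_d1 - rank_d2  # independent loops
    return b0, b1


if __name__ == "__main__":
    circle = noisy_circle()
    print("circle:", betti_numbers(circle, eps=0.4))
    print("torn circle:", betti_numbers(circle[:30], eps=0.4))
    two = circle + noisy_circle(center=(5.0, 0.0), seed=1)
    print("two circles:", betti_numbers(two, eps=0.4))
```

The brute-force enumeration of triangles is cubic in the number of points, so this only suits small sanity checks; at the scale discussed in the abstract a dedicated persistent-homology library would be needed.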

Eval Frameworks & Benchmarks · Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 7

Year: 2026

Venue: N/A
