Mar 30, 2026 · arXiv:2603.27998

BiFormer3D: Grid-Free Time-Domain Reconstruction of Head-Related Impulse Responses with a Spatially Encoded Transformer

Shaoheng Xu, Chunyi Sun, Jihui Zhang, Amy Bastine, Prasanga N. Samarasinghe, Thushara D. Abhayapala, Hongdong Li

AI Summary

The paper introduces BiFormer3D, a Transformer-based architecture that reconstructs head-related impulse responses (HRIRs) at arbitrary spatial directions from sparse per-listener measurements. It operates directly in the time domain and is grid-free, which avoids the loss of temporal fidelity and spatial continuity associated with frequency-domain methods and fixed direction grids. The model combines sinusoidal spatial encodings, a Conv1D refinement module, and auxiliary heads that predict interaural time difference (ITD) and interaural level difference (ILD). On the SONICOM dataset, it improves normalized mean squared error (NMSE), cosine distance, and ITD/ILD error over prior methods, and ablations indicate that minimum-phase preprocessing is unnecessary.

Key Contribution

Ditch the grid: BiFormer3D uses a spatially encoded Transformer to reconstruct individualized HRIRs at arbitrary directions from sparse measurements, outperforming prior methods on SONICOM without frequency-domain processing or minimum-phase assumptions.

Abstract

Individualized head-related impulse responses (HRIRs) enable binaural rendering, but dense per-listener measurements are costly. We address HRIR spatial up-sampling from sparse per-listener measurements: given a few measured HRIRs for a listener, predict HRIRs at unmeasured target directions. Prior learning methods often work in the frequency domain, rely on minimum-phase assumptions or separate timing models, and use a fixed direction grid, which can degrade temporal fidelity and spatial continuity. We propose BiFormer3D, a time-domain, grid-free binaural Transformer for reconstructing HRIRs at arbitrary directions from sparse inputs. It uses sinusoidal spatial features, a Conv1D refinement module, and auxiliary interaural time difference (ITD) and interaural level difference (ILD) heads. On SONICOM, it improves normalized mean squared error (NMSE), cosine distance, and ITD/ILD errors over prior methods; ablations validate modules and show minimum-phase pre-processing is unnecessary.
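The ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the octave-spaced frequency schedule, and the choice to report NMSE as a linear ratio (rather than in dB) are all assumptions made for illustration. It shows one plausible way to turn a direction into sinusoidal features, plus textbook definitions of NMSE and cosine distance between a predicted and a reference HRIR.

```python
import math


def sinusoidal_direction_features(azimuth_deg, elevation_deg, num_frequencies=4):
    """Encode a direction as sin/cos pairs at integer multiples of the angles.

    Assumption: the frequency schedule 1, 2, 4, ... is illustrative only;
    the paper's exact encoding is not given in the abstract.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    features = []
    for k in range(num_frequencies):
        freq = 2 ** k  # integer frequencies keep azimuth continuous at 0/360
        for angle in (az, el):
            features.append(math.sin(freq * angle))
            features.append(math.cos(freq * angle))
    return features


def nmse(predicted, reference):
    """Normalized MSE: error energy divided by reference energy (linear ratio)."""
    error_energy = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    reference_energy = sum(r ** 2 for r in reference)
    return error_energy / reference_energy


def cosine_distance(predicted, reference):
    """One minus the cosine similarity of the two waveforms."""
    dot = sum(p * r for p, r in zip(predicted, reference))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_r = math.sqrt(sum(r * r for r in reference))
    return 1.0 - dot / (norm_p * norm_r)
```

Because the encoding is a continuous function of the angles, it can be evaluated at any target direction, which is what makes a grid-free formulation possible in principle; how BiFormer3D consumes such features is described in the paper itself.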

Architecture Design (Transformers, SSMs, MoE) · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
