The paper introduces TeCoNeRV, a hypernetwork-based approach to video compression with implicit neural representations (INRs) that targets the memory, bitrate, and quality limitations of earlier hypernetwork methods. TeCoNeRV decomposes the weight prediction task spatially and temporally by breaking short video segments into patch tubelets (cutting pretraining memory overhead by 20x), stores only the differences between consecutive segment representations, and adds a temporal coherence regularization that encourages changes in weight space to track video content. On UVG, it improves PSNR over the baseline by 2.47 dB at 480p and 5.35 dB at 720p, with 36% lower bitrates and 1.5-3x faster encoding. The authors also report it as the first hypernetwork approach to demonstrate results at 480p, 720p, and 1080p on UVG, HEVC, and MCL-JCV.
TeCoNeRV makes hypernetwork-based INR video compression practical at high resolutions (up to 1080p) by cutting pretraining memory overhead and bitrate while improving PSNR and encoding speed over the baseline.
Implicit Neural Representations (INRs) have recently demonstrated impressive performance for video compression. However, since a separate INR must be overfit for each video, scaling to high-resolution videos while maintaining encoding efficiency remains a significant challenge. Hypernetwork-based approaches predict INR weights (hyponetworks) for unseen videos at high speeds, but with low quality, large compressed size, and prohibitive memory needs at higher resolutions. We address these fundamental limitations through three key contributions: (1) an approach that decomposes the weight prediction task spatially and temporally, by breaking short video segments into patch tubelets, to reduce the pretraining memory overhead by 20×; (2) a residual-based storage scheme that captures only differences between consecutive segment representations, significantly reducing bitstream size; and (3) a temporal coherence regularization framework that encourages changes in the weight space to be correlated with video content. Our proposed method, TeCoNeRV, achieves substantial improvements of 2.47dB and 5.35dB PSNR over the baseline at 480p and 720p on UVG, with 36% lower bitrates and 1.5-3× faster encoding speeds. With our low memory usage, we are the first hypernetwork approach to demonstrate results at 480p, 720p and 1080p on UVG, HEVC and MCL-JCV. Our project page is available at https://namithap10.github.io/teconerv/.
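To make contributions (1) and (2) more concrete, here is a minimal Python sketch of the two ideas as the abstract describes them. It is not the authors' implementation. The function names, the non-overlapping grid of tubelets, the grayscale nested-list video, and the flat list of weights per segment are all assumptions made for illustration; the abstract does not specify tubelet sizes or weight layout, and the quantization and entropy coding a real codec would apply to the residuals are omitted.

```python
from typing import Dict, List, Tuple

Frame = List[List[float]]   # H x W grayscale frame (assumed for simplicity)
Weights = List[float]       # flattened hyponetwork weights for one segment (assumed layout)


def split_into_tubelets(
    video: List[Frame], seg_len: int, patch: int
) -> Dict[Tuple[int, int, int], List[Frame]]:
    """Cut a T x H x W video into non-overlapping tubelets.

    Each tubelet spans seg_len consecutive frames and a patch x patch
    spatial window. Keys are (segment index, patch row, patch column).
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % seg_len or H % patch or W % patch:
        raise ValueError("video dimensions must be divisible by seg_len and patch")
    tubelets: Dict[Tuple[int, int, int], List[Frame]] = {}
    for t0 in range(0, T, seg_len):
        for y0 in range(0, H, patch):
            for x0 in range(0, W, patch):
                key = (t0 // seg_len, y0 // patch, x0 // patch)
                tubelets[key] = [
                    [row[x0:x0 + patch] for row in frame[y0:y0 + patch]]
                    for frame in video[t0:t0 + seg_len]
                ]
    return tubelets


def encode_residuals(segments: List[Weights]) -> List[Weights]:
    """Keep the first segment's weights, then store only the difference
    between each segment and the one before it."""
    if not segments:
        return []
    stored = [list(segments[0])]
    for prev, curr in zip(segments, segments[1:]):
        stored.append([c - p for c, p in zip(curr, prev)])
    return stored


def decode_residuals(stored: List[Weights]) -> List[Weights]:
    """Invert encode_residuals by accumulating the stored differences."""
    segments: List[Weights] = []
    for i, item in enumerate(stored):
        if i == 0:
            segments.append(list(item))
        else:
            segments.append([p + r for p, r in zip(segments[-1], item)])
    return segments
```

In this sketch, each tubelet is a small, fixed-size input regardless of the full video resolution, which is the intuition behind the reduced pretraining memory. Residuals between consecutive segments are small when the weights change slowly, which is presumably where the temporal coherence regularization in contribution (3) helps.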