Stanford HAI; Macao Polytechnic University; Nankai University; Santa Clara University | Apr 15, 2026 | arXiv:2604.13667

From Pixels to Nucleotides: End-to-End Token-Based Video Compression for DNA Storage

Cihan Ruan, Lebin Zhou, Bingqing Zhao, Rongduo Han, Qiming Yuan, Chenchen Zhu, Linyi Han, Liang Yang, Wei Wang, Wei Jiang, Nam Ling

AI Summary

The paper introduces HELIX, which the authors describe as the first end-to-end neural network that jointly optimizes video compression and DNA encoding. HELIX builds on the observation that token-based representations align naturally with DNA's quaternary alphabet, so discrete semantic units can be mapped directly to the bases A, T, C, and G. The proposed TK-SCONE (Token-Kronecker Structured Constraint-Optimized Neural Encoding) achieves 1.91 bits per nucleotide by combining Kronecker-structured mixing, which breaks spatial correlations, with finite-state-machine (FSM) based mapping, which guarantees biochemical constraints. Unlike two-stage approaches, which treat compression and DNA encoding independently, HELIX learns token distributions that are optimized simultaneously for visual quality, prediction under masking, and DNA synthesis efficiency.

Key Contribution

Neural video codecs can be designed for biological substrates from the ground up, unlocking a new paradigm for DNA storage.

Abstract

DNA-based storage has emerged as a promising approach to the global data crisis, offering molecular-scale density and millennial-scale stability at low maintenance cost. Over the past decade, substantial progress has been made in storing text, images, and files in DNA -- yet video remains an open challenge. The difficulty is not merely technical: effective video DNA storage requires co-designing compression and molecular encoding from the ground up, a challenge that sits at the intersection of two fields that have largely evolved independently. In this work, we present HELIX, the first end-to-end neural network jointly optimizing video compression and DNA encoding -- prior approaches treat the two stages independently, leaving biochemical constraints and compression objectives fundamentally misaligned. Our key insight: token-based representations naturally align with DNA's quaternary alphabet -- discrete semantic units map directly to ATCG bases. We introduce TK-SCONE (Token-Kronecker Structured Constraint-Optimized Neural Encoding), which achieves 1.91 bits per nucleotide through Kronecker-structured mixing that breaks spatial correlations and FSM-based mapping that guarantees biochemical constraints. Unlike two-stage approaches, HELIX learns token distributions simultaneously optimized for visual quality, prediction under masking, and DNA synthesis efficiency. This work demonstrates for the first time that learned compression and molecular storage converge naturally at token representations -- suggesting a new paradigm where neural video codecs are designed for biological substrates from the ground up.
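To make the idea of an FSM-based mapping that guarantees a biochemical constraint concrete, the following is a minimal sketch of a classic rotating code, not the TK-SCONE mapping itself. It assumes, purely for illustration, that the only constraint is "no two adjacent bases are equal" (no homopolymer runs); the abstract does not state which constraints HELIX enforces. The function names `encode_trits`, `decode_bases`, and `max_run` are hypothetical. The FSM state is the previously emitted base, and each input symbol (a trit in {0, 1, 2}) selects one of the three remaining bases.

```python
import math

BASES = "ACGT"


def encode_trits(trits, start="A"):
    """Map each trit (0, 1, 2) to a base that differs from the previous base.

    Illustrative assumption: the only constraint is "no repeated adjacent base".
    `start` is a virtual initial state and is not emitted.
    """
    prev = start  # FSM state: last emitted base
    out = []
    for t in trits:
        choices = [b for b in BASES if b != prev]  # the three legal transitions
        prev = choices[t]
        out.append(prev)
    return "".join(out)


def decode_bases(strand, start="A"):
    """Invert encode_trits by replaying the same state transitions."""
    prev = start
    trits = []
    for b in strand:
        choices = [c for c in BASES if c != prev]
        trits.append(choices.index(b))
        prev = b
    return trits


def max_run(strand):
    """Length of the longest run of identical adjacent bases."""
    longest = run = 0
    last = None
    for b in strand:
        run = run + 1 if b == last else 1
        longest = max(longest, run)
        last = b
    return longest


trits = [0, 0, 0, 1, 2, 2, 1, 0]
strand = encode_trits(trits)
print(strand)                          # CACGTGCA
print(decode_bases(strand) == trits)   # True
print(max_run(strand))                 # 1
print(round(math.log2(3), 3))          # 1.585 bits per nucleotide for this scheme
```

This simple scheme carries log2(3) ≈ 1.585 bits per nucleotide, against an unconstrained ceiling of 2 bits per nucleotide for a four-letter alphabet. The 1.91 bits per nucleotide reported for TK-SCONE is the paper's figure and sits between these two values, which suggests a less restrictive constraint set or a more elaborate state machine than the one sketched here.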


