The authors introduce OmniHuman, a large-scale dataset of human-centric videos designed to address limitations in existing datasets regarding scene diversity, interaction modeling, and attribute alignment. They also present a fully automated pipeline for data collection and multi-modal annotation to generate this dataset. To evaluate models trained on OmniHuman, they introduce the OmniHuman Benchmark (OHBench), a three-level evaluation system with metrics aligned with human perception.
Existing video datasets fail to capture the complexity of human interactions in diverse scenes. OmniHuman offers a new dataset for training models on more realistic human-centric video generation, and OHBench offers a benchmark for evaluating them.
Recent advancements in audio-video joint generation models have demonstrated impressive capabilities in content creation. However, generating high-fidelity human-centric videos in complex, real-world physical scenes remains a significant challenge. We identify that the root cause lies in the structural deficiencies of existing datasets across three dimensions: limited global scene and camera diversity, sparse interaction modeling (both person-person and person-object), and insufficient individual attribute alignment. To bridge these gaps, we present OmniHuman, a large-scale, multi-scene dataset designed for fine-grained human modeling. OmniHuman provides a hierarchical annotation covering video-level scenes, frame-level interactions, and individual-level attributes. To facilitate this, we develop a fully automated pipeline for high-quality data collection and multi-modal annotation. Complementary to the dataset, we establish the OmniHuman Benchmark (OHBench), a three-level evaluation system that provides a scientific diagnosis for human-centric audio-video synthesis. Crucially, OHBench introduces metrics that are highly consistent with human perception, filling the gaps in existing benchmarks by providing a comprehensive diagnosis across global scenes, relational interactions, and individual attributes.
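The three annotation levels named in the abstract (video-level scenes, frame-level interactions, individual-level attributes) can be pictured as a nested record. The following is only an illustrative sketch: the class names, field names, and example values are hypothetical and are not taken from the OmniHuman release, whose actual schema is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PersonAttributes:
    """Individual-level annotation (hypothetical fields)."""
    person_id: int
    attributes: Dict[str, str]


@dataclass
class FrameInteraction:
    """Frame-level annotation: one interaction observed at one frame."""
    frame_index: int
    kind: str        # "person-person" or "person-object", as in the abstract
    subject_id: int  # person_id of the person performing the interaction
    target: str      # another person or an object, as free text here


@dataclass
class VideoAnnotation:
    """Video-level annotation that owns the two finer levels."""
    video_id: str
    scene: str
    camera: str
    interactions: List[FrameInteraction] = field(default_factory=list)
    people: List[PersonAttributes] = field(default_factory=list)

    def interactions_of(self, kind: str) -> List[FrameInteraction]:
        """Return the frame-level interactions of the given kind."""
        return [i for i in self.interactions if i.kind == kind]


def build_example() -> VideoAnnotation:
    """Build a made-up record purely to show the nesting."""
    return VideoAnnotation(
        video_id="example-0001",
        scene="kitchen",
        camera="static",
        interactions=[
            FrameInteraction(12, "person-object", 0, "cup"),
            FrameInteraction(40, "person-person", 0, "person 1"),
        ],
        people=[
            PersonAttributes(0, {"clothing": "red jacket"}),
            PersonAttributes(1, {"clothing": "blue shirt"}),
        ],
    )


if __name__ == "__main__":
    ann = build_example()
    for interaction in ann.interactions_of("person-object"):
        print(interaction.frame_index, interaction.target)
```

Keeping the frame-level and individual-level records inside the video-level record mirrors the hierarchy the abstract describes; a real dataset might instead store the levels in separate files keyed by video ID.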