FoleyDesigner is a framework for generating spatio-temporally aligned stereo Foley sounds for film clips, modeled on professional Foley workflows. It uses a multi-agent architecture for video analysis and latent diffusion models trained on spatio-temporal cues extracted from video frames, with LLM-driven mechanisms that emulate post-production practices. The authors also introduce FilmStereo, a professional stereo audio dataset with spatial metadata, timestamps, and semantic annotations for eight common Foley categories. In their experiments, the method achieves better spatio-temporal alignment than existing baselines and remains compatible with professional film production pipelines.
FoleyDesigner aims to automate the labor-intensive parts of Foley sound design, generating stereo audio that is aligned with the picture in both time and space and that fits into professional film production pipelines.
Foley art plays a pivotal role in enhancing immersive auditory experiences in film, yet manual creation of spatio-temporally aligned audio remains labor-intensive. We propose FoleyDesigner, a novel framework inspired by professional Foley workflows, integrating film clip analysis, spatio-temporally controllable Foley generation, and professional audio mixing capabilities. FoleyDesigner employs a multi-agent architecture for precise spatio-temporal analysis. It achieves spatio-temporal alignment through latent diffusion models trained on spatio-temporal cues extracted from video frames, combined with large language model (LLM)-driven hybrid mechanisms that emulate post-production practices in the film industry. To address the lack of high-quality stereo audio datasets in film, we introduce FilmStereo, the first professional stereo audio dataset containing spatial metadata, precise timestamps, and semantic annotations for eight common Foley categories. For applications, the framework supports interactive user control while maintaining seamless integration with professional pipelines, including 5.1-channel Dolby Atmos systems compliant with ITU-R BS.775 standards, thereby offering extensive creative flexibility. Extensive experiments demonstrate that our method achieves superior spatio-temporal alignment compared to existing baselines, with seamless compatibility with professional film production standards. The project page is available at https://gekiii996.github.io/FoleyDesigner/ .
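To make "spatio-temporal alignment" concrete, the sketch below places a mono sound effect into a stereo buffer at a given onset time and left-right position. It is an illustration only and is not FoleyDesigner's method: the abstract describes a learned latent diffusion approach, whereas this uses a fixed constant-power pan law. The `FoleyEvent` record, its field names, and the `render_stereo` function are hypothetical; they loosely mirror the kind of annotation the abstract attributes to FilmStereo (semantic label, timestamps, spatial metadata).

```python
import math
from dataclasses import dataclass


@dataclass
class FoleyEvent:
    """Hypothetical annotation record; field names are assumptions."""
    category: str    # semantic label, e.g. "footsteps"
    onset_s: float   # start time in seconds
    offset_s: float  # end time in seconds
    pan: float       # -1.0 = hard left, 0.0 = center, +1.0 = hard right


def constant_power_gains(pan: float) -> tuple[float, float]:
    """Return (left, right) gains whose squares sum to 1 for any pan."""
    pan = max(-1.0, min(1.0, pan))        # clamp to the valid range
    theta = (pan + 1.0) * math.pi / 4.0   # map [-1, 1] to [0, pi/2]
    return math.cos(theta), math.sin(theta)


def render_stereo(event, mono, sample_rate, total_s):
    """Write `mono` into a silent stereo buffer at the event's time and pan."""
    n = int(round(total_s * sample_rate))
    left, right = [0.0] * n, [0.0] * n
    start = int(round(event.onset_s * sample_rate))
    # Never write past the annotated offset, even if the clip is longer.
    max_len = int(round((event.offset_s - event.onset_s) * sample_rate))
    gain_l, gain_r = constant_power_gains(event.pan)
    for i, sample in enumerate(mono[:max_len]):
        j = start + i
        if j >= n:
            break
        left[j] = gain_l * sample
        right[j] = gain_r * sample
    return left, right
```

A constant-power law is used here because it keeps perceived loudness roughly steady as a source moves across the stereo field, which a simple linear crossfade does not.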