The paper introduces Co-Director, a hierarchical multi-agent framework that formulates video storytelling as a global optimization problem to address semantic drift and cascading failures in current agentic pipelines. Co-Director uses a multi-armed bandit to explore creative directions globally and a local multimodal self-refinement loop to mitigate identity drift and keep each sequence consistent. Evaluated on GenAD-Bench, a new 400-scenario dataset of fictional products for personalized advertising, Co-Director is reported to significantly outperform state-of-the-art baselines in generating coherent video narratives, and the authors state that the approach generalizes to broader cinematic narratives.
Forget handcrafted prompts: a hierarchical multi-agent framework turns diffusion models into coherent storytelling engines by globally optimizing for semantic coherence.
While diffusion models generate high-fidelity video clips, transforming them into coherent storytelling engines remains challenging. Current agentic pipelines automate this via chained modules but suffer from semantic drift and cascading failures due to independent, handcrafted prompting. We present Co-Director, a hierarchical multi-agent framework formalizing video storytelling as a global optimization problem. To ensure semantic coherence, we introduce hierarchical parameterization: a multi-armed bandit globally identifies promising creative directions, while a local multimodal self-refinement loop mitigates identity drift and ensures sequence-level consistency. This balances the exploration of novel narrative strategies with the exploitation of effective creative configurations. For evaluation, we introduce GenAD-Bench, a 400-scenario dataset of fictional products for personalized advertising. Experiments demonstrate that Co-Director significantly outperforms state-of-the-art baselines, offering a principled approach that seamlessly generalizes to broader cinematic narratives. Project Page: https://co-director-agent.github.io/
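The two-level structure described in the abstract (a bandit choosing creative directions globally, a refinement loop improving each draft locally) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which bandit algorithm is used, so UCB1 is swapped in here as a common choice, and the arm names, the `generate_and_score` stub, and the `refine` acceptance rule are all hypothetical placeholders for real video generation and multimodal critique.

```python
import math
import random

# Hypothetical creative directions (bandit arms); not taken from the paper.
DIRECTIONS = ["humorous", "emotional", "minimalist"]


def generate_and_score(direction, rng):
    """Stand-in for generating a draft and scoring its coherence in [0, 1].

    A real system would call a video generator and a multimodal critic;
    here each direction simply has an assumed average quality plus noise.
    """
    base = {"humorous": 0.4, "emotional": 0.7, "minimalist": 0.5}[direction]
    return min(1.0, max(0.0, rng.gauss(base, 0.1)))


def refine(score, rng, max_rounds=3, target=0.8):
    """Local self-refinement sketch: keep a revision only if it scores higher."""
    for _ in range(max_rounds):
        if score >= target:
            break
        candidate = min(1.0, score + rng.uniform(-0.05, 0.15))
        score = max(score, candidate)  # reject revisions that make things worse
    return score


def run_bandit(rounds=200, seed=0, c=math.sqrt(2)):
    """Global loop using UCB1 (an assumption): balance exploring and exploiting."""
    rng = random.Random(seed)
    counts = [0] * len(DIRECTIONS)
    means = [0.0] * len(DIRECTIONS)
    for t in range(1, rounds + 1):
        untried = [a for a, n in enumerate(counts) if n == 0]
        if untried:
            arm = untried[0]  # try every direction once before using UCB scores
        else:
            arm = max(
                range(len(DIRECTIONS)),
                key=lambda a: means[a] + c * math.sqrt(math.log(t) / counts[a]),
            )
        reward = refine(generate_and_score(DIRECTIONS[arm], rng), rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


if __name__ == "__main__":
    counts, means = run_bandit()
    for name, n, m in zip(DIRECTIONS, counts, means):
        print(f"{name}: pulls={n}, mean reward={m:.2f}")
```

Running the script prints, for each hypothetical direction, how often it was selected and its average refined score. The exploration constant `c` controls how readily the global loop revisits directions that currently look weaker.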