NEC Labs America | Apr 6, 2026 | arXiv:2604.04887

HorizonWeaver: Generalizable Multi-Level Semantic Editing for Driving Scenes

Mauricio Soroco, Francesco Pittaluga, Zaid Tasneem, Abhishek Aich, Bingbing Zhuang, Wuyang Chen, Manmohan Chandraker, Ziyu Jiang

AI Summary

HorizonWeaver addresses instruction-guided editing of driving scenes with a framework that combines a large-scale paired real/synthetic dataset, language-guided masks for fine-grained editing, and joint losses for content preservation and instruction alignment. The dataset draws on Boreas, nuScenes, and Argoverse2 to improve generalization across diverse driving environments. Experiments show that HorizonWeaver outperforms existing methods on L1, CLIP, and DINO metrics, with a +46.4% gain in user preference and a +33% improvement in BEV segmentation IoU.
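To make two of the reported metrics concrete, here is a minimal sketch of an L1 image distance and a binary-mask IoU. It assumes that L1 means the mean absolute pixel difference, as is common in image-editing benchmarks, and that IoU is computed per binary mask. The function names and the flattened-list inputs are illustrative choices, not the paper's evaluation code.

```python
from typing import Sequence


def mean_l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two flattened images of equal size."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def binary_iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Intersection over union of two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must be the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Two empty masks agree perfectly, so treat that case as IoU = 1.
    return intersection / union if union else 1.0
```

Lower L1 indicates that the edited image stays closer to the reference, while higher IoU indicates better overlap between predicted and ground-truth segmentation.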

Key Contribution

HorizonWeaver enables scalable, language-guided generation of photorealistic, controllable driving scenarios and outperforms existing editing methods on L1, CLIP, and DINO metrics.

Abstract

Ensuring safety in autonomous driving requires scalable generation of realistic, controllable driving scenes beyond what real-world testing provides. Yet existing instruction-guided image editors, trained on object-centric or artistic data, struggle with dense, safety-critical driving layouts. We propose HorizonWeaver, which tackles three fundamental challenges in driving scene editing: (1) multi-level granularity, requiring coherent object- and scene-level edits in dense environments; (2) rich high-level semantics, preserving diverse objects while following detailed instructions; and (3) ubiquitous domain shifts, handling changes in climate, layout, and traffic across unseen environments. The core of HorizonWeaver is a set of complementary contributions across data, model, and training: (1) Data: Large-scale dataset generation, where we build a paired real/synthetic dataset from Boreas, nuScenes, and Argoverse2 to improve generalization; (2) Model: Language-Guided Masks for fine-grained editing, where semantics-enriched masks and prompts enable precise, language-guided edits; and (3) Training: Content preservation and instruction alignment, where joint losses enforce scene consistency and instruction fidelity. Together, these contributions make HorizonWeaver a scalable framework for photorealistic, instruction-driven editing of complex driving scenes. We collect 255K images across 13 editing categories and outperform prior methods on L1, CLIP, and DINO metrics, achieving +46.4% user preference and improving BEV segmentation IoU by +33%. Project page: https://msoroco.github.io/horizonweaver/
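The abstract does not give the form of the joint losses. As a rough illustration of how a content-preservation term and an instruction-alignment term might be combined, the sketch below uses a masked L1 penalty outside the edit region plus a cosine-distance term between an image embedding and an instruction embedding. The function names, the weighting parameter `weight`, and the use of cosine distance are all assumptions made for this example, not the authors' formulation.

```python
import math
from typing import Sequence


def preservation_loss(edited: Sequence[float], source: Sequence[float],
                      edit_mask: Sequence[int]) -> float:
    """Mean absolute error over pixels outside the edit region (mask == 0)."""
    kept = [abs(e - s) for e, s, m in zip(edited, source, edit_mask) if m == 0]
    return sum(kept) / len(kept) if kept else 0.0


def alignment_loss(image_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """Cosine distance (1 - cosine similarity) between two embeddings."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(sum(b * b for b in text_emb))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / norm


def joint_loss(edited: Sequence[float], source: Sequence[float],
               edit_mask: Sequence[int], image_emb: Sequence[float],
               text_emb: Sequence[float], weight: float = 0.5) -> float:
    """Hypothetical weighted sum of the preservation and alignment terms."""
    return (preservation_loss(edited, source, edit_mask)
            + weight * alignment_loss(image_emb, text_emb))
```

In this sketch the mask keeps the preservation term from penalizing the region the instruction is meant to change, while the alignment term rewards outputs whose embedding matches the instruction.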

Computer Vision, Data Curation & Synthetic Data, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
