This survey paper provides a structured overview of Vision-and-Language Navigation for UAVs (UAV-VLN), tracing its evolution from modular and deep learning approaches to agentic systems powered by VLMs and generative world models. It reviews essential resources like simulators, datasets, and metrics, while also critically analyzing challenges such as the sim-to-real gap and linguistic ambiguity. The paper concludes by proposing a research roadmap focused on multi-agent swarm coordination and air-ground collaborative robotics.
UAV-VLN remains far from real-world deployment because of challenges such as the sim-to-real gap and linguistic ambiguity, which leaves open research opportunities in areas such as multi-agent coordination.
Vision-and-Language Navigation for Unmanned Aerial Vehicles (UAV-VLN) represents a pivotal challenge in embodied artificial intelligence, focused on enabling UAVs to interpret high-level human commands and execute long-horizon tasks in complex 3D environments. This paper provides a comprehensive and structured survey of the field, from its formal task definition to the current state of the art. We establish a methodological taxonomy that charts the technological evolution from early modular and deep learning approaches to contemporary agentic systems driven by large foundation models, including Vision-Language Models (VLMs), Vision-Language-Action (VLA) models, and the emerging integration of generative world models with VLA architectures for physically grounded reasoning. The survey systematically reviews the ecosystem of essential resources (simulators, datasets, and evaluation metrics) that facilitate standardized research. Furthermore, we conduct a critical analysis of the primary challenges impeding real-world deployment: the simulation-to-reality gap, robust perception in dynamic outdoor settings, reasoning with linguistic ambiguity, and the efficient deployment of large models on resource-constrained hardware. By synthesizing current benchmarks and limitations, this survey concludes by proposing a forward-looking research roadmap to guide future inquiry into key frontiers such as multi-agent swarm coordination and air-ground collaborative robotics.