This paper explores the feasibility of using Vision-Language Models (VLMs) as a "semantic observer layer" for autonomous vehicles, designed to detect context-dependent hazards missed by pixel-level detectors. The authors achieve a ~50x speedup by quantizing Nvidia Cosmos-Reason1-7B to NVFP4 and using FlashAttention2, reaching ~500 ms inference time. The study identifies recall collapse under NF4 quantization as a key deployment challenge and maps performance metrics to safety goals, demonstrating pre-deployment feasibility.
A 50x speedup makes VLMs fast enough to serve as a real-time semantic safety net for self-driving cars, but NF4 quantization can cause critical recall failures.
Semantic anomalies, context-dependent hazards that pixel-level detectors cannot reason about, pose a critical safety risk in autonomous driving. We propose a "semantic observer layer": a quantized vision-language model (VLM) running at 1–2 Hz alongside the primary AV control loop, monitoring for semantic edge cases and triggering fail-safe handoffs when they are detected. Using Nvidia Cosmos-Reason1-7B with NVFP4 quantization and FlashAttention2, we achieve ~500 ms inference, a ~50x speedup over the unoptimized FP16 baseline (no quantization, standard PyTorch attention) on the same hardware, satisfying the observer timing budget. We benchmark accuracy, latency, and quantization behavior in static and video conditions, identify NF4 recall collapse (to 10.6%) as a hard deployment constraint, and present a hazard analysis mapping performance metrics to safety goals. The results establish a pre-deployment feasibility case for the semantic observer architecture on embodied-AI AV platforms.
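The observer timing budget follows directly from the stated monitoring rate: at 1–2 Hz, each cycle allows 500–1000 ms, which the ~500 ms NVFP4 inference just meets. A minimal sketch of that arithmetic, using only numbers from the abstract (the function names are illustrative, not from the paper):

```python
# Hypothetical sketch: does a measured VLM inference latency fit the
# per-cycle budget implied by a given observer monitoring rate?
# All latency/rate figures below come from the abstract.

def observer_budget_ms(rate_hz: float) -> float:
    """Per-cycle time budget in milliseconds for an observer polling at rate_hz."""
    return 1000.0 / rate_hz

def fits_budget(latency_ms: float, rate_hz: float) -> bool:
    """True if one inference fits within one observer cycle."""
    return latency_ms <= observer_budget_ms(rate_hz)

# ~500 ms NVFP4 inference vs. the 2 Hz (500 ms) and 1 Hz (1000 ms) budgets:
print(fits_budget(500.0, 2.0))  # True: exactly meets the 2 Hz budget
print(fits_budget(500.0, 1.0))  # True: comfortable at 1 Hz

# The unoptimized FP16 baseline is ~50x slower (~25 s) and misses both:
print(fits_budget(500.0 * 50, 1.0))  # False
```

Note that meeting the budget with zero slack at 2 Hz leaves no headroom for pre/post-processing, which is why the 1 Hz end of the stated range is the more conservative deployment point.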