This paper introduces a Pipeline-based Traffic Analysis System (PTAS) that leverages multimodal perception, including computer vision, multi-object tracking, NLP, RAG, and LLMs, to provide real-time statistics on pedestrian and vehicle flow at intersections. The system enhances object detection using a modified DuCRG-YOLOv11n model with a novel βsilu activation function, achieving 91.4% precision at 68.25 FPS, and integrates ByteTrack for vehicle tracking. The PTAS uses Vertex AI's RAG combined with Claude Sonnet 4 to interpret traffic patterns and compensate for data gaps, demonstrating improved traffic analysis and decision support.
A multimodal AI system can now provide real-time, risk-aware traffic insights by fusing computer vision, multi-object tracking, and LLMs to analyze pedestrian and vehicle flow at intersections.
Traditional automated monitoring systems used for intersection traffic control still face challenges, including high costs, maintenance difficulties, insufficient coverage, poor multimodal data integration, and limited traffic information analysis. To address these issues, the study proposes a sovereign AI-driven smart transportation governance approach, developing a mobile AI solution equipped with multimodal perception, task decomposition, memory, reasoning, and multi-agent collaboration capabilities. The proposed system integrates computer vision, multi-object tracking, natural language processing, Retrieval-Augmented Generation (RAG), and Large Language Models (LLMs) to construct a Pipeline-based Traffic Analysis System (PTAS). The PTAS produces real-time statistics on pedestrian and vehicle flows at intersections and incorporates potential risk factors such as traffic accidents, construction activities, and weather conditions into a multimodal data fusion analysis, thereby providing forward-looking traffic insights. Experimental results demonstrate that the enhanced DuCRG-YOLOv11n pre-trained model, equipped with our proposed new activation function βsilu, can accurately identify various vehicle types in object detection, achieving a frame rate of 68.25 FPS and a precision of 91.4%. Combined with ByteTrack, it can track over 90% of vehicles in medium- to low-density traffic scenarios, obtaining a MOTA of 0.719 and a MOTP of 0.08735. In traffic flow analysis, the RAG of Vertex AI, combined with the Claude Sonnet 4 LLM, provides a more comprehensive view, interpreting the causes of peak-hour congestion and compensating for missing data through contextual explanations. The proposed method can enhance the efficiency of urban traffic regulation and optimize decision support in intelligent transportation systems.
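For readers unfamiliar with the tracking metric reported above, the following is a minimal sketch of how MOTA is conventionally computed under the standard CLEAR MOT definition: one minus the ratio of accumulated tracking errors (misses, false positives, identity switches) to the total number of ground-truth objects. The abstract does not describe the authors' evaluation code, so the `FrameErrors` class, the `mota` function, and the sample counts are hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameErrors:
    """Per-frame tracking counts (hypothetical container for illustration)."""
    ground_truth: int      # number of ground-truth objects in the frame
    misses: int            # ground-truth objects with no matched track
    false_positives: int   # tracks with no matched ground-truth object
    id_switches: int       # matched objects whose track ID changed


def mota(frames: Iterable[FrameErrors]) -> float:
    """Standard CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, summed over frames."""
    frames = list(frames)
    total_gt = sum(f.ground_truth for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined when there are no ground-truth objects")
    total_errors = sum(
        f.misses + f.false_positives + f.id_switches for f in frames
    )
    return 1.0 - total_errors / total_gt


# Illustrative counts only; these are not results from the paper.
sample = [FrameErrors(10, 1, 1, 0), FrameErrors(10, 2, 0, 1)]
print(mota(sample))  # 5 errors over 20 ground-truth objects -> 0.75
```

MOTA can be negative when errors outnumber ground-truth objects, so it is not a percentage of vehicles tracked. MOTP is defined differently across toolkits (as a mean overlap or as a mean distance between matched pairs), and the abstract does not state which convention its 0.08735 value follows.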