This paper introduces C-TRAIL, a new multimodal dataset aligning dashcam videos and textual descriptions with traffic responsibility modes and Chinese traffic statutes. The authors then propose a two-stage framework that first generates textual video descriptions and then uses a legal multi-agent framework to determine responsibility modes, relevant statutes, and judgment reports. Experiments on C-TRAIL and MM-AU show that the proposed method outperforms general and legal LLMs and existing agent-based approaches while providing interpretable legal reasoning.
A novel multimodal dataset and legal reasoning framework link dashcam videos directly to legal responsibility determinations, and the framework outperforms existing LLMs and agent-based systems.
The widespread adoption of dashcams has made video evidence in traffic accidents increasingly abundant, yet transforming "what happened in the video" into "who is responsible under which legal provisions" still relies heavily on human experts. Existing ego-view traffic accident studies mainly focus on perception and semantic understanding, while LLM-based legal methods are mostly built on textual case descriptions and rarely incorporate video evidence, leaving a clear gap between the two. We first propose C-TRAIL, a multimodal legal dataset that, under the Chinese traffic regulation system, explicitly aligns dashcam videos and textual descriptions with a closed set of responsibility modes and their corresponding Chinese traffic statutes. On this basis, we introduce a two-stage framework: (1) a traffic accident understanding module that generates textual video descriptions; and (2) a legal multi-agent framework that outputs responsibility modes, statute sets, and complete judgment reports. Experimental results on C-TRAIL and MM-AU show that our method outperforms general and legal LLMs, as well as existing agent-based approaches, while providing a transparent and interpretable legal reasoning process.
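The two-stage flow described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the names `run_pipeline`, `Judgment`, and `RESPONSIBILITY_MODES`, the mode labels, and the three callables standing in for the accident understanding module and the legal agents are all hypothetical, and the abstract does not specify how the agents interact.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical closed set of responsibility modes. The abstract states that
# the set is closed but does not list its labels, so these are placeholders.
RESPONSIBILITY_MODES: Tuple[str, ...] = ("ego_full", "other_full", "shared", "none")


@dataclass
class Judgment:
    """Output of stage 2: mode, statute set, and a judgment report."""
    responsibility_mode: str
    statutes: List[str]
    report: str


def run_pipeline(
    video_path: str,
    describe: Callable[[str], str],                    # stage 1 stand-in
    judge_mode: Callable[[str], str],                  # stage 2 agent stand-in
    retrieve_statutes: Callable[[str, str], List[str]],  # stage 2 agent stand-in
) -> Judgment:
    # Stage 1: turn the dashcam video into a textual description.
    description = describe(video_path)

    # Stage 2a: pick a responsibility mode and enforce the closed label set.
    mode = judge_mode(description)
    if mode not in RESPONSIBILITY_MODES:
        raise ValueError(f"mode {mode!r} is outside the closed set")

    # Stage 2b: collect the statutes associated with the description and mode.
    statutes = retrieve_statutes(description, mode)

    # Stage 2c: assemble a report that exposes each intermediate result,
    # so the reasoning chain stays inspectable.
    report = (
        f"Description: {description}\n"
        f"Responsibility mode: {mode}\n"
        f"Statutes: {', '.join(statutes)}"
    )
    return Judgment(responsibility_mode=mode, statutes=statutes, report=report)
```

Passing the stages in as callables keeps the sketch independent of any particular video model or LLM; in practice each callable would wrap a model call, and the closed-set check is one simple way to reject outputs that fall outside the dataset's label space.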