This paper introduces a VLM-LLM agent framework for Oracle Bone Script (OBS) interpretation that leverages the compositional structure of the script. The framework combines visual grounding of components with LLM-based reasoning to identify components, retrieve knowledge, and infer relationships for accurate interpretation. The authors also introduce OB-Radix, a new expert-annotated dataset of OBS characters and components, and demonstrate improved decipherment performance compared to baselines across three benchmarks.
Deciphering ancient scripts like Oracle Bone Script may get easier: a new agent-driven Vision-Language Model framework leverages the compositional structure of characters to automate reasoning and improve interpretation accuracy.
Deciphering ancient Chinese Oracle Bone Script (OBS) is a challenging task that offers insights into the beliefs, systems, and culture of the ancient era. Existing approaches treat decipherment as a closed-set image recognition problem, which fails to bridge the "interpretation gap": while individual characters are often unique and rare, they are composed of a limited set of recurring, pictographic components that carry transferable semantic meanings. To leverage this structural logic, we propose an agent-driven Vision-Language Model (VLM) framework that integrates a VLM for precise visual grounding with an LLM-based agent that automates a reasoning chain of component identification, graph-based knowledge retrieval, and relationship inference for linguistically accurate interpretation. To support this, we also introduce OB-Radix, an expert-annotated dataset providing structural and semantic data absent from prior corpora, comprising 1,022 character images (934 unique characters) and 1,853 fine-grained component images across 478 distinct components with verified explanations. Evaluating our system on three benchmarks covering different tasks, we demonstrate that our framework yields more detailed and precise decipherments than baseline methods.
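The three-stage reasoning chain described above (component identification, graph-based knowledge retrieval, relationship inference) can be pictured with a small Python sketch. This is an illustration only, not the authors' implementation: the VLM grounding step is replaced by a hard-coded lookup, the knowledge graph is a two-node toy, and every name (`Component`, `identify_components`, `retrieve_knowledge`, `spatial_relation`, `interpret`) as well as the sample data is hypothetical and not drawn from OB-Radix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A grounded component: a label plus a bounding box (x, y, width, height)."""
    label: str
    box: tuple


# Toy knowledge graph (hypothetical data): nodes hold component meanings,
# edges hold notes about how two components may combine.
NODES = {
    "person": "a human figure",
    "tree": "a tree or wood",
}
EDGES = {
    frozenset({"person", "tree"}): "a person beside a tree may suggest resting",
}


def identify_components(image_id):
    """Stand-in for VLM visual grounding; a real system would run a model here."""
    fake_detections = {
        "char_001": [
            Component("person", (0, 0, 40, 100)),
            Component("tree", (45, 0, 55, 100)),
        ],
    }
    return fake_detections.get(image_id, [])


def retrieve_knowledge(components):
    """Look up node meanings and any edges linking the detected components."""
    labels = [c.label for c in components]
    meanings = {label: NODES[label] for label in labels if label in NODES}
    notes = [note for pair, note in EDGES.items() if pair <= set(labels)]
    return meanings, notes


def spatial_relation(a, b):
    """Compare horizontal box centers to get a coarse left/right relation."""
    a_center = a.box[0] + a.box[2] / 2
    b_center = b.box[0] + b.box[2] / 2
    return "left of" if a_center < b_center else "right of"


def interpret(image_id):
    """Run the chain and collect the evidence an LLM agent would reason over."""
    components = identify_components(image_id)
    meanings, notes = retrieve_knowledge(components)
    relations = [
        (a.label, spatial_relation(a, b), b.label)
        for i, a in enumerate(components)
        for b in components[i + 1:]
    ]
    return {
        "components": [c.label for c in components],
        "meanings": meanings,
        "relations": relations,
        "notes": notes,
    }


if __name__ == "__main__":
    print(interpret("char_001"))
```

In a full system, the dictionary returned by `interpret` would presumably be passed to the LLM agent as structured evidence for the final interpretation. The sketch stops short of that step because the abstract does not specify the prompting or inference details.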