The paper introduces ChunQiuTR, a new benchmark for time-keyed retrieval in Classical Chinese annals, designed to evaluate temporal consistency in retrieval-augmented generation. It addresses the challenge of retrieving records based on terse, implicit, non-Gregorian time expressions. The authors propose CTD (Calendrical Temporal Dual-encoder), a time-aware dual-encoder that incorporates Fourier-based calendrical context and relative offset biasing, demonstrating improved performance over semantic baselines in time-keyed retrieval.
Grounding RAG in historical texts demands more than semantic similarity; ChunQiuTR reveals the critical importance of temporal consistency, where even plausible evidence can be rendered invalid by subtle time-key mismatches.
Retrieval shapes how language models access and ground knowledge in retrieval-augmented generation (RAG). In historical research, the target is often not an arbitrary relevant passage, but the exact record for a specific regnal month, where temporal consistency matters as much as topical relevance. This is especially challenging for Classical Chinese annals, where time is expressed through terse, implicit, non-Gregorian reign phrases that must be interpreted from surrounding context, so semantically plausible evidence can still be temporally invalid. We introduce \textbf{ChunQiuTR}, a time-keyed retrieval benchmark built from the \textit{Spring and Autumn Annals} and its exegetical tradition. ChunQiuTR organizes records by month-level reign keys and includes chrono-near confounders that mirror realistic retrieval failures. We further propose \textbf{CTD} (Calendrical Temporal Dual-encoder), a time-aware dual-encoder that combines Fourier-based absolute calendrical context with relative offset biasing. Experiments show consistent gains over strong semantic dual-encoder baselines under time-keyed evaluation, supporting retrieval-time temporal consistency as a key prerequisite for faithful downstream historical RAG. Our code and datasets are available at \href{https://github.com/xbdxwyh/ChunQiuTR}{\texttt{github.com/xbdxwyh/ChunQiuTR}}.
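To make the idea of combining absolute calendrical context with relative offset biasing concrete, here is a minimal sketch of one way a time-aware retrieval score could be assembled. It is not the authors' CTD implementation, which the abstract does not specify. The function names, the integer month index (a hypothetical mapping from a reign key to a running month count), the Fourier periods, and the weights `alpha` and `beta` are all assumptions made for illustration.

```python
import math

# Illustrative sketch only; not the CTD model from the paper.
# Assumption: each record's reign key has already been mapped to an
# integer month index (hypothetical preprocessing step).

def fourier_time_features(month_index, periods=(12, 60, 240)):
    """Encode an absolute month index as sin/cos pairs at several periods.

    The periods are arbitrary illustrative choices.
    """
    feats = []
    for p in periods:
        angle = 2.0 * math.pi * month_index / p
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def time_aware_score(q_sem, d_sem, q_month, d_month, alpha=0.5, beta=0.1):
    """Semantic similarity plus a calendrical term and a relative offset bias.

    alpha weights the Fourier (absolute) term; beta penalizes the
    absolute month offset between query and document. Both are assumed values.
    """
    semantic = dot(q_sem, d_sem)
    q_t = fourier_time_features(q_month)
    d_t = fourier_time_features(d_month)
    # Each sin/cos pair contributes cos(angle difference), so dividing by
    # the number of pairs keeps this term in [-1, 1].
    calendrical = dot(q_t, d_t) / (len(q_t) / 2)
    offset_bias = -beta * abs(q_month - d_month)
    return semantic + alpha * calendrical + offset_bias

# A chrono-near confounder: same semantic vector, one month off.
query_vec = [0.6, 0.8]
exact = time_aware_score(query_vec, [0.6, 0.8], q_month=37, d_month=37)
near = time_aware_score(query_vec, [0.6, 0.8], q_month=37, d_month=38)
```

In this toy setup the two candidates are semantically identical, so a purely semantic dual-encoder would tie them. The time terms are what let the record with the matching month key rank above the chrono-near confounder.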