The authors introduce MultiTempBench, a multilingual temporal reasoning benchmark with 15,000 examples across five languages and multiple calendar conventions, and use it to evaluate 20 LLMs. They find that tokenization quality of temporal artifacts is a key bottleneck, especially in low-resource languages and rarer calendar formats where fragmentation disrupts Year/Month/Day separation. Beyond tokenization, temporal linearity in internal representations is the strongest predictor of temporal reasoning in high-resource languages, while fragmentation is the stronger predictor in low-resource languages.
LLMs' temporal reasoning crumbles in low-resource languages and rarer calendar formats, not due to a lack of reasoning ability, but because poor tokenization fragments dates and times.
We present MultiTempBench, a multilingual temporal reasoning benchmark spanning three tasks (date arithmetic, time zone conversion, and temporal relation extraction) across five languages (English, German, Chinese, Arabic, and Hausa) and multiple calendar conventions (Gregorian, Hijri, and Chinese Lunar). MultiTempBench contains $15,000$ examples built by translating $750$ curated English questions and expanding each into controlled date-format variants. We evaluate 20 LLMs and introduce the multilingual Date Fragmentation Ratio (mDFR), calibrated with human severity ratings, together with geometric-probing analyses of internal temporal representations. We find that tokenisation quality of temporal artefacts is a resource-dependent bottleneck: in low-resource languages and rarer calendar formats, fragmentation disrupts Year/Month/Day separation and accuracy collapses, while high-resource settings are often robust to digit-level splitting. Beyond tokenisation, crossed mixed-effects regression shows that temporal linearity is the strongest predictor of temporal reasoning in high-resource languages, whereas fragmentation is the stronger predictor in low-resource languages. Code is available at: https://github.com/gagan3012/mtb
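To make the idea of date fragmentation concrete, the sketch below counts how many Year/Month/Day components of a date string fail to survive as a single token. It is an illustration only and is not the mDFR defined in the paper, whose formula and human-severity calibration are not given in this abstract. The names `greedy_tokenize` and `split_component_ratio`, the toy vocabularies, and the greedy longest-match tokenizer are all assumptions introduced for this example.

```python
def greedy_tokenize(text, vocab):
    """Toy greedy longest-match tokenizer (hypothetical, not a real LLM tokenizer).

    Characters not covered by the vocabulary become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that starts at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


def split_component_ratio(components, separator, vocab):
    """Fraction of date components that are NOT exactly one token.

    A component counts as intact only if a single token covers exactly
    its characters; a token that merges it with a separator also counts
    as a split. This is an illustrative measure, not the paper's mDFR.
    """
    text = separator.join(components)
    tokens = greedy_tokenize(text, vocab)

    # Character span (start, end) covered by each token.
    spans = set()
    pos = 0
    for tok in tokens:
        spans.add((pos, pos + len(tok)))
        pos += len(tok)

    split = 0
    start = 0
    for comp in components:
        end = start + len(comp)
        if (start, end) not in spans:
            split += 1
        start = end + len(separator)
    return split / len(components)


date = ["2024", "03", "15"]  # Year, Month, Day
clean_vocab = {"2024", "03", "15", "-"}   # every component is one token
sparse_vocab = {"20", "24", "-", "1"}     # every component gets fragmented
mixed_vocab = {"2024", "15", "-"}         # only the month is fragmented

clean = split_component_ratio(date, "-", clean_vocab)
sparse = split_component_ratio(date, "-", sparse_vocab)
mixed = split_component_ratio(date, "-", mixed_vocab)
```

With the clean vocabulary the ratio is 0 because each of the year, month, and day maps to one token. With the sparse vocabulary all three components are broken up, which is the kind of lost Year/Month/Day separation the abstract associates with low-resource languages and rarer calendar formats.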