This paper benchmarks the performance of several proprietary (Claude 3.5, GPT-4o, Gemini 1.5) and open-source (LLaMA 3.2, Gemma 2, Mistral, Nemotron 4) large language models on the task of temporal text classification (TTC), which involves estimating the publication date of texts. The study evaluates zero-shot, few-shot prompting, and fine-tuning approaches. Results show that proprietary models excel, particularly with few-shot prompting, while fine-tuning significantly enhances open-source models, although they still underperform compared to proprietary alternatives.
Despite their general prowess, open-source LLMs still lag behind proprietary models in the nuanced task of dating texts, even after fine-tuning.
Languages change over time. Computational models can be trained to recognize such changes, enabling them to estimate the publication date of texts. Despite recent advancements in Large Language Models (LLMs), their performance on automatic dating of texts, also known as Temporal Text Classification (TTC), has not been explored. This study provides the first systematic evaluation of leading proprietary (Claude 3.5, GPT-4o, Gemini 1.5) and open-source (LLaMA 3.2, Gemma 2, Mistral, Nemotron 4) LLMs on TTC using three historical corpora, two in English and one in Portuguese. We test zero-shot prompting, few-shot prompting, and fine-tuning settings. Our results indicate that proprietary models perform well, especially with few-shot prompting. They also indicate that fine-tuning substantially improves open-source models, but that they still fail to match the performance delivered by proprietary LLMs.
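As a rough illustration of the prompting settings the abstract describes, the sketch below shows how a zero-shot or few-shot TTC prompt might be assembled and how a decade-level answer could be parsed from a model's reply. This is a minimal, hypothetical sketch: the prompt wording, the decade-level granularity, and the helper names are assumptions for illustration, not the paper's actual protocol.

```python
import re

def build_ttc_prompt(text, examples=()):
    """Assemble a TTC prompt. With no examples this is zero-shot;
    passing (passage, decade) pairs makes it few-shot.
    Hypothetical wording, not taken from the paper."""
    lines = [
        "Task: estimate the decade in which the passage was written.",
        "Answer with a single decade, e.g. 1870s.",
        "",
    ]
    for passage, decade in examples:  # few-shot exemplars, if any
        lines += [f"Passage: {passage}", f"Decade: {decade}", ""]
    lines += [f"Passage: {text}", "Decade:"]
    return "\n".join(lines)

def parse_decade(completion):
    """Extract the first decade-like token (e.g. '1920s') from a reply."""
    m = re.search(r"\b(1[0-9]{3}|20[0-9]{2})s\b", completion)
    return m.group(0) if m else None
```

The prompt string would then be sent to whichever LLM is under evaluation, with `parse_decade` normalizing free-form replies into a single label for scoring.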