Centro Ricerche Enrico Fermi (CREF), Complexity Science Hub (CSH), Konstanz, Universitat de Barcelona

May 6, 2026

arXiv:2605.04875

Anticipating Innovation Using Large Language Models

Enrico Maria Fenoaltea, Filippo Santoro, Giordano De Marzo, Segun Taofeek Aroyehun, Andrea Tacchella

AI Summary

This paper investigates whether future technological innovation can be predicted from temporal shifts in patent language. The authors introduce TechToken, a transformer model fine-tuned on patent text with International Patent Classification (IPC) codes added as vocabulary tokens, to capture the evolving relationships between technologies. The study shows that convergence between IPC code embeddings, a measure of linguistic similarity, predicts future technological combinations up to decades in advance, and that TechToken also improves performance on patent-related tasks.

Key Contribution

Forget expert intuition: language trends in patent filings can signal new technological combinations years, even decades, before they occur.

Abstract

Forecasting innovation, understood as the emergence of new technological combinations, is a fundamental challenge for science and policy. We show that forthcoming combinations leave an early trace in the collective language of patents, with predictive signals detectable even decades in advance. We show that this signal is not attributable to any single inventor, but emerges as a collective shift in how technologies are described across thousands of patents. To this end, we introduce TechToken, a transformer-based model that treats technologies, classified by International Patent Classification codes, as words in its vocabulary, learning the language of technologies by embedding these codes during fine-tuning. We define context similarity between code embeddings as a measure of linguistic convergence and show that it accurately predicts first technological combinations. TechToken also improves general representation quality, outperforming state-of-the-art models across different patent-related tasks.
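
As a rough illustration of the idea in the abstract, the sketch below ranks pairs of IPC codes that have not yet been combined by how similar their embeddings are. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not give the formula for context similarity, so cosine similarity stands in for it here, and the three-dimensional vectors, the `already_combined` set, and the helper names `cosine` and `rank_uncombined_pairs` are all hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_uncombined_pairs(embeddings, already_combined):
    """Rank code pairs that have never co-occurred, most similar first."""
    scores = []
    for a, b in combinations(sorted(embeddings), 2):
        if frozenset((a, b)) in already_combined:
            continue  # skip pairs that already appear together on a patent
        scores.append(((a, b), cosine(embeddings[a], embeddings[b])))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up toy vectors keyed by IPC subclass codes (assumption: real
# embeddings would come from the fine-tuned model, not be hand-written).
embeddings = {
    "G06N": [0.9, 0.1, 0.2],
    "A61B": [0.8, 0.2, 0.3],
    "H01M": [0.1, 0.9, 0.1],
    "B60L": [0.2, 0.8, 0.3],
}

# Hypothetical record of code pairs already combined in past patents.
already_combined = {frozenset(("H01M", "B60L"))}

ranking = rank_uncombined_pairs(embeddings, already_combined)
for pair, score in ranking:
    print(pair, round(score, 3))
```

Pairs near the top of the ranking are the candidates such a similarity measure would flag as likely future combinations. In a temporal setting, the same ranking would be computed from embeddings trained only on patents filed before a cutoff year and then compared against combinations that first appear afterwards.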

Natural Language Processing, Scientific Discovery & Drug Design

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
