This paper investigates whether future technological innovation can be predicted from temporal shifts in patent language. The authors introduce TechToken, a transformer model fine-tuned on patent text with International Patent Classification (IPC) codes added as vocabulary tokens, to capture the evolving relationships between technologies. The study shows that the convergence of IPC code embeddings, which reflects growing linguistic similarity, can predict new technological combinations decades in advance, and that TechToken also improves performance on patent-related tasks.
Forget expert intuition – language trends in patent filings can foresee technological breakthroughs years before they happen.
Forecasting innovation, understood as the emergence of new technological combinations, is a fundamental challenge for science and policy. We show that forthcoming combinations leave an early trace in the collective language of patents, with predictive signals detectable even decades in advance. We also show that this signal is not attributable to any single inventor, but emerges as a collective shift in how technologies are described across thousands of patents. To this end, we introduce TechToken, a transformer-based model that treats technologies, classified by International Patent Classification codes, as words in its vocabulary, learning the language of technologies by embedding these codes during fine-tuning. We define context similarity between code embeddings as a measure of linguistic convergence and show that it accurately predicts first technological combinations. TechToken also improves general representation quality, outperforming state-of-the-art models across different patent-related tasks.
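To make the idea of context similarity concrete, the following is a minimal sketch of how convergence between two IPC code embeddings could be tracked over time. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity function, so cosine similarity is assumed here, and the per-year snapshot layout, the helper names (cosine_similarity, convergence_series), and the toy vectors are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def convergence_series(snapshots, code_a, code_b):
    """Return (year, similarity) pairs for two codes, in chronological order.

    `snapshots` maps a year to a dict of {IPC code: embedding vector}.
    This layout is a hypothetical convenience, not taken from the paper.
    """
    return [
        (year, cosine_similarity(emb[code_a], emb[code_b]))
        for year, emb in sorted(snapshots.items())
    ]


# Toy two-dimensional vectors, invented purely for illustration.
snapshots = {
    1990: {"G06N": [1.0, 0.0], "A61B": [0.0, 1.0]},
    2000: {"G06N": [1.0, 0.0], "A61B": [1.0, 1.0]},
    2010: {"G06N": [1.0, 0.0], "A61B": [1.0, 0.0]},
}

series = convergence_series(snapshots, "G06N", "A61B")
for year, sim in series:
    print(year, round(sim, 3))
```

In this toy data the similarity rises from one snapshot to the next, which is the kind of pattern the abstract describes as linguistic convergence preceding a first combination. Real embeddings would come from the fine-tuned model rather than hand-written vectors.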