Feb 22, 2026 · arXiv:2602.19174

TurkicNLP: An NLP Toolkit for Turkic Languages

AI Summary

The paper introduces TurkicNLP, an open-source Python library that provides a unified NLP pipeline for Turkic languages, addressing the current fragmentation in tooling and resources. It covers tokenization, morphological analysis, POS tagging, dependency parsing, NER, transliteration, sentence embeddings, and machine translation through a single language-agnostic API. Its modular architecture integrates rule-based finite-state transducers and neural models, supports four script families, and emits CoNLL-U output for interoperability.
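To make the CoNLL-U output format concrete, here is a minimal Python sketch of serializing one analyzed sentence into the ten tab-separated CoNLL-U columns. It does not use or reproduce the TurkicNLP API: the `Token` class and `to_conllu` function are hypothetical names introduced for illustration, and the Turkish example annotation is hand-written, not library output.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One CoNLL-U word line; "_" marks an unspecified field."""
    id: int
    form: str
    lemma: str = "_"
    upos: str = "_"
    xpos: str = "_"
    feats: str = "_"
    head: int = 0
    deprel: str = "_"
    deps: str = "_"
    misc: str = "_"


def to_conllu(text, tokens):
    """Serialize a sentence as a CoNLL-U block (hypothetical helper)."""
    lines = [f"# text = {text}"]  # sentence-level comment line
    for t in tokens:
        fields = [t.id, t.form, t.lemma, t.upos, t.xpos,
                  t.feats, t.head, t.deprel, t.deps, t.misc]
        lines.append("\t".join(str(f) for f in fields))
    # A blank line terminates each sentence in CoNLL-U.
    return "\n".join(lines) + "\n\n"


# Hand-annotated illustration ("I read the book"), not TurkicNLP output.
sentence = [
    Token(1, "Kitabı", "kitap", "NOUN", feats="Case=Acc|Number=Sing",
          head=2, deprel="obj"),
    Token(2, "okudum", "oku", "VERB", feats="Number=Sing|Person=1|Tense=Past",
          head=0, deprel="root"),
    Token(3, ".", ".", "PUNCT", head=2, deprel="punct"),
]
print(to_conllu("Kitabı okudum.", sentence), end="")
```

Because every tool that reads CoNLL-U expects exactly these ten columns, output in this shape can be passed to existing Universal Dependencies evaluation and visualization tools without conversion.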

Key Contribution

A single Python library provides a consistent NLP pipeline across the Turkic language family, which is spoken by over 200 million people and written in four script families.

Abstract

Natural language processing for the Turkic language family, spoken by over 200 million people across Eurasia, remains fragmented, with most languages lacking unified tooling and resources. We present TurkicNLP, an open-source Python library providing a single, consistent NLP pipeline for Turkic languages across four script families: Latin, Cyrillic, Perso-Arabic, and Old Turkic Runic. The library covers tokenization, morphological analysis, part-of-speech tagging, dependency parsing, named entity recognition, bidirectional script transliteration, cross-lingual sentence embeddings, and machine translation through one language-agnostic API. A modular multi-backend architecture integrates rule-based finite-state transducers and neural models transparently, with automatic script detection and routing between script variants. Outputs follow the CoNLL-U standard for full interoperability and extension. Code and documentation are hosted at https://github.com/turkic-nlp/turkicnlp.
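As a rough illustration of what automatic script detection over the four script families could look like, the following Python sketch classifies text by majority vote over Unicode character names. This is an assumed approach using only the standard library, not the method TurkicNLP implements; `detect_script` and `SCRIPT_PREFIXES` are hypothetical names.

```python
import unicodedata
from collections import Counter

# Unicode character-name prefixes for the four script families in the abstract.
SCRIPT_PREFIXES = {
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "ARABIC": "Perso-Arabic",
    "OLD TURKIC": "Old Turkic Runic",
}


def detect_script(text):
    """Return the dominant script family among letters, or None if none match."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace
        name = unicodedata.name(ch, "")
        for prefix, script in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[script] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]


for sample in ["Türkiye", "Қазақстан", "\u0626\u06c7\u064a\u063a\u06c7\u0631"]:
    print(sample, "->", detect_script(sample))
```

A pipeline could use a label like this to route input to a script-specific backend or to a transliteration step first. Mixed-script input would need a finer-grained, per-token decision than this majority vote.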

Natural Language Processing · Open-Source Models & Weights

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
