ARIMLABS.AI | Polish-Japanese Academy of Information | Apr 30, 2026 | arXiv:2604.27534

Entropy of Ukrainian

Anton Lavreniuk, Mykyta Mudryi, Markiian Chaklosh

AI Summary

This paper presents the first empirical estimation of Ukrainian language entropy using a Shannon game approach, where human participants predict the next character in a sequence. Data from 184 volunteers was collected via social media, yielding an upper bound entropy estimate of approximately 1.201 bits per character. The estimated entropy is then compared to the performance of current Large Language Models on the same task.

Key Contribution

Ukrainian is more predictable than you think: its entropy is empirically estimated for the first time, revealing an upper bound of just 1.201 bits per character.

Abstract

In natural language processing, the entropy of a language is a measure of its unpredictability and complexity. The first study on this subject was conducted by Claude Shannon in 1951. By having participants predict the next character in a sentence, he was able to approximate the entropy of the English language. Several follow-up studies by other authors have since been conducted for English, and one for Hebrew. However, to date, Shannon's experiment has never been conducted for Ukrainian. In this paper, we perform this experiment for Ukrainian by recruiting 184 volunteers using social media channels. We rely on techniques used for English to approximate the entropy value of Ukrainian. The final result is an upper bound of $H_{\text{upper}} \approx 1.201$ bits per character. We compare this to the performance of current Large Language Models. The methods and code used are also documented and published, along with a discussion of the main challenges encountered.
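As a rough illustration of how a guessing game can yield an upper bound, the sketch below applies the bound from Shannon's 1951 paper: if $q_i$ is the fraction of characters guessed correctly on the $i$-th attempt, then the entropy is at most $-\sum_i q_i \log_2 q_i$. The abstract says only that techniques used for English were applied, so treating this exact formula as the authors' method is an assumption. The function names and the toy guess counts are hypothetical and are not taken from the paper's data or published code.

```python
import math
from collections import Counter


def guess_rank_distribution(guess_counts):
    """Return q, where q[i] is the fraction of characters that were
    guessed correctly on attempt i + 1.

    guess_counts holds one integer per predicted character: the number
    of guesses a participant needed (1 = correct on the first try).
    """
    if not guess_counts:
        raise ValueError("guess_counts must not be empty")
    if min(guess_counts) < 1:
        raise ValueError("every guess count must be at least 1")
    counts = Counter(guess_counts)
    total = len(guess_counts)
    return [counts.get(rank, 0) / total for rank in range(1, max(counts) + 1)]


def shannon_upper_bound(q):
    """Upper bound on entropy in bits per character: -sum(q_i * log2(q_i)).

    Ranks that never occurred (q_i == 0) contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in q if p > 0)


# Toy data (hypothetical): guesses needed for eight characters.
toy_guess_counts = [1, 1, 1, 1, 2, 2, 3, 4]
q = guess_rank_distribution(toy_guess_counts)
print(q)                       # [0.5, 0.25, 0.125, 0.125]
print(shannon_upper_bound(q))  # 1.75
```

The more often participants are right on the first attempt, the more the distribution concentrates on rank 1 and the lower the bound becomes, which is why a predictable language produces a small value.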

Natural Language Processing


Year: 2026

Venue: N/A
