This paper investigates cross-dialect transfer learning for dependency parsing in Pomak, an endangered Slavic language. The authors quantify the performance drop when a parser trained on Greek-variety Pomak is applied to the Turkish variety, a drop attributed to phonological and morphosyntactic differences. They then show that fine-tuning on a newly created 650-sentence Turkish-variety Pomak corpus, combined with cross-variety transfer learning, substantially improves parsing accuracy.
Even a small, targeted dataset can bridge the gap in cross-dialect transfer learning for low-resource languages, significantly boosting dependency parsing accuracy.
This paper presents new resources and baselines for dependency parsing in Pomak, an endangered Eastern South Slavic language with substantial dialectal variation and no widely adopted standard. We focus on the variety spoken in Turkey (Uzunköprü) and ask how well a dependency parser trained on the existing Pomak Universal Dependencies treebank, which was built primarily from the variety spoken in Greece, transfers across dialects. We run two experimental phases. First, we train a parser on the Greek-variety UD data and evaluate zero-shot transfer to Turkish-variety Pomak, quantifying the impact of phonological and morphosyntactic differences. Second, we introduce a new manually annotated Turkish-variety Pomak corpus of 650 sentences and show that, despite its small size, targeted fine-tuning substantially improves accuracy; performance is further boosted by cross-variety transfer learning that combines the two dialects.
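The abstract reports parsing accuracy without naming a metric. Dependency parsers are commonly scored with unlabeled and labeled attachment scores (UAS and LAS), so the sketch below assumes those metrics purely for illustration. The function name `attachment_scores`, the `Arc` type, and the toy sentence are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# One token's annotation: (index of its head, dependency relation label).
# Head index 0 denotes the artificial root, as in Universal Dependencies.
Arc = Tuple[int, str]


def attachment_scores(
    gold: List[List[Arc]], pred: List[List[Arc]]
) -> Tuple[float, float]:
    """Return (UAS, LAS) computed over all tokens in the corpus.

    UAS counts tokens whose predicted head matches the gold head.
    LAS additionally requires the relation label to match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")

    total = uas_hits = las_hits = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (gold_head, gold_rel), (pred_head, pred_rel) in zip(gold_sent, pred_sent):
            total += 1
            if gold_head == pred_head:
                uas_hits += 1
                if gold_rel == pred_rel:
                    las_hits += 1

    if total == 0:
        raise ValueError("no tokens to score")
    return uas_hits / total, las_hits / total


# Toy three-token sentence (invented for illustration, not from the corpus).
gold = [[(2, "nsubj"), (0, "root"), (2, "obj")]]
# Hypothetical parser output: the third token is attached to the wrong head.
pred_wrong_head = [[(2, "nsubj"), (0, "root"), (1, "obj")]]
# Hypothetical parser output: all heads correct, first label wrong.
pred_wrong_label = [[(2, "obj"), (0, "root"), (2, "obj")]]

uas_a, las_a = attachment_scores(gold, pred_wrong_head)
uas_b, las_b = attachment_scores(gold, pred_wrong_label)
```

Under this assumption, a zero-shot versus fine-tuned comparison would amount to calling the same function on the Turkish-variety test set with the outputs of each parser and comparing the resulting score pairs.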