This paper reviews and analyzes various Part-of-Speech (POS) tagging approaches for the low-resource language Marathi, focusing on rule-based, statistical (HMM), hybrid, machine learning, and deep learning methods. The study addresses the challenges posed by Marathi's complex morphology and the limitations of data availability for ML/DL techniques. The review identifies significant and emerging directions for improving Marathi POS taggers based on a comparative analysis of recent studies.
The difficulties of POS tagging for Marathi highlight the broader challenges of applying NLP techniques to low-resource languages, where data scarcity limits the effectiveness of ML/DL approaches.
In Natural Language Processing (NLP), Part-of-Speech (POS) tagging is an essential task in which each word in a sentence is assigned a grammatical category, such as noun, verb, or adjective. It enables applications such as information retrieval, text summarization, and machine translation. POS tagging poses particular difficulties for low-resource languages like Marathi because of complex morphology. A variety of techniques address this problem, including rule-based techniques that rely on linguistic rules and statistical models such as Hidden Markov Models (HMMs), which use probabilities estimated from large annotated datasets. Hybrid models combine these two approaches and show enhanced robustness. Despite their potential, machine learning (ML) and deep learning (DL) approaches encounter challenges because of insufficient data. This review compares recent studies and highlights significant and emerging directions for further improving POS taggers for Marathi.
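
To make the statistical approach concrete, the following is a minimal sketch of an HMM tagger with Viterbi decoding. It is not taken from any of the reviewed studies: the three-sentence corpus, the coarse tag set, the add-one smoothing, and the names `TOY_CORPUS`, `train_hmm`, `log_prob`, and `viterbi` are all assumptions made for illustration. It is only meant to show how transition and emission probabilities estimated from annotated data drive the tagging decision, and why the size of that data matters.

```python
import math
from collections import Counter, defaultdict

# Invented toy corpus for illustration only (not data from the reviewed studies).
TOY_CORPUS = [
    [("राम", "NOUN"), ("घरी", "NOUN"), ("जातो", "VERB")],
    [("सीता", "NOUN"), ("पुस्तक", "NOUN"), ("वाचते", "VERB")],
    [("तो", "PRON"), ("चांगला", "ADJ"), ("मुलगा", "NOUN"), ("आहे", "VERB")],
]

START = "<s>"  # hypothetical sentence-start marker


def train_hmm(corpus):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    vocab = set()
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumption of this sketch)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for `words` under the model."""
    if not words:
        return []
    trans, emit, tags, vocab = model
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words

    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{
        t: (log_prob(trans[START], t, n_tags)
            + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)

    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


model = train_hmm(TOY_CORPUS)
print(viterbi(["सीता", "घरी", "जातो"], model))
```

With so little training data, the smoothed estimates are dominated by the smoothing constant, so unseen word forms are tagged almost entirely from tag-transition statistics. This is a small-scale picture of the data-scarcity problem the review describes for a morphologically rich language.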