This paper introduces a neural machine translation (NMT) system for Hindi-to-Dogri text conversion, addressing the lack of resources for the Dogri language. The system combines rule-based and statistical methods with deep learning to handle syntactic and semantic challenges. A bilingual corpus of 200,000 Hindi-Dogri sentence pairs was created to train the model, and the system's performance was evaluated using BLEU and TER scores, demonstrating improved translation quality.
A new Hindi-to-Dogri translation system shows that corpus-driven deep learning significantly outperforms traditional rule-based methods, paving the way for better NLP tools for low-resource languages.
Machine translation (MT) makes automatic translation across many languages possible. This paper presents an automatic machine translation system that converts Hindi text into Dogri text. Given the language barriers and the structures of the two languages, the system employs advanced natural language processing and deep learning models to refine the translation. It uses a hybrid architecture of rule-based and statistical methods to resolve syntactic and semantic issues in the text. In addition, a bilingual corpus is created so that the model can better learn the context in which sentences are used. The system was evaluated with the BLEU and TER metrics. The test results showed an improvement in translation quality, which can help preserve the languages and foster better communication at the regional level. The lack of sufficient technical resources for the Dogri language stems from its poorly structured corpus. This research investigates the construction of a parallel bilingual Dogri-Hindi corpus that will contain 200,000 sentence pairs. The aim is to enable the development of NLP tools such as statistical and neural machine translation systems, text summarizers, classifiers, and romanization systems. Deep neural models are known to be more effective when trained on large amounts of data, especially for resolving lexical ambiguities. The paper also analyzes the shortcomings of the existing rule-based Hindi-to-Dogri system using an arbitrarily chosen section of the corpus, and explains why such systems fail to resolve ambiguous words within the sentences that contain them. It puts forward the need for corpus-driven deep learning approaches for more precise translation.
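The abstract names BLEU and TER as its evaluation metrics but does not say how they were computed. As a rough illustration only, the sketch below shows an unsmoothed sentence-level BLEU against a single reference and a simplified word-level edit rate (TER without its block-shift operation). The function names `sentence_bleu` and `word_edit_rate`, the whitespace tokenization, and the placeholder tokens are all assumptions made for this example; they are not taken from the paper, and a real evaluation would normally use an established scoring toolkit at corpus level.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference (hypothetical helper)."""
    hyp, ref = hypothesis.split(), reference.split()  # assumption: whitespace tokens
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty lowers the score of hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum / max_n)


def word_edit_rate(hypothesis, reference):
    """Word-level edit distance divided by reference length.

    A simplification of TER: insertions, deletions, and substitutions are
    counted, but the block-shift operation of full TER is not modelled.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    prev = list(range(len(ref) + 1))  # distances from the empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens stand in for Dogri words; no real data is used here.
    print(sentence_bleu("a b c d", "a b c d e"))
    print(word_edit_rate("a b c d", "a b x d"))
```

Higher BLEU indicates closer n-gram overlap with the reference, while a lower edit rate indicates fewer corrections needed, which is why the two metrics are typically reported together.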