The paper introduces LuxMT, a machine translation system fine-tuned from Gemma 3 27B to translate Luxembourgish into French and English. The authors construct a new benchmark dataset from a tourist magazine and train on a parallel corpus of news articles and parliamentary transcripts, filtering the data with sentence embeddings to improve its quality. Results show that LuxMT substantially outperforms the Gemma 3 baseline, including for translation into German, which is absent from the training data. The authors also explore the potential of sentence embeddings for quality estimation.
Fine-tuning a 27B-parameter model on limited data for a low-resource language like Luxembourgish yields surprisingly strong translation performance, which even carries over to a target language (German) that does not appear in the training data.
We introduce LuxMT, a machine translation system based on Gemma 3 27B and fine-tuned for translation from Luxembourgish (LB) into French (FR) and English (EN). To assess translation performance, we construct a novel benchmark covering LB-FR, LB-EN, and LB-DE using human-translated data from Luci, a tourist magazine about Luxembourg. Training data stems from LuxAlign, a parallel corpus of multilingual Luxembourgish news articles, and from LB parliamentary transcripts augmented with Google Translate. We filter the data using LuxEmbedder, a model of LB sentence embeddings, to remove segment pairs with low equivalence. Overall, LuxMT's results suggest strong improvements over the Gemma 3 baseline, even when translating LB into German (DE), although the training data contains no DE. We also explore LuxEmbedder's potential as a quality estimation metric and find strong correlations with reference-based metrics. However, we call for further research to fully assess the metric's utility and advise using it with caution.
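The embedding-based filtering and quality-estimation ideas can be illustrated with a minimal sketch. It assumes that each segment has already been embedded (for example by a sentence-embedding model such as LuxEmbedder), scores a source/target pair by the cosine similarity of its two vectors, and drops pairs below a threshold. The threshold value of 0.8, the toy two-dimensional vectors, and the helper names `filter_pairs` and `pearson` are all hypothetical and do not come from the paper. The same similarity score could serve as a reference-free quality estimate for an MT output. Its agreement with a reference-based metric is checked here with a Pearson correlation, which is also an assumption, since the abstract does not name the correlation measure.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def filter_pairs(
    pairs: List[Tuple[str, str, Vector, Vector]],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, str]]:
    """Keep (source, target) segment pairs whose embeddings are similar enough.

    Each item is (src_text, tgt_text, src_vec, tgt_vec); the vectors are assumed
    to be precomputed by some sentence-embedding model.
    """
    kept = []
    for src, tgt, src_vec, tgt_vec in pairs:
        if cosine_similarity(src_vec, tgt_vec) >= threshold:
            kept.append((src, tgt))
    return kept

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between QE scores and a reference-based metric."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Toy usage with made-up 2-D "embeddings": the first pair is near-parallel,
# the second is orthogonal (similarity 0.0) and gets filtered out.
pairs = [
    ("Moien", "Bonjour", [1.0, 0.0], [0.9, 0.1]),
    ("Moien", "Merci", [1.0, 0.0], [0.0, 1.0]),
]
print(filter_pairs(pairs))  # [('Moien', 'Bonjour')]
```

A real pipeline would use high-dimensional vectors and a threshold tuned on held-out data. A single similarity cut-off is simply the most direct way to turn "low-equivalence segment pairs" into a filtering rule.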