Apr 22, 2026 | arXiv:2604.20666

ORPHEAS: A Cross-Lingual Greek-English Embedding Model for Retrieval-Augmented Generation

Ioannis E. Livieris, Athanasios Koursaris, Alexandra Apostolopoulou, Konstantinos Kanaris, Dimitris Tsakalidis, George Domalis

AI Summary

This paper introduces ORPHEAS, a specialized Greek-English embedding model for retrieval-augmented generation. It targets a limitation of existing multilingual models, which spread their capacity across many languages and inadequately capture the morphological complexity and domain-specific semantics of Greek. ORPHEAS is trained on a dataset generated with a knowledge graph-based fine-tuning methodology applied to a diverse multi-domain corpus, and it is reported to outperform state-of-the-art multilingual embedding models on both monolingual and cross-lingual retrieval benchmarks. The results indicate that domain-specialized fine-tuning for a morphologically rich language need not compromise cross-lingual retrieval capability.

Key Contribution

ORPHEAS outperforms state-of-the-art multilingual embedding models on monolingual and cross-lingual retrieval benchmarks, indicating that domain-specialized fine-tuning for a morphologically complex language does not compromise cross-lingual retrieval capability.

Abstract

Effective retrieval-augmented generation across bilingual Greek-English applications requires embedding models capable of capturing both domain-specific semantic relationships and cross-lingual semantic alignment. Existing multilingual embedding models distribute their representational capacity across numerous languages, limiting their optimization for Greek and failing to encode the morphological complexity and domain-specific terminological structures inherent in Greek text. In this work, we propose ORPHEAS, a specialized Greek-English embedding model for bilingual retrieval-augmented generation. ORPHEAS is trained on a high-quality dataset generated by a knowledge graph-based fine-tuning methodology applied to a diverse multi-domain corpus, which enables language-agnostic semantic representations. Numerical experiments across monolingual and cross-lingual retrieval benchmarks reveal that ORPHEAS outperforms state-of-the-art multilingual embedding models, demonstrating that domain-specialized fine-tuning on morphologically complex languages does not compromise cross-lingual retrieval capability.
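To make the retrieval setting concrete, the following is a minimal sketch of how cross-lingual retrieval over a shared embedding space is commonly scored with cosine similarity. It is not the authors' implementation: the function names `cosine` and `retrieve`, the document identifiers, and the three-dimensional vectors are all hypothetical placeholders chosen for illustration, and none of the numbers are ORPHEAS outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings (assumption): in practice these would come
# from an embedding model. A Greek and an English document on the same
# topic are given nearby vectors to mimic language-agnostic alignment.
doc_vecs = {
    "el_contract_law": [0.9, 0.1, 0.0],
    "en_contract_law": [0.8, 0.2, 0.1],
    "en_weather_report": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a query about contract law (assumption).
query_vec = [1.0, 0.0, 0.0]

top = retrieve(query_vec, doc_vecs, k=2)
for doc_id, score in top:
    print(f"{doc_id}: {score:.3f}")
```

In this toy setup both contract-law documents rank above the unrelated one regardless of their language, which is the behavior a language-agnostic embedding space is meant to provide.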

Natural Language Processing, Open-Source Models & Weights, Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
