The paper introduces LegalMidm, a Korean legal-domain LLM trained with a framework built around real-world legal use cases. The authors construct high-quality datasets through collaboration with legal professionals and rigorous data curation, focusing on relevance and factual accuracy. Experiments demonstrate LegalMidm's effectiveness on key legal tasks, suggesting that use-case-driven training matters for domain specialization.
Forget generic legal LLMs – LegalMidm shows that focusing on specific Korean legal use cases, with data curated by legal pros, unlocks real-world performance gains.
In recent years, the rapid proliferation of open-source large language models (LLMs) has spurred efforts to turn general-purpose models into domain specialists. However, many domain-specialized LLMs are developed using datasets and training protocols that are not aligned with the nuanced requirements of real-world applications. In the legal domain, where precision and reliability are essential, this lack of consideration limits practical utility. In this study, we propose a systematic training framework grounded in the practical needs of the legal domain, with a focus on Korean law. We introduce LegalMidm, a Korean legal-domain LLM, and present a methodology for constructing high-quality, use-case-driven legal datasets and optimized training pipelines. Our approach emphasizes collaboration with legal professionals and rigorous data curation to ensure relevance and factual accuracy, and demonstrates effectiveness in key legal tasks.