This paper investigates domain adaptation strategies for dense retrieval in the heterogeneous Brazilian legal domain, covering case law, legislation, and question-based search. The authors compare a base Qwen3-Embedding-4B model with versions fine-tuned on legal data only and on a mix of legal and SQuAD-pt data. Results on six datasets show that legal-only training excels on specialized legal tasks, while mixed training achieves a better balance, particularly improving question-based retrieval and the average NDCG@10, MRR@10, and MAP@10.
Fine-tuning dense retrievers on a mix of domain-specific and general question-answering data yields robust performance across diverse legal search tasks and, on average, outperforms models trained solely on legal data, although legal-only training remains stronger on the most specialized legal tasks.
Brazilian legal retrieval is heterogeneous, covering case law, legislation, and question-based search. This makes training dense retrievers a trade-off between stronger domain specialization and broader robustness across retrieval types. In this paper, we explore this trade-off using three training setups based on Qwen3-Embedding-4B: a base model with no fine-tuning, a version trained only on legal data, and a mixed setup that combines legal data with the SQuAD-pt supervised dataset. We evaluate these models on five legal datasets from the JUÁ leaderboard, along with the Quati dataset as an extra Portuguese retrieval benchmark to test out-of-domain generalization. The legal-only model performs best on the most specialized legal tasks. The mixed setup keeps strong performance on legal data while offering a better overall balance, improving average NDCG@10 from 0.414 to 0.447, MRR@10 from 0.586 to 0.595, and MAP@10 from 0.270 to 0.308 across all six datasets. The biggest improvement appears on Quati, where the mixed model clearly outperforms the legal-only one. Overall, the results show that legal-only and mixed training lead to different strengths: the first is better for specialization, while the second is more robust across different types of search, especially question-based ones. Both adapted models are available on Hugging Face.
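For readers unfamiliar with the three reported metrics, the following is a minimal per-query sketch of NDCG@10, MRR@10, and MAP@10 in plain Python. It is an illustration under stated assumptions, not the evaluation code used in the paper: the function names and the `relevance` dictionary format are hypothetical, NDCG uses linear gains with a log2 discount, and average precision is normalized by the total number of relevant documents (one common convention; some toolkits normalize differently).

```python
import math


def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k for one query.

    ranked_ids: document ids in the order the retriever returned them.
    relevance:  dict mapping doc id -> graded relevance (0 or missing = not relevant).
    Assumes linear gain and a log2 rank discount.
    """
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0


def mrr_at_k(ranked_ids, relevance, k=10):
    """Reciprocal rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k]):
        if relevance.get(doc_id, 0) > 0:
            return 1.0 / (rank + 1)
    return 0.0


def ap_at_k(ranked_ids, relevance, k=10):
    """Average precision over the top k, normalized by the number of relevant docs."""
    num_relevant = sum(1 for gain in relevance.values() if gain > 0)
    if num_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k]):
        if relevance.get(doc_id, 0) > 0:
            hits += 1
            precision_sum += hits / (rank + 1)
    return precision_sum / num_relevant


def mean_metric(metric_fn, runs, qrels, k=10):
    """Average a per-query metric over all queries that have relevance judgments."""
    scores = [metric_fn(runs.get(qid, []), rels, k) for qid, rels in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

Averaging each per-query score over a dataset's queries with `mean_metric`, and then averaging across datasets, gives aggregate figures of the kind quoted in the abstract, assuming the leaderboard uses comparable conventions.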