Mar 2, 2026 | arXiv:2603.01910

FLANS at SemEval-2026 Task 7: RAG with Open-Sourced Smaller LLMs for Everyday Knowledge Across Diverse Languages and Cultures

Liliia Bogdanova, Shiran Sun, Lifeng Han, Natalia Amat Lefort, Flor Miriam Plaza-del-Arco

AI Summary

The paper presents a retrieval-augmented generation (RAG) system using open-source smaller LLMs (OS-sLLMs) for the SemEval-2026 Task 7 on everyday knowledge across diverse languages and cultures. The authors created culturally-aware knowledge bases (CulKBs) by extracting Wikipedia content using keyword lists and country-specific summaries to enhance the RAG approach. The system, evaluated on English, Spanish, and Chinese for both short answer and multiple-choice questions, also integrates live online search via DuckDuckGo.

Key Contribution

Applies RAG with open-source small LLMs to cross-lingual and cross-cultural knowledge retrieval, pointing towards privacy-conscious and sustainable knowledge-intensive NLP.

Abstract

This system paper describes our participation in SemEval-2026 Task 7, "Everyday Knowledge Across Diverse Languages and Cultures". We participated in two subtasks: Track 1, Short Answer Questions (SAQ), and Track 2, Multiple-Choice Questions (MCQ). Our method is retrieval-augmented generation (RAG) with open-sourced smaller LLMs (OS-sLLMs). To better adapt to this shared task, we created our own culturally aware knowledge bases (CulKBs) by extracting Wikipedia content using keyword lists we prepared. We extracted both culturally aware wiki-text and country-specific wiki-summaries. In addition to the local CulKBs, one of our systems integrates live online search output via DuckDuckGo. Towards better privacy and sustainability, we aimed to deploy smaller LLMs (sLLMs) that are open-sourced on the Ollama platform. We share the prompts we developed using refinement techniques and report the learning curve of such prompts. The tested languages are English, Spanish, and Chinese for both tracks. Our resources and code are shared via https://github.com/aaronlifenghan/FLANS-2026
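
As a rough illustration of the retrieve-then-prompt pattern the abstract describes, the sketch below ranks passages from a tiny in-memory knowledge base and builds a prompt for a short answer question. It is not the authors' implementation. The names `TOY_CULKB`, `retrieve`, and `build_prompt` are hypothetical, the three passages are illustrative placeholders rather than the paper's CulKB content, and plain token overlap stands in for whatever retrieval method the system actually uses. The generation step (an open-sourced sLLM served via Ollama in the paper) is deliberately left out.

```python
import re
from collections import Counter

# Toy stand-in for a culturally aware knowledge base (CulKB).
# These entries are illustrative placeholders, not the authors' data.
TOY_CULKB = [
    {"country": "Spain",
     "text": "Paella is a rice dish that originated in Valencia, Spain."},
    {"country": "China",
     "text": "Mooncakes are traditionally eaten during the Mid-Autumn Festival in China."},
    {"country": "UK",
     "text": "Fish and chips is a popular takeaway dish in the United Kingdom."},
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def retrieve(question, kb, top_k=2):
    """Rank passages by token overlap with the question (assumed scoring)."""
    q_counts = Counter(tokenize(question))
    scored = []
    for entry in kb:
        overlap = sum((q_counts & Counter(tokenize(entry["text"]))).values())
        scored.append((overlap, entry))
    # Sort on the score only, highest first; ties keep their KB order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop passages that share no token with the question.
    return [entry for score, entry in scored[:top_k] if score > 0]


def build_prompt(question, passages):
    """Assemble a short-answer prompt from the retrieved passages."""
    context = "\n".join(f"- {p['text']}" for p in passages)
    if not context:
        context = "- (no passage retrieved)"
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer briefly:"


if __name__ == "__main__":
    q = "What is traditionally eaten during the Mid-Autumn Festival in China?"
    print(build_prompt(q, retrieve(q, TOY_CULKB, top_k=1)))
```

The resulting prompt string would then be passed to the language model; swapping the overlap score for an embedding-based retriever would leave `build_prompt` unchanged.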

Natural Language Processing | Open-Source Models & Weights | Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 17

Year: 2026

Venue: N/A
