This paper evaluates the ability of LLMs to translate natural language queries into structured metadata filters for retrieving food composition data within a RAG system. The authors used the Chroma vector database to evaluate four LLMs on their ability to retrieve relevant information from a food composition database. Results show high accuracy for simple and moderately complex queries, but performance degrades when queries involve constraints that cannot be expressed in the metadata.
LLMs can drastically reduce manual effort for domain experts in accessing complex food and nutrition data via RAG, but still struggle with queries that exceed the representational scope of the metadata.
In this article, we evaluate four Large Language Models (LLMs) and their effectiveness at retrieving data within a specialized Retrieval-Augmented Generation (RAG) system, using a comprehensive food composition database. Our method focuses on the LLMs' ability to translate natural language queries into structured metadata filters, enabling efficient retrieval via a Chroma vector database. By achieving high accuracy in this critical retrieval step, we demonstrate that LLMs can serve as an accessible, high-performance tool, drastically reducing the manual effort and technical expertise previously required for domain experts, such as food compilers and nutritionists, to leverage complex food and nutrition data. However, despite the high performance on easy and moderately complex queries, our analysis of difficult questions reveals that reliable retrieval remains challenging when queries involve non-expressible constraints. These findings demonstrate that LLM-driven metadata filtering excels when constraints can be explicitly expressed, but struggles when queries exceed the representational scope of the metadata format.
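To make the retrieval step concrete, the sketch below shows the kind of structured filter an LLM might emit for a simple query, written in the operator style Chroma uses for metadata `where` clauses (`$and`, `$or`, `$eq`, `$gte`, and so on). The field names (`food_group`, `protein_g`, `fat_g`), the sample records, and their values are all hypothetical and are not taken from the paper's database. The `matches` function is a simplified local stand-in for filter evaluation, so the example runs without a database; it is not Chroma's implementation.

```python
# Comparison operators in the style of Chroma metadata `where` filters.
OPS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$in": lambda a, b: a in b,
    "$nin": lambda a, b: a not in b,
}


def matches(metadata: dict, where: dict) -> bool:
    """Simplified local evaluation of a filter against one metadata record.

    Assumption: a record missing the filtered field never matches.
    """
    if "$and" in where:
        return all(matches(metadata, w) for w in where["$and"])
    if "$or" in where:
        return any(matches(metadata, w) for w in where["$or"])
    (field, cond), = where.items()      # exactly one field per leaf clause
    if not isinstance(cond, dict):      # shorthand {"field": value} means $eq
        cond = {"$eq": cond}
    (op, value), = cond.items()         # exactly one operator per field
    return field in metadata and OPS[op](metadata[field], value)


# Hypothetical records with made-up illustrative values.
FOODS = [
    {"name": "lentils, boiled", "food_group": "legumes", "protein_g": 9.0, "fat_g": 0.4},
    {"name": "cheddar", "food_group": "dairy", "protein_g": 25.0, "fat_g": 33.0},
    {"name": "tofu", "food_group": "legumes", "protein_g": 8.0, "fat_g": 4.8},
]

# Query: "legumes with at least 8.5 g of protein".
# Every constraint compares one field against a literal, so it is expressible.
where = {
    "$and": [
        {"food_group": {"$eq": "legumes"}},
        {"protein_g": {"$gte": 8.5}},
    ]
}

hits = [food["name"] for food in FOODS if matches(food, where)]
print(hits)
```

With a real Chroma collection, a dictionary of this shape would typically be passed as the `where` argument of `collection.query(...)`. As an assumed example of a constraint outside the format's scope, consider "foods with more protein than fat": it compares two fields with each other, while each leaf clause above compares a single field with a literal value, so no single filter of this form captures it. The abstract does not enumerate which constraints the authors found non-expressible, so this case is illustrative only.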