DiningBench, a new hierarchical multi-view benchmark, is introduced to evaluate vision-language models (VLMs) on fine-grained classification, nutrition estimation, and visual question answering in the dietary domain. The dataset contains 3,021 distinct dishes with multiple images per dish and verified nutritional data, addressing limitations of existing food-related benchmarks. An evaluation of 29 VLMs reveals significant challenges in fine-grained visual discrimination and precise nutritional reasoning, even with multi-view inputs and chain-of-thought prompting.
Current VLMs excel at general reasoning but still struggle to identify dishes accurately and estimate their nutritional content, even when given multiple views and chain-of-thought prompting.
Recent advancements in Vision-Language Models (VLMs) have revolutionized general visual understanding. However, their application in the food domain remains constrained by benchmarks that rely on coarse-grained categories, single-view imagery, and inaccurate metadata. To bridge this gap, we introduce DiningBench, a hierarchical, multi-view benchmark designed to evaluate VLMs across three levels of cognitive complexity: Fine-Grained Classification, Nutrition Estimation, and Visual Question Answering. Unlike previous datasets, DiningBench comprises 3,021 distinct dishes with an average of 5.27 images per entry, incorporating fine-grained "hard" negatives drawn from the same menus and rigorous, verification-based nutritional data. We conduct an extensive evaluation of 29 state-of-the-art open-source and proprietary models. Our experiments reveal that while current VLMs excel at general reasoning, they struggle significantly with fine-grained visual discrimination and precise nutritional reasoning. Furthermore, we systematically investigate the impact of multi-view inputs and Chain-of-Thought reasoning, identifying five primary failure modes. DiningBench serves as a challenging testbed to drive the next generation of food-centric VLM research. All code is released at https://github.com/meituan/DiningBench.