Apr 21, 2026
arXiv:2604.19324

PLaMo 2.1-VL Technical Report

Tommi Kerola, Yuya Masuda, Takashi Masuko, Toshiki Nakanishi, Daisuke Nishino, Kuniyuki Takahashi, Hanqin Wang, Yoshihiro Yamada

AI Summary

PLaMo 2.1-VL is a new series of lightweight (8B and 2B parameter) Vision Language Models specifically designed for edge deployment and Japanese language operation. The models are trained using a large-scale synthetic data generation pipeline and evaluated on Visual Question Answering (VQA) and Visual Grounding tasks, with a focus on real-world applications like factory task analysis and infrastructure anomaly detection. PLaMo 2.1-VL achieves state-of-the-art performance among comparable open models on Japanese and English benchmarks, and demonstrates strong zero-shot and fine-tuned performance on the target application scenarios.

Key Contribution

Lightweight, edge-deployable VLMs (8B and 2B parameters) can outperform comparable open models on Japanese-language benchmarks and on real-world vision tasks such as factory task analysis and infrastructure anomaly detection.

Abstract

We introduce PLaMo 2.1-VL, a lightweight Vision Language Model (VLM) for autonomous devices, available in 8B and 2B variants and designed for local and edge deployment with Japanese-language operation. Focusing on Visual Question Answering (VQA) and Visual Grounding as its core capabilities, we develop and evaluate the models for two real-world application scenarios: factory task analysis via tool recognition, and infrastructure anomaly detection. We also develop a large-scale synthetic data generation pipeline and comprehensive Japanese training and evaluation resources. PLaMo 2.1-VL outperforms comparable open models on Japanese and English benchmarks, achieving 61.5 ROUGE-L on JA-VG-VQA-500 and 85.2% accuracy on Japanese Ref-L4. For the two application scenarios, it achieves 53.9% zero-shot accuracy on factory task analysis, and fine-tuning on power plant data improves anomaly detection bbox + label F1-score from 39.7 to 64.9.
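The abstract reports a grounding accuracy on Japanese Ref-L4 and a bbox + label F1-score for anomaly detection, but does not spell out the matching rules behind them. As a minimal sketch, assuming common conventions (axis-aligned boxes given as (x1, y1, x2, y2), a hit when IoU ≥ 0.5, and greedy one-to-one matching in which a detection counts only if its label also matches), the two metrics could be computed as follows. The function names, the `Det` type, the 0.5 threshold, and the greedy matcher are illustrative assumptions, not the paper's evaluation code.

```python
from dataclasses import dataclass

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(preds: list[Box], gts: list[Box], thr: float = 0.5) -> float:
    """Fraction of referring expressions whose single predicted box reaches IoU >= thr.
    The 0.5 threshold is an assumed convention, not taken from the paper."""
    hits = sum(iou(p, g) >= thr for p, g in zip(preds, gts))
    return hits / len(gts) if gts else 0.0


@dataclass
class Det:
    """Hypothetical detection record: a box plus an anomaly label."""
    box: Box
    label: str


def bbox_label_f1(preds: list[Det], gts: list[Det], thr: float = 0.5) -> float:
    """Greedy one-to-one matching (assumed protocol): a prediction is a true positive
    only if an unmatched ground truth has the same label and IoU >= thr."""
    matched: set[int] = set()
    tp = 0
    for p in preds:
        best_j, best_iou = None, thr
        for j, g in enumerate(gts):
            if j in matched or g.label != p.label:
                continue
            v = iou(p.box, g.box)
            if v >= best_iou:  # keep the highest-IoU same-label candidate
                best_j, best_iou = j, v
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # unmatched predictions
    fn = len(gts) - tp    # missed ground truths
    denom = 2 * tp + fp + fn
    # denom is 0 only when both lists are empty: nothing to find, nothing predicted.
    return 2 * tp / denom if denom else 1.0


if __name__ == "__main__":
    gts = [Det((0, 0, 10, 10), "crack"), Det((20, 20, 30, 30), "rust")]
    preds = [Det((1, 1, 10, 10), "crack"), Det((20, 20, 30, 30), "corrosion")]
    # First prediction matches (IoU 0.81, same label); second has the wrong label.
    print(bbox_label_f1(preds, gts))  # → 0.5
```

Greedy matching is the simplest choice; an evaluation that needs an optimal assignment would use Hungarian matching instead, which can change scores when one prediction overlaps several ground-truth boxes.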

Topics: Computer Vision, Data Curation & Synthetic Data, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
