Fudan | Independent Researcher | RUYi Dynamics Co | Apr 30, 2026 | arXiv:2604.27476

EdgeFM: Efficient Edge Inference for Vision-Language Models

Mengling Deng, Yuanpeng Chen, Sheng Yang, Wei Tao, Wenhai Zhang, Hui Song, Linyuanhao Qin, Kai Zhao, Xiaojun Ye, Shanhui Mo, Jingli Fan, Shuang Zhang, Bei Liu, Tiankun Zhao, Xiangjing An

AI Summary

EdgeFM is a lightweight, open-source inference framework designed to optimize vision-language model (VLM) deployment on resource-constrained edge devices. It leverages AI agents to tune low-level kernels and create a modular library of reusable, optimized skills, bypassing the limitations of vendor-specific toolchains. Experiments demonstrate up to 1.49x speedup over TensorRT-Edge-LLM on NVIDIA Orin and successful deployment on Horizon Journey, showcasing improved cross-platform portability and performance.

Key Contribution

EdgeFM is an open-source inference framework that avoids vendor lock-in for VLM deployment on edge devices and achieves up to 1.49x speedup over TensorRT-Edge-LLM on NVIDIA Orin.

Abstract

Vision-language models (VLMs) have demonstrated strong applicability in edge industrial applications, yet their deployment remains severely constrained by requirements for deterministic low latency and stable execution under resource limitations. Existing frameworks either rely on bloated general-purpose designs or force developers into opaque, hardware-specific closed-source ecosystems, leading to hardware lock-in and poor cross-platform adaptability. Observing that modern AI agents can efficiently search and tune configurations to generate highly optimized low-level kernels for standard LLM operators, we propose EdgeFM, a lightweight, agent-driven VLM/LLM inference framework tailored for cross-platform industrial edge deployment. EdgeFM removes non-essential features to reduce single-request latency and encapsulates agent-tuned kernel optimizations as a modular library of reusable skills. By allowing direct invocation of these skills rather than waiting for closed-source implementations, it narrows the performance gap long dominated by proprietary toolchains. The framework natively supports mainstream platforms including x86 and NVIDIA Orin SoCs, and represents the first end-to-end VLA deployment on the domestic Horizon Journey platform, enhancing cross-platform portability. In most cases, it yields better inference performance than conventional vendor-specific toolchains, achieving up to 1.49x speedup over TensorRT-Edge-LLM on the NVIDIA Orin platform. Experimental results show that EdgeFM delivers favorable end-to-end inference performance, providing an open-source, production-grade solution for diverse edge industrial scenarios.
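
The idea of tuning kernel configurations and storing the results as reusable skills can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from EdgeFM: the names `Skill`, `SkillLibrary`, `tune`, and `toy_latency` are hypothetical, the `(tile_size, unroll_factor)` configuration space is invented, and a plain exhaustive grid search with a toy cost model stands in for the agent-driven search and real on-device measurement described in the abstract.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical configuration: (tile_size, unroll_factor) of an illustrative kernel.
Config = Tuple[int, int]


@dataclass(frozen=True)
class Skill:
    """A tuned kernel configuration stored for later reuse (hypothetical type)."""
    operator: str
    platform: str
    config: Config
    latency_ms: float


class SkillLibrary:
    """Maps (operator, platform) to the fastest configuration seen so far."""

    def __init__(self) -> None:
        self._skills: Dict[Tuple[str, str], Skill] = {}

    def register(self, skill: Skill) -> None:
        key = (skill.operator, skill.platform)
        current = self._skills.get(key)
        # Keep only the fastest known configuration per operator/platform pair.
        if current is None or skill.latency_ms < current.latency_ms:
            self._skills[key] = skill

    def lookup(self, operator: str, platform: str) -> Skill:
        return self._skills[(operator, platform)]


def tune(
    operator: str,
    platform: str,
    candidates: Iterable[Config],
    measure: Callable[[Config], float],
) -> Skill:
    """Exhaustive search: measure every candidate and keep the fastest."""
    latency, best = min((measure(c), c) for c in candidates)
    return Skill(operator, platform, best, latency)


def toy_latency(config: Config) -> float:
    """Toy cost model (assumption): fastest at tile_size=64, unroll_factor=4."""
    tile, unroll = config
    return 1.0 + abs(tile - 64) / 16 + abs(unroll - 4) / 2


# Tune one operator for one platform, then reuse the stored result.
library = SkillLibrary()
search_space = list(product([16, 32, 64, 128], [1, 2, 4, 8]))
library.register(tune("attention", "orin", search_space, toy_latency))
print(library.lookup("attention", "orin"))
```

In a real deployment, `measure` would run the compiled kernel on the target device and return its measured latency. The lookup step is what lets an inference runtime invoke a stored result directly instead of repeating the search.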

Computer Vision | Inference & Quantization | Multimodal Models

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
