Ruyi2.5 is a multimodal familial model built using the AI Flow framework, extending the "Train Once, Deploy Many" paradigm to co-train models of varying scales with a shared backbone. A privacy-preserving camera service system, Ruyi2.5-Camera, is developed using a two-stage recognition pipeline with edge-based de-identification and cloud-based reasoning. To accelerate RL fine-tuning, Binary Prefix Policy Optimization (BPPO) is introduced, achieving a 2-3x speedup over GRPO by reducing sample redundancy and focusing gradient updates.
Ruyi2.5 achieves comparable performance to Qwen3-VL on general multimodal benchmarks while significantly outperforming it in privacy-constrained surveillance, demonstrating the effectiveness of its edge-cloud architecture.
We present Ruyi2.5, a multimodal familial model built on the AI Flow framework. Extending Ruyi2's "Train Once, Deploy Many" paradigm to the multimodal domain, Ruyi2.5 constructs a shared-backbone architecture that co-trains models of varying scales within a single unified pipeline, ensuring semantic consistency across all deployment tiers. Built upon Ruyi2.5, we develop Ruyi2.5-Camera, a privacy-preserving camera service system instantiated as a two-stage recognition pipeline: an edge model applies information-bottleneck-guided irreversible feature mapping to de-identify raw frames at the source, while a cloud model performs deep behavior reasoning. To accelerate reinforcement learning fine-tuning, we further propose Binary Prefix Policy Optimization (BPPO), which reduces sample redundancy via binary response selection and focuses gradient updates on response prefixes, achieving a 2-3x training speedup over GRPO. Experiments show that Ruyi2.5 matches Qwen3-VL on general multimodal benchmarks, while Ruyi2.5-Camera substantially outperforms it on privacy-constrained surveillance tasks.
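The two mechanisms the abstract attributes to BPPO, binary response selection and prefix-focused gradient updates, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not specify the selection rule or prefix length, so the "keep best and worst of the group" rule, the function names, and `prefix_len` are all hypothetical.

```python
# Hypothetical sketch of BPPO's two ideas as summarized in the abstract:
# (1) binary response selection: instead of optimizing over the full GRPO
#     sample group per prompt, keep only two responses (assumed here to be
#     the highest- and lowest-reward ones);
# (2) prefix-focused updates: mask the token-level loss so only the first
#     `prefix_len` tokens of each kept response receive gradient.

def bppo_select(responses, rewards):
    """Collapse a GRPO-style group to a binary pair (assumed rule)."""
    best = max(range(len(rewards)), key=lambda i: rewards[i])
    worst = min(range(len(rewards)), key=lambda i: rewards[i])
    return [responses[best], responses[worst]]

def prefix_mask(token_count, prefix_len):
    """1 for prefix tokens that contribute to the loss, 0 for the tail."""
    return [1 if t < prefix_len else 0 for t in range(token_count)]

# Example: a group of 4 sampled responses collapses to 2 (halving the
# samples), and only the first 8 of 12 tokens carry gradient.
kept = bppo_select(["r0", "r1", "r2", "r3"], [0.1, 0.9, 0.4, 0.0])
mask = prefix_mask(12, 8)
```

Under these assumptions, the 2-3x speedup over GRPO follows from processing fewer responses per prompt and backpropagating through fewer tokens per response.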