Jun 9, 2026

arXiv:2606.10395

Efficient RWKV-based Representation Learning for 3D Point Clouds

Yun Liu, Xuefeng Yan, Liangliang Nan, Xianzhi Li, Peng Li, Zhe Zhu, Honghua Chen, Mingqiang Wei

AI Summary

This paper introduces the P-RWKV block, an adaptation of the RWKV model designed to capture local geometric structures in 3D point clouds while keeping RWKV's linear complexity. Its two components, Local Perception Expansion (LPE) and Spatial Context Enhancement (SCE), improve contextual perception and spatial awareness respectively, addressing the limitations of the original RWKV when applied to irregular geometries. Experimental results show that P-RWKV achieves competitive performance in self-supervised representation learning tasks with lower computational cost and inference latency.

Key Contribution

P-RWKV achieves competitive performance in 3D point cloud representation learning with lower computational cost and inference latency.

Abstract

The recent Receptance Weighted Key Value (RWKV) model uses RNN-style recurrence to offer a linear-complexity alternative to Transformers' quadratic self-attention for modeling global dependencies. However, RWKV was originally developed for sequential text, and when applied directly to point clouds it struggles to capture local geometric structures and to model spatial dependencies effectively. To address this, we propose the P-RWKV block, which bridges the gap between sequence modeling and irregular 3D geometry while preserving the efficiency advantages of RWKV. It consists of a Local Perception Expansion (LPE) component, which expands contextual perception along the spatio-temporal sequence, and a Spatial Context Enhancement (SCE) component, which strengthens spatial awareness. To validate the effectiveness of P-RWKV for point cloud understanding, we construct PointER, a single-modality self-supervised representation learning framework whose encoder is composed of stacked P-RWKV blocks. Furthermore, we extend P-RWKV to a cross-modality setting and integrate the proposed core sub-modules into multiple architectures, demonstrating plug-and-play flexibility and architectural generality. Extensive experiments show that the P-RWKV block and its key sub-modules achieve competitive performance across various tasks with lower computational cost and inference latency. Code will be released upon acceptance.
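To make the linear-complexity claim concrete, here is a minimal sketch of the weighted key-value (WKV) operator in the style of the original text-oriented RWKV-4 formulation, for a single channel. It is not the P-RWKV block, whose LPE and SCE components are not specified in this abstract. The function names and the scalar parameters `w` (decay) and `u` (current-token bonus) are illustrative assumptions. The first function recomputes the full history at every step, which costs quadratic time. The second carries two running sums and produces the same outputs in linear time.

```python
import math


def wkv_naive(k, v, w, u):
    """Quadratic-time reference: re-sums the whole history at every step."""
    out = []
    for t in range(len(k)):
        # The current token gets a separate bonus term u instead of a decay.
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            # Past token i is down-weighted by how far back it lies.
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out


def wkv_recurrent(k, v, w, u):
    """Linear-time form: carries a running numerator and denominator."""
    num_state, den_state = 0.0, 0.0
    decay = math.exp(-w)
    out = []
    for k_t, v_t in zip(k, v):
        bonus = math.exp(u + k_t)
        out.append((num_state + bonus * v_t) / (den_state + bonus))
        # Decay the history once, then fold in the current token.
        num_state = decay * num_state + math.exp(k_t) * v_t
        den_state = decay * den_state + math.exp(k_t)
    return out


if __name__ == "__main__":
    keys = [0.1, -0.3, 0.5, 0.2]
    values = [1.0, 2.0, -1.0, 0.5]
    print(wkv_naive(keys, values, w=0.5, u=0.3))
    print(wkv_recurrent(keys, values, w=0.5, u=0.3))
```

The two printed lists should agree up to floating-point rounding. This sketch exponentiates keys directly, so large key values can overflow; practical implementations typically track a running maximum exponent for numerical stability.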

Architecture Design (Transformers, SSMs, MoE)

Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
