Mar 2, 2026 | arXiv:2603.01766

Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models

Hao Liu, Haoyun Liu, Jian Zhao, Jianzhuang Zhao, Xinyuan Chang, Tianle Shi, Chuanzhang Meng, Jiayuan Tan, Feng Xiong, Tong Lin, Dongjie Huo, Mu Xu, Songlin Dong, Zhiheng Ma, Yihong Gong, Sheng Zhong

AI Summary

The paper introduces Neural Implicit Action Fields (NIAF), a novel approach for Vision-Language-Action (VLA) models that replaces discrete waypoint prediction with continuous action function regression. NIAF uses a Multimodal Large Language Model (MLLM) as a hierarchical spectral modulator over a learned motion prior to synthesize infinite-resolution trajectories. This continuous representation enables analytical differentiability, allowing for explicit supervision of motion derivatives and leading to state-of-the-art results on CALVIN and LIBERO benchmarks, as well as improved impedance control in real-world experiments.

Key Contribution

Ditch discrete waypoints: VLA models can now generate smooth, physically plausible robot trajectories by directly regressing continuous action functions.

Abstract

Despite the rapid progress of Vision-Language-Action (VLA) models, the prevailing paradigm of predicting discrete waypoints remains fundamentally misaligned with the intrinsic continuity of physical motion. This discretization imposes rigid sampling rates, lacks high-order differentiability, and introduces quantization artifacts that hinder precise, compliant interaction. We propose Neural Implicit Action Fields (NIAF), a paradigm shift that reformulates action prediction from discrete waypoints to continuous action function regression. By utilizing an MLLM as a hierarchical spectral modulator over a learnable motion prior, NIAF synthesizes infinite-resolution trajectories as continuous-time manifolds. This formulation enables analytical differentiability, allowing for explicit supervision of velocity, acceleration, and jerk to ensure mathematical consistency and physical plausibility. Our approach achieves state-of-the-art results on CALVIN and LIBERO benchmarks across diverse backbones. Furthermore, real-world experiments demonstrate that NIAF enables stable impedance control, bridging the gap between high-level semantic understanding and low-level dynamic execution.
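To make the contrast between discrete waypoints and a continuous action function concrete, here is a minimal sketch. It is not the NIAF architecture: it assumes a toy one-dimensional action modeled as a fixed sum of sinusoids, and the names `TERMS`, `action`, `sample`, and `finite_difference_velocity` are hypothetical. It only illustrates two properties the abstract describes: the same function can be queried at any sampling rate, and velocity, acceleration, and jerk are available in closed form rather than by differencing waypoints.

```python
import math

# Hypothetical toy action field (an assumption, not the paper's model):
# one action dimension as a sum of sinusoids.
# Each term is (amplitude, angular frequency in rad/s, phase in rad).
TERMS = [(0.30, 1.0, 0.0), (0.10, 3.0, 0.5), (0.02, 7.0, 1.2)]


def action(t, order=0, terms=TERMS):
    """Evaluate the order-th time derivative of the toy action function at time t.

    Uses d^n/dt^n [A sin(w t + p)] = A w^n sin(w t + p + n*pi/2), so position
    (order 0), velocity (1), acceleration (2), and jerk (3) are all closed-form.
    """
    return sum(
        a * w**order * math.sin(w * t + p + order * math.pi / 2)
        for a, w, p in terms
    )


def sample(rate_hz, duration_s, order=0):
    """Query the same continuous function on a uniform grid at any control rate."""
    n = int(round(rate_hz * duration_s)) + 1
    return [action(i / rate_hz, order) for i in range(n)]


def finite_difference_velocity(waypoints, rate_hz):
    """Forward-difference velocity, as one would estimate it from discrete waypoints."""
    return [(b - a) * rate_hz for a, b in zip(waypoints, waypoints[1:])]


if __name__ == "__main__":
    coarse = sample(10, 1.0)    # 10 Hz waypoints over one second
    fine = sample(1000, 1.0)    # same function at 1 kHz, no interpolation needed
    print("samples at 10 Hz and 1 kHz:", len(coarse), len(fine))

    t = 0.5
    v_exact = action(t, order=1)
    v_fd = finite_difference_velocity(coarse, 10)[5]  # difference starting at t = 0.5
    print("analytic velocity:", v_exact)
    print("10 Hz finite-difference velocity:", v_fd)
    print("analytic acceleration and jerk:", action(t, 2), action(t, 3))
```

In a learned setting the derivatives would typically come from automatic differentiation of the network with respect to time rather than from a hand-written formula; the closed form is used here only to keep the sketch dependency-free.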

Architecture Design (Transformers, SSMs, MoE) | Multimodal Models | Robotics & Embodied AI

Citation Metrics

Citations: 0
Influential citations: 0
References: 25
Year: 2026
Venue: N/A
