The paper introduces Neural Implicit Action Fields (NIAF), an approach for Vision-Language-Action (VLA) models that replaces discrete waypoint prediction with continuous action function regression. NIAF uses a Multimodal Large Language Model (MLLM) as a hierarchical spectral modulator over a learnable motion prior to synthesize infinite-resolution trajectories. Because the representation is continuous, it is analytically differentiable, which allows explicit supervision of motion derivatives (velocity, acceleration, and jerk). The paper reports state-of-the-art results on the CALVIN and LIBERO benchmarks and stable impedance control in real-world experiments.
Ditch discrete waypoints: VLA models can now generate smooth, physically plausible robot trajectories by directly regressing continuous action functions.
Despite the rapid progress of Vision-Language-Action (VLA) models, the prevailing paradigm of predicting discrete waypoints remains fundamentally misaligned with the intrinsic continuity of physical motion. This discretization imposes rigid sampling rates, lacks high-order differentiability, and introduces quantization artifacts that hinder precise, compliant interaction. We propose Neural Implicit Action Fields (NIAF), a paradigm shift that reformulates action prediction from discrete waypoints to continuous action function regression. By utilizing an MLLM as a hierarchical spectral modulator over a learnable motion prior, NIAF synthesizes infinite-resolution trajectories as continuous-time manifolds. This formulation enables analytical differentiability, allowing for explicit supervision of velocity, acceleration, and jerk to ensure mathematical consistency and physical plausibility. Our approach achieves state-of-the-art results on CALVIN and LIBERO benchmarks across diverse backbones. Furthermore, real-world experiments demonstrate that NIAF enables stable impedance control, bridging the gap between high-level semantic understanding and low-level dynamic execution.
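To make the idea of a continuous, analytically differentiable action function concrete, here is a minimal sketch. It is not the NIAF architecture: it assumes a fixed Fourier basis as a stand-in for the learned motion prior, and the coefficient arrays `sin_coef`, `cos_coef`, and `offset` are hypothetical placeholders for whatever a model would predict. The function names `eval_action` and `derivative_loss` are likewise illustrative only.

```python
import numpy as np

def eval_action(t, sin_coef, cos_coef, offset, order=0):
    """Evaluate the order-th time derivative of a Fourier action function.

    a(t) = offset + sum_k sin_coef[k] * sin(w_k t) + cos_coef[k] * cos(w_k t)
    t: (T,) query times; sin_coef, cos_coef: (K, D); offset: (D,).
    Returns an array of shape (T, D).
    """
    t = np.asarray(t, dtype=float)
    # Assumed basis frequencies w_k = 2*pi*k for k = 1..K.
    omega = 2.0 * np.pi * np.arange(1, sin_coef.shape[0] + 1)
    # d^n/dt^n sin(w t) = w^n sin(w t + n*pi/2), and likewise for cos.
    phase = np.outer(t, omega) + order * np.pi / 2.0   # (T, K)
    scale = omega ** order                             # (K,)
    out = (np.sin(phase) * scale) @ sin_coef + (np.cos(phase) * scale) @ cos_coef
    if order == 0:
        out = out + offset  # the constant term vanishes under differentiation
    return out

def derivative_loss(t, sin_coef, cos_coef, offset, targets, weights):
    """Weighted MSE over position, velocity, acceleration, and jerk.

    targets[n] holds the reference for derivative order n, shape (T, D).
    """
    total = 0.0
    for order, (target, weight) in enumerate(zip(targets, weights)):
        pred = eval_action(t, sin_coef, cos_coef, offset, order=order)
        total += weight * float(np.mean((pred - target) ** 2))
    return total

# Random coefficients for a 7-dimensional action (illustrative values only).
rng = np.random.default_rng(0)
S = rng.normal(size=(4, 7))
C = rng.normal(size=(4, 7))
b = rng.normal(size=7)

# The same function can be queried at any sampling rate.
coarse = eval_action(np.linspace(0.0, 1.0, 11), S, C, b)
dense = eval_action(np.linspace(0.0, 1.0, 101), S, C, b)
velocity = eval_action(np.linspace(0.0, 1.0, 101), S, C, b, order=1)
```

Because the trajectory is a closed-form function of time, the controller's sampling rate becomes a query-time choice rather than a property of the model output, and derivative targets can be supervised without finite differencing.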