LaMoGen is a text-to-motion generation framework that leverages LLMs to compose motion sequences through symbolic reasoning over a novel Laban-inspired motion representation called LabanLite. LabanLite encodes atomic body-part actions as discrete Laban symbols paired with textual templates, yielding interpretable symbol sequences and body-part instructions. Experiments demonstrate that LaMoGen outperforms prior methods on a new Labanotation-based benchmark and two public datasets, establishing a new baseline for interpretability and controllability.
By representing motion as a sequence of symbolic Laban actions, LaMoGen enables LLMs to generate more temporally accurate, detailed, and explainable human motions compared to methods relying on black-box text-motion embeddings.
Human motion is highly expressive and naturally aligned with language, yet prevailing methods, which rely heavily on joint text-motion embeddings, struggle to synthesize temporally accurate, detailed motions and often lack explainability. To address these limitations, we introduce LabanLite, a motion representation developed by adapting and extending the Labanotation system. Unlike black-box text-motion embeddings, LabanLite encodes each atomic body-part action (e.g., a single left-foot step) as a discrete Laban symbol paired with a textual template. This abstraction decomposes complex motions into interpretable symbol sequences and body-part instructions, establishing a symbolic link between high-level language and low-level motion trajectories. Building on LabanLite, we present LaMoGen, a Text-to-LabanLite-to-Motion Generation framework that enables large language models (LLMs) to compose motion sequences through symbolic reasoning. The LLM interprets motion patterns, relates them to textual descriptions, and recombines symbols into executable plans, producing motions that are both interpretable and linguistically grounded. To support rigorous evaluation, we introduce a Labanotation-based benchmark with structured description-motion pairs and three metrics that jointly measure text-motion alignment along symbolic, temporal, and harmony dimensions. Experiments demonstrate that LaMoGen establishes a new baseline for both interpretability and controllability, outperforming prior methods on our benchmark and two public datasets. These results highlight the advantages of symbolic reasoning and agent-based design for language-driven motion synthesis.
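To make the abstraction concrete, here is a minimal sketch of what a symbol-plus-template representation like LabanLite might look like. All names, the direction vocabulary, and the beat-based timing are illustrative assumptions, not the paper's actual specification:

```python
from dataclasses import dataclass

# Hypothetical symbol inventory; the paper's real LabanLite vocabulary is not given here.
DIRECTIONS = {"forward", "backward", "left", "right", "up", "down", "place"}

@dataclass(frozen=True)
class LabanLiteSymbol:
    """One atomic body-part action: a discrete Laban symbol paired with a textual template."""
    body_part: str   # e.g. "left_foot" (illustrative naming)
    direction: str   # discrete Laban direction symbol
    beats: int       # assumed duration in beats

    def to_instruction(self) -> str:
        # Render the symbol through its textual template, linking symbol to language.
        return f"{self.body_part.replace('_', ' ')} moves {self.direction} for {self.beats} beat(s)"

def compose_plan(symbols: list[LabanLiteSymbol]) -> list[str]:
    """Validate a symbol sequence and emit ordered body-part instructions,
    standing in for the 'executable plan' an LLM would recombine symbols into."""
    for s in symbols:
        if s.direction not in DIRECTIONS:
            raise ValueError(f"unknown Laban symbol: {s.direction}")
    return [f"{i + 1}. {s.to_instruction()}" for i, s in enumerate(symbols)]

plan = compose_plan([
    LabanLiteSymbol("left_foot", "forward", 1),
    LabanLiteSymbol("right_foot", "forward", 1),
    LabanLiteSymbol("torso", "up", 2),
])
# plan[0] → "1. left foot moves forward for 1 beat(s)"
```

The point of the sketch is the decomposition itself: each atomic action is a discrete, checkable symbol, so a sequence can be inspected, validated, and mapped to language before any motion trajectory is synthesized.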