The paper introduces ProAct, a dual-system framework for embodied social agents that separates low-latency behavioral responses from slower cognitive reasoning to support proactive behavior. ProAct uses a Behavioral System for immediate multimodal interaction and a Cognitive System for long-horizon social reasoning and intention generation. A streaming flow-matching model, conditioned on intentions via ControlNet, translates these intentions into continuous non-verbal behaviors, enabling seamless transitions between reactive and proactive gestures. In user studies on a physical humanoid robot, participants and observers preferred ProAct over reactive variants in perceived proactivity, social presence, and engagement.
ProAct enables embodied social agents to exhibit more engaging and proactive behavior by decoupling reactive responses from slower, deliberative reasoning, leading to improved user perception of proactivity and social presence.
Embodied social agents have recently advanced in generating synchronized speech and gestures. However, most interactive systems remain fundamentally reactive, responding only to current sensory inputs within a short temporal window. Proactive social behavior, in contrast, requires deliberation over accumulated context and intent inference, which conflicts with the strict latency budget of real-time interaction. We present ProAct, a dual-system framework that reconciles this time-scale conflict by decoupling a low-latency Behavioral System for streaming multimodal interaction from a slower Cognitive System which performs long-horizon social reasoning and produces high-level proactive intentions. To translate deliberative intentions into continuous non-verbal behaviors without disrupting fluency, we introduce a streaming flow-matching model conditioned on intentions via ControlNet. This mechanism supports asynchronous intention injection, enabling seamless transitions between reactive and proactive gestures within a single motion stream. We deploy ProAct on a physical humanoid robot and evaluate both motion quality and interactive effectiveness. In real-world interaction user studies, participants and observers consistently prefer ProAct over reactive variants in perceived proactivity, social presence, and overall engagement, demonstrating the benefits of dual-system proactive control for embodied social interaction.
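The decoupling described in the abstract can be illustrated with a toy scheduling sketch. This is not the authors' implementation: the names `behavior_step` and `run_interaction`, the tick-based timing, and the string "motion chunks" are all hypothetical, and the streaming flow-matching generator with ControlNet conditioning is replaced by a simple string-formatting stub. The sketch only shows the control flow of asynchronous intention injection: the fast loop emits a chunk on every tick and never waits, while a slower deliberation finishes some ticks later and its result is merged into the same stream.

```python
from typing import List, Optional, Tuple


def behavior_step(observation: str, intention: Optional[str]) -> str:
    """Fast path (stub): always returns a motion chunk for the current tick.

    In the paper this role is played by a streaming generative model;
    here it is a placeholder that just formats a string.
    """
    if intention is None:
        return f"react({observation})"
    return f"react({observation})+{intention}"


def run_interaction(observations: List[str], latency_ticks: int) -> List[str]:
    """Simulate a fast behavioral loop with a slow cognitive loop beside it.

    `latency_ticks` is an assumed, fixed deliberation delay measured in
    behavioral ticks. Intentions are treated as one-shot: once injected
    into a chunk they are cleared.
    """
    context: List[str] = []                      # accumulated history for deliberation
    pending: Optional[Tuple[int, str]] = None    # (tick when ready, intention)
    slot: Optional[str] = None                   # latest finished intention, if any
    stream: List[str] = []

    for tick, obs in enumerate(observations):
        context.append(obs)

        # Cognitive side: deliver a finished deliberation, if one is ready.
        if pending is not None and tick >= pending[0]:
            slot = pending[1]
            pending = None

        # Cognitive side: start a new deliberation over the context so far.
        if pending is None:
            pending = (tick + latency_ticks, f"intent_after_{len(context)}_obs")

        # Behavioral side: emit a chunk now, with or without an intention.
        stream.append(behavior_step(obs, slot))
        slot = None  # one-shot: the intention has been consumed

    return stream


if __name__ == "__main__":
    for chunk in run_interaction(["a", "b", "c", "d"], latency_ticks=2):
        print(chunk)
```

The point of the design is that the output stream has exactly one chunk per tick regardless of `latency_ticks`; a slower deliberation only delays when an intention appears in the stream, never the stream itself.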