This paper introduces the Evolvable Embodied Agent (EEAgent) framework, which uses large vision-language models (VLMs) for environment interpretation and policy planning in robotic manipulation. A key component is the Long Short-Term Reflective Optimization (LSTRO) mechanism, which refines prompts based on both short-term and long-term experience. Experiments on six VIMA-Bench tasks show that EEAgent achieves state-of-the-art performance, particularly in complex scenarios, which the authors attribute to the LSTRO mechanism's support for continuous self-evolution.
Forget reinforcement learning: robots can now evolve manipulation skills simply by reflecting on their successes and failures.
Achieving general-purpose robotics requires empowering robots to adapt and evolve based on their environment and feedback. Traditional methods face limitations such as extensive training requirements, difficulties in cross-task generalization, and lack of interpretability. Prompt learning offers new opportunities for self-evolving robots that need no extensive training and instead improve simply by reflecting on past experiences. However, extracting meaningful insights from task successes and failures remains a challenge. To this end, we propose the evolvable embodied agent (EEAgent) framework, which leverages large vision-language models (VLMs) for better environmental interpretation and policy planning. To enhance reflection on past experiences, we propose a long short-term reflective optimization (LSTRO) mechanism that dynamically refines prompts based on both past experiences and newly learned lessons. This facilitates continuous self-evolution and thereby improves overall task success rates. Evaluations on six VIMA-Bench tasks reveal that our approach sets a new state of the art, notably outperforming baselines in complex scenarios.
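The abstract does not specify how LSTRO is implemented, so the following is only a minimal sketch of the general idea of refining a prompt from both recent and accumulated lessons. Everything in it is an assumption made for illustration: the `ReflectivePrompt` class, the `short_term_size` and `promote_after` parameters, and the rule that a lesson is promoted to long-term memory once it recurs. In the actual framework the lessons would presumably come from a VLM reflecting on task outcomes; here they are plain strings supplied by the caller.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReflectivePrompt:
    """Hypothetical prompt store with short- and long-term lesson memory."""

    base_prompt: str
    short_term_size: int = 3   # assumed window of recent lessons
    promote_after: int = 2     # assumed: repeats needed to become long-term
    long_term: list = field(default_factory=list)
    counts: dict = field(default_factory=dict)
    short_term: deque = field(init=False, default_factory=deque)

    def __post_init__(self):
        # Bounded buffer: the oldest recent lesson drops out when full.
        self.short_term = deque(maxlen=self.short_term_size)

    def reflect(self, lesson: str) -> None:
        """Record one lesson drawn from an episode's success or failure."""
        self.counts[lesson] = self.counts.get(lesson, 0) + 1
        if lesson in self.long_term:
            return
        if self.counts[lesson] >= self.promote_after:
            # A recurring lesson is promoted to long-term memory.
            self.long_term.append(lesson)
            self.short_term = deque(
                (item for item in self.short_term if item != lesson),
                maxlen=self.short_term_size,
            )
        elif lesson not in self.short_term:
            self.short_term.append(lesson)

    def render(self) -> str:
        """Build the refined prompt from the base text plus both memories."""
        parts = [self.base_prompt]
        if self.long_term:
            parts.append("Long-term lessons:")
            parts.extend(f"- {item}" for item in self.long_term)
        if self.short_term:
            parts.append("Recent lessons:")
            parts.extend(f"- {item}" for item in self.short_term)
        return "\n".join(parts)
```

In use, the agent would call `reflect` after each episode and pass `render()` to the planner on the next one, so the prompt changes over time while the underlying model stays fixed. The promotion rule is the simplest choice that separates the two memories; the paper's mechanism may differ substantially.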