Tongji University, Shanghai, China; Texas A&M University

… M+ trajectories; RoboCOIN [26] collects over 180,000 demonstrations for 421 tasks. However, these datasets often focus on a small set of common tasks and behaviors. After deduplicating the tasks and grouping them by semantic meaning, we find that most concentrate on very common behaviors such as “pick and hold”, with little coverage of complex and long-tail tasks. This narrow task design introduces significant biases into the trained models and limits their usefulness as pre-trained models in real-world scenarios beyond a few common tasks. Current evaluation tasks suffer from the same problem: when proposing new methods, most studies test on only a few common tasks, and without a unified task-design standard, fair comparison across works is difficult.

To address these issues, we introduce the Great March 100 (GM-100) as the first step towards a robot learning Olympics. GM-100 consists of 100 carefully designed tasks that cover a wide range of interactions and long-tail behaviors. It aims to provide a diverse and challenging task set that comprehensively evaluates the capabilities of robotic agents and promotes diversity and complexity in the task design of robot datasets. The tasks are developed through systematic analysis and expansion of existing task designs, combined with insights from human action understanding. We collect a large amount of trajectory data on two different robotic platforms and evaluate several baseline models. Experimental results demonstrate that the GM-100 tasks are 1) feasible to execute and 2) sufficiently challenging to effectively differentiate the performance of various methods.
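To make the concentration claim concrete, the analysis described above (deduplicate task names, group them by semantic meaning, measure how much of the task list each behavior occupies) can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's procedure. The report does not say how its semantic categorization was done. The keyword map `CATEGORY_KEYWORDS` and the helpers `categorize` and `category_shares` are hypothetical. A real pipeline would more likely rely on manual labeling or text embeddings.

```python
from collections import Counter

# Hypothetical keyword-to-category map (an assumption for illustration only;
# the report does not specify its categorization scheme).
CATEGORY_KEYWORDS = {
    "pick and hold": ("pick", "grasp", "lift", "hold"),
    "place": ("place", "put"),
    "open/close": ("open", "close"),
    "pour": ("pour",),
}


def categorize(task: str) -> str:
    """Return the first category with a keyword in the task name, else 'other'."""
    words = task.lower().split()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in words for k in keywords):
            return category
    return "other"


def category_shares(tasks: list[str]) -> dict[str, float]:
    """Deduplicate task names (ignoring case and extra whitespace), then
    return the fraction of unique tasks that falls into each category."""
    unique = {" ".join(t.lower().split()) for t in tasks}
    counts = Counter(categorize(t) for t in unique)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.most_common()}


if __name__ == "__main__":
    # Toy task names made up for illustration; not data from any dataset.
    toy_tasks = [
        "Pick up the cup", "pick up the cup", "Grasp the block",
        "Lift the bowl", "Put the cup on the plate", "Pour water into the cup",
    ]
    print(category_shares(toy_tasks)["pick and hold"])  # → 0.6
```

Keyword matching is deliberately crude. Because the first matching category wins, a composite task such as "pick up and place the cup" is counted only as "pick and hold". That bias toward the most common label is exactly why a coarse count like this understates long-tail coverage.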
In addition, to avoid human bias, we do not use real-world utility as a criterion in the task design process. Instead, physical common sense and low-level manipulation knowledge (the how-level affordance) are the only criteria for generating and selecting the final tasks.

To summarize, in this report, we make the following contributions:
• We identify the limitations of existing robot task designs and evaluations, highlighting the need for more diverse and complex tasks.
• We propose GM-100, a task list consisting of 100 detail-oriented tasks that cover a wide range of interactions and long-tail behaviors.
• We collect a medium-sized dataset on two robotic platforms and evaluate several baseline models, demonstrating that GM-100 is both challenging and effective.

Our data and code are available at https://rhos.ai/research/gm-100.

2 Related Work

2.1 Imitation Learning

Imitation learning underpins embodied intelligence by teaching agents to map sensory inputs to actions from expert demonstrations. Early methods include Behavioural Cloning [20], interactive aggregation as in DAgger [22], and adversarial approaches such as GAIL [10]. More recently, generative policies such as ACT [31], Diffusion Policy [6], and