School of Computer Science and Engineering, Sun Yat-sen University, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China; Shenzhen Loop Area Institute, China
FRAP achieves substantial improvements in performance estimation under distribution shifts by effectively merging the strengths of foundation and base models.
Mamba strikes again, enabling VLA models to learn more robust manipulation policies that generalize better to real-world scenarios and require less training data.
Generative models can be surprisingly effective for texture filtering when fine-tuned with a two-stage supervised and reinforcement learning approach.
Multi-frame monocular scene flow estimation gets a serious boost with RAFT-MSF++, which uses Geometry-Motion Feature fusion to achieve state-of-the-art results and improved robustness to occlusions.
Forget trajectory forecasting – TacticGen generates *adaptable* football tactics, bridging the gap between predicting what *will* happen and prescribing what *should* happen to win.
Generating coordinated bimanual grasps on diverse objects is now possible thanks to a dataset of nearly 10 million grasps and a model that adapts to object geometry and size.
LLMs can maintain generation quality in long-context scenarios while using significantly less context, simply by adaptively allocating context based on uncertainty.
Achieve 85% zero-shot dexterous grasping success on unseen robot hands by learning a universal policy aligned to hand morphology, blowing away prior work by nearly 60%.
End-to-end autonomous driving can ditch expert demonstrations and still achieve state-of-the-art performance, thanks to a risk-aware world model that learns to predict and avoid hazardous outcomes.
Forget ad-hoc VLA design: here are 12 key ingredients, validated in a unified framework, for building performant Vision-Language-Action models.