Semantic visual-action tokenization in RepWAM significantly enhances robotic manipulation performance, outperforming traditional reconstruction-based approaches.
Aligning shallow and deep features in representation autoencoders leads to a dramatic improvement in image reconstruction quality, setting new benchmarks in the field.
UniDexTok slashes reconstruction errors by over 98% for dexterous hands, achieving unprecedented accuracy without relying on retargeting.
Reinforcement learning boosts multimodal performance, raising task scores and creating unexpected synergies between image generation and editing.
OmniGen-AR can seamlessly generate images from a wide array of conditions, outperforming existing methods that are limited to single-modality inputs.
Visual features can be the game-changer in recommendation systems, but they're often overlooked; REVEAL flips the script by making them a focal point.
Gradual bridging with embodied trajectory-coupled data transforms VLMs into robust robot control policies, overcoming significant transfer challenges.
ActiveMimic reveals that leveraging active perception from egocentric videos can close the performance gap with robot-pretrained models, transforming how we approach robot learning.
Language models are increasingly doing their real work in the "invisible" latent space, not the tokens we see.