Open-source web agents can now outperform GPT-4o on key web navigation tasks, thanks to a new dataset and model family that levels the playing field.
Pruning vision tokens across both the ViT and LLM can yield a 62% efficiency boost in video VLMs with minimal performance loss, and without complex text conditioning.
Forget expensive real-world data collection: a massive, diverse synthetic dataset enables surprisingly effective zero-shot transfer for robotic manipulation.
TOPReward unlocks hidden knowledge for robot learning: it extracts surprisingly accurate task-progress signals directly from VLM token probabilities, bypassing the need for explicit reward engineering.
Forget synthetic benchmarks that don't translate: MolmoSpaces offers 230k diverse, simulator-agnostic environments with 130k annotated objects, showing a remarkable 0.96 sim-to-real correlation for robot policies.
Robot foundation models can achieve state-of-the-art performance by explicitly reasoning about spatial plans as editable trajectory traces, rather than directly mapping perception to control.