Paul G. Allen School of Computer Science
Forget multiple forward passes: this reward model scores all candidate responses at once, unlocking massive speedups and better comparative reasoning.
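The paper's architecture isn't spelled out in this teaser, but the core idea — pack the prompt and every candidate into one sequence, run a single pass, and read one score per candidate — can be sketched with a toy one-layer attention "encoder" (all names and sizes below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # toy hidden size

def attend(X):
    # One self-attention pass over the packed sequence, so each
    # candidate's representation can attend to all the others.
    scores = X @ X.T / np.sqrt(D)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X

def score_candidates(prompt_emb, candidate_embs, w_out):
    # Pack prompt + ALL candidates into ONE sequence -> one forward pass.
    X = attend(np.vstack([prompt_emb] + candidate_embs))
    # Read a scalar score from each candidate's last token position.
    ends = np.cumsum([len(prompt_emb)] + [len(c) for c in candidate_embs])[1:] - 1
    return np.array([X[e] @ w_out for e in ends])

prompt = rng.normal(size=(4, D))
candidates = [rng.normal(size=(rng.integers(3, 6), D)) for _ in range(3)]
scores = score_candidates(prompt, candidates, rng.normal(size=D))
print(scores.shape)  # one score per candidate, from a single pass
```

The comparative-reasoning benefit falls out of the packing: candidates score each other through attention, which a per-candidate forward pass can't do.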
Forget training on closed sets: WildDet3D leverages geometric cues and diverse prompts to achieve SOTA 3D object detection across 13.5K categories in the wild.
Open-source web agents can now outperform GPT-4o on key web navigation tasks, thanks to a new dataset and model family that levels the playing field.
Ditch the coordinate system: VLMs can point *way* better by directly selecting visual tokens, leading to SOTA results and improved sample efficiency.
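The teaser doesn't give the model's actual head, but "selecting visual tokens" instead of emitting coordinates amounts to an argmax over patch tokens, then mapping the chosen patch back to pixels — a minimal sketch with invented toy sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
GRID, PATCH, D = 14, 16, 32  # 14x14 patch grid, 16px patches (toy)

patch_tokens = rng.normal(size=(GRID * GRID, D))  # visual tokens from the encoder
query = rng.normal(size=D)                        # embedding of the pointing request

# Select a visual token directly rather than regressing (x, y) as text.
idx = int(np.argmax(patch_tokens @ query))
row, col = divmod(idx, GRID)

# The "point" is just the chosen patch's center in pixel space.
point = ((col + 0.5) * PATCH, (row + 0.5) * PATCH)
print(idx, point)
```

One plausible reading of the sample-efficiency gain: the output space shrinks from free-form coordinate strings to a small, grounded set of token choices.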
Forget redrawing diagrams by hand: VFIG, a new vision-language model, can automatically convert rasterized figures into editable SVGs with near GPT-5.2 quality.
Pruning vision tokens across both the ViT and LLM can yield a 62% efficiency boost in video VLMs with minimal performance loss, and without complex text conditioning.
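The exact pruning criterion isn't stated here; as a stand-in, the two-stage idea — prune once inside the ViT, again before the LLM, with no text conditioning — can be sketched using token norm as a crude text-free saliency proxy (keep ratios below are made up to land near the quoted 62%):

```python
import numpy as np

rng = np.random.default_rng(0)

def prune_tokens(tokens, keep_ratio):
    """Keep the top-k tokens by L2 norm (a simple text-free saliency proxy)."""
    k = max(1, int(len(tokens) * keep_ratio))
    saliency = np.linalg.norm(tokens, axis=-1)
    keep = np.sort(np.argsort(saliency)[-k:])  # preserve spatio-temporal order
    return tokens[keep]

# 8 frames x 196 patch tokens, 32-dim features (toy sizes).
vit_tokens = rng.normal(size=(8 * 196, 32))

# Stage 1: prune inside the ViT (keep 60% of patch tokens).
stage1 = prune_tokens(vit_tokens, 0.6)
# Stage 2: prune again before the LLM (keep 63% of survivors),
# compounding to roughly a 62% overall token reduction.
stage2 = prune_tokens(stage1, 0.63)

print(len(stage2) / len(vit_tokens))  # ~0.38 of the original tokens survive
```

Sorting the kept indices matters: it drops tokens without shuffling the frame/patch order the LLM expects.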
A new video-based reward model beats GPT-5.2 and Gemini-3 Pro at evaluating computer-using agents, offering a scalable, model-agnostic alternative to traditional methods.
Scaling VLMs won't magically unlock reasoning skills; you need to address the reporting bias in training data that suppresses tacit information.
Ditch slow, external segmentation pipelines: TrajTok learns trajectory tokens end-to-end, boosting video understanding while staying lean and adaptable.