
Non-profit research institute founded by Paul Allen. Known for Semantic Scholar, OLMo, and AI for science.
Generative multi-agent systems spontaneously exhibit collusion and conformity that mirror societal pathologies, emerging without any explicit programming and bypassing the safeguards of individual agents.
Forget redrawing diagrams by hand: VFIG, a new vision-language model, can automatically convert rasterized figures into editable SVGs with near GPT-5.2 quality.
Pruning vision tokens across both the ViT and LLM can yield a 62% efficiency boost in video VLMs with minimal performance loss, and without complex text conditioning.
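The paper's exact saliency criterion isn't given here; as a minimal sketch, token pruning of this kind often reduces to scoring each vision token and keeping the top fraction. This toy version scores by feature norm (an assumption, standing in for whatever measure the method actually uses):

```python
import numpy as np

def prune_tokens(tokens: np.ndarray, keep_ratio: float = 0.38) -> np.ndarray:
    """Keep the highest-scoring vision tokens.

    tokens: (num_tokens, dim) array of vision-token features.
    Scoring by L2 feature norm is a stand-in for the paper's
    actual saliency measure.
    """
    scores = np.linalg.norm(tokens, axis=1)
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])  # preserve token order
    return tokens[keep]

# 100 tokens of dim 16 -> 38 survive (a 62% reduction in tokens)
pruned = prune_tokens(np.random.randn(100, 16))
print(pruned.shape)  # (38, 16)
```

The same selection could be applied twice, once inside the ViT and again before the LLM, which is where the claimed compounding efficiency gain would come from.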
Pixel-space diffusion models get a serious boost: V-Co reveals a simple recipe for visual co-denoising that outperforms existing methods on ImageNet-256 with fewer training epochs.
Synthetic benchmarks can't catch the nuances of personalized deep research, as real users revealed nine critical errors that LLM judges missed entirely.
Forget expensive real-world data collection: a massive, diverse synthetic dataset enables surprisingly effective zero-shot transfer for robotic manipulation.
AI is poised to automate the most joyful and agentic parts of our jobs, while developers are building AI with the wrong traits.
RADAR offers a scalable, interpretable framework for understanding robot policy generalization by directly linking test-time performance to the training data, revealing the specific types of generalization required.
Agentic search gets a meta-RL boost: MR-Search learns to self-reflect and adapt search strategies across episodes, significantly outperforming standard RL baselines.
Finally, AI can generate hour-long videos with consistent characters and backgrounds, thanks to a new framework that nails seamless transitions between shots.
LLMs still struggle with factual accuracy in specialized medical domains like pancreatic cancer, with hallucination rates varying wildly and web search integration failing to guarantee better responses.
VLMs that ace math problems still flunk at understanding *how* students go wrong, highlighting a critical gap for AI in education.
Scaling VLMs won't magically unlock reasoning skills; you need to address the reporting bias in training data that suppresses tacit information.
Forget simple keyword searches – scientists are using AI research tools as collaborative partners, delegating complex tasks and engaging with results in surprisingly persistent and non-linear ways.
Unlock robot learning with hidden knowledge: TOPReward extracts surprisingly accurate task progress signals directly from VLM token probabilities, bypassing the need for explicit reward engineering.
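One plausible reading of "progress from token probabilities" (the concrete prompt and token set below are assumptions, not the paper's): prompt the VLM to rate task completion as a digit, then take the expectation over the next-token distribution for '0'..'9' rather than a single sampled answer.

```python
import numpy as np

def progress_from_digit_logits(digit_logits: np.ndarray) -> float:
    """Dense progress signal from a VLM's next-token logits.

    digit_logits: length-10 array of logits for the tokens '0'..'9',
    obtained after a prompt like "Task completion (0-9):".
    Returns the expected digit, normalized to [0, 1].
    """
    probs = np.exp(digit_logits - digit_logits.max())
    probs /= probs.sum()
    expected_digit = float(np.dot(probs, np.arange(10)))
    return expected_digit / 9.0

# Toy logits peaked at '6' -> progress of roughly 0.65
logits = np.array([-2, -2, -1, 0, 1, 2, 4, 2, 0, -2], dtype=float)
print(round(progress_from_digit_logits(logits), 2))
```

Using the full distribution rather than an argmax answer is what makes the signal smooth enough to serve as a shaped reward.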
Forget RL fine-tuning: this paper shows you can beat it at cold-start personalization with a tiny model and clever Bayesian inference over structured preference priors.
Ditch the messy global 3D scene reconstruction: AnchorWeave weaves together clean, local geometric memories for camera-controllable video generation, boosting long-term consistency and visual quality.
By reusing existing data mixture ratios and only recomputing for affected domains, Olmix slashes compute costs by 74% without sacrificing downstream task performance during iterative LM development.
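The compute saving described above comes from not re-optimizing every domain weight. A minimal sketch of that idea (the function name and renormalization scheme are assumptions; the paper's actual procedure may differ):

```python
def update_mixture(old_weights: dict, recomputed: dict) -> dict:
    """Reuse prior data-mixture ratios, swapping in freshly optimized
    weights only for the domains whose data changed, then renormalize
    so the mixture sums to 1."""
    merged = {**old_weights, **recomputed}
    total = sum(merged.values())
    return {domain: w / total for domain, w in merged.items()}

# Only 'code' was re-optimized; 'web' and 'books' keep their old ratios.
old = {"web": 0.6, "books": 0.3, "code": 0.1}
new = update_mixture(old, {"code": 0.2})
print(new)
```

Renormalizing after the partial update keeps a valid mixture while touching only the affected domain.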
Forget synthetic benchmarks that don't translate: MolmoSpaces offers 230k diverse, simulator-agnostic environments with 130k annotated objects, showing a remarkable 0.96 sim-to-real correlation for robot policies.
LLM safety guardrails are far less robust than benchmarks suggest, with accuracy dropping by as much as 57% on novel adversarial attacks, and some even generating harmful content in a "helpful mode" jailbreak.
A single foundation model, AION-1, now handles everything from galaxy morphology to spectral super-resolution across diverse astronomical datasets.
Robot foundation models can achieve state-of-the-art performance by explicitly reasoning about spatial plans as editable trajectory traces, rather than directly mapping perception to control.
RewardBench 2 delivers a stark reality check for reward models: they struggle significantly on new, human-generated prompts, yet this difficulty is surprisingly predictive of their actual usefulness in downstream tasks.