
Amazon's research arm covering ML, NLP, robotics, and cloud AI. Drives Alexa, AWS AI services, and logistics optimization.
Memory-augmented LLMs get a strategic upgrade: MemMA uses multi-agent reasoning to proactively guide memory construction and repair, leading to significant performance gains.
LLM-generated survey responses can be statistically accurate yet still miss the option most preferred by humans, highlighting a critical flaw in current evaluation methods.
Agentic LLMs are surprisingly vulnerable: a new framework finds successful attacks in 84% of attempts by escalating prompt injection techniques across multiple stages.
Build minute-level navigable video world models by combining the strengths of explicit 3D patch memory with implicit generative modeling.
Achieve near-full light throughput in spectral imaging with a novel oscillating dispersion technique and deep unfolding network, enabling high-fidelity reconstruction even under light-starved conditions.
Achieve 50% bitrate savings in ultra-low-bitrate image compression by cleverly turning image decoding into a next-frame prediction problem using video diffusion priors.
LoRA fine-tuning can significantly boost the voice cloning capabilities of LLM-based TTS systems, but only if the training data is acoustically diverse enough.
LLM reasoning research is inadvertently paving a dangerous path towards AI situational awareness and strategic deception, demanding a re-evaluation of current safety measures.
Recursive self-improvement can boost performance by 18% in code and 17% in reasoning, but only if you can keep it from going off the rails: SAHOO provides the guardrails.
MC3D models can now generalize to unseen camera configurations thanks to a new framework that explicitly accounts for spatial prior discrepancies.
Save 20% on LLM costs with <2% accuracy drop by strategically cascading a small model with a large one, guided by a confidence-calibrated SLM.
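The cascading idea above can be sketched in a few lines: route each query to the small model first and escalate only when its calibrated confidence falls below a threshold. This is a minimal illustration, not the paper's system; the 0.8 threshold and the toy models are assumptions.

```python
def cascade(query, small_model, large_model, threshold=0.8):
    """Answer with the small model when it is confident; otherwise escalate.
    `threshold` is illustrative, not a value from the paper."""
    answer, confidence = small_model(query)  # calibrated confidence in [0, 1]
    if confidence >= threshold:
        return answer, "small"
    return large_model(query), "large"

# Toy stand-ins for the two models (hypothetical, for demonstration only):
small = lambda q: ("cat", 0.95) if "easy" in q else ("dog", 0.3)
large = lambda q: "gold answer"

print(cascade("easy question", small, large))  # ('cat', 'small')
print(cascade("hard question", small, large))  # ('gold answer', 'large')
```

Cost savings come from answering most queries with the cheap model; the quality of the confidence calibration determines how small the accuracy drop stays.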
LLM-based recommender systems can trigger users' personal trauma, phobias, or self-harm history, but a new framework cuts these safety violations by 96.5% while maintaining recommendation quality.
LLMs can ace math problems while reasoning like a drunk toddler, with 82% of correct answers arising from unstable, inconsistent logic.
Safety classifiers for LLMs can catastrophically fail with even minuscule embedding drift, creating dangerous blind spots in deployed safety architectures.
Despite matching or exceeding human expert performance on generating potential diagnoses, current MLLMs struggle to synthesize multimodal clinical evidence for final diagnosis, revealing a critical gap in their clinical reasoning abilities.
Latent reasoning models often take shortcuts to achieve high accuracy, and stronger supervision, while mitigating this, paradoxically restricts the diversity of their latent representations.
Forget Bonferroni: a new sequential testing approach slashes audit times for multi-stream ML systems, especially when anomalies are widespread.
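The sequential alternative to Bonferroni can be illustrated with a classic Wald sequential probability ratio test on a single Bernoulli anomaly stream; the paper's actual procedure may differ, and the rates p0, p1, alpha, beta below are illustrative.

```python
import math

def sprt(observations, p0=0.05, p1=0.2, alpha=0.01, beta=0.01):
    """Wald's SPRT: stop as soon as the log-likelihood ratio between the
    anomalous rate p1 and the clean rate p0 crosses a decision boundary.
    Returns (decision, samples_used)."""
    upper = math.log((1 - beta) / alpha)   # declare the stream anomalous
    lower = math.log(beta / (1 - alpha))   # declare the stream clean
    llr = 0.0
    for n, x in enumerate(observations, 1):
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "anomalous", n
        if llr <= lower:
            return "clean", n
    return "undecided", len(observations)

print(sprt([1] * 30))  # anomalous streams are flagged after only a few samples
print(sprt([0] * 40))  # clean streams take longer but still stop early
```

Because each stream stops as soon as the evidence is decisive, widespread anomalies are exactly the case where sequential testing beats a fixed-sample Bonferroni correction.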
Stop training your M3OD models on the same old entangled data: this method decomposes and recomposes objects, scenes, and camera poses to generate diverse training examples on the fly, boosting performance without needing more real-world data.
Soft pseudo-labels are theoretically equivalent to hard labels under perfect calibration, yet in practice they tank cross-domain semantic segmentation performance, motivating a new calibration framework.
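The soft-versus-hard distinction comes down to what the cross-entropy target looks like: a hard pseudo-label only trains toward the argmax class, while a soft one also penalizes mass on the other classes, so any teacher miscalibration leaks directly into the loss. A toy sketch (the distributions are made up for illustration):

```python
import math

def cross_entropy(pred, target):
    """CE between a predicted distribution and a (soft or hard) target."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)

pred = [0.7, 0.2, 0.1]   # student prediction on a target-domain pixel
soft = [0.6, 0.3, 0.1]   # teacher's soft pseudo-label (possibly miscalibrated)
hard = [1.0, 0.0, 0.0]   # argmax-ed hard pseudo-label

print(cross_entropy(pred, hard))  # depends only on the argmax class
print(cross_entropy(pred, soft))  # every class the teacher spreads mass on counts
```

Only when the soft target matches the true class posterior do the two objectives coincide in expectation, which is why calibration is the crux.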
Forget fine-tuning: inject targeted time-series insights into general LLMs and watch their reasoning skills skyrocket by up to 26%.
Static benchmarks can be fooled by fluent text and aligned citations; DREAM uses agentic evaluation to expose a critical capability mismatch when assessing the temporal validity and factual correctness of research agents.
Forget costly knowledge graphs: SAGE offers a lightweight, chunk-level graph expansion method that boosts retrieval recall by up to 8.5 points on heterogeneous QA tasks.
LLMs may ace the test, but their uncertainty estimates are far from perfect, raising serious concerns about their reliability in high-stakes educational assessments.
An end-to-end system extracts funny scenes from movies with 87% accuracy, opening new avenues for automated content repurposing.
Stop hand-rolling your multi-task learning to rank models: DeepMTL2R provides a ready-to-use framework with 21 SOTA algorithms and Pareto-optimal optimization.
Give new e-commerce products a warm start by borrowing behavioral signals from their substitutes, boosting search relevance and product discovery.
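One simple way to realize this warm start is to initialize a cold item's behavioral vector as a similarity-weighted average of its substitutes' vectors. This is a generic heuristic sketch, not the paper's method; the field names and weighting scheme are assumptions.

```python
def warm_start_signals(substitutes, similarity):
    """Similarity-weighted average of substitutes' behavioral vectors.
    `substitutes` maps item id -> vector, `similarity` maps item id -> weight.
    Both are hypothetical structures for illustration."""
    total = sum(similarity[s] for s in substitutes)
    dim = len(next(iter(substitutes.values())))
    return [sum(similarity[s] * vec[i] for s, vec in substitutes.items()) / total
            for i in range(dim)]

# Click / purchase / add-to-cart rates of two established substitute products:
subs = {"item_a": [0.10, 0.02, 0.05], "item_b": [0.20, 0.04, 0.07]}
sims = {"item_a": 3.0, "item_b": 1.0}
print(warm_start_signals(subs, sims))
```

The closer substitute dominates the blend, giving the new product plausible behavioral signals for ranking before it has collected any of its own.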
Object hallucination in MLLMs can be significantly reduced by simply masking salient visual features during contrastive decoding.
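A common contrastive-decoding formulation makes the masking idea concrete: decode once on the full image and once with the salient features masked, then down-weight tokens the model predicts even without the visual evidence. The alpha value and toy logits below are illustrative, not the paper's exact recipe.

```python
def contrastive_logits(logits_full, logits_masked, alpha=1.0):
    """Amplify tokens supported by the salient features and suppress tokens
    the model predicts equally well when those features are masked
    (i.e., likely language-prior hallucinations)."""
    return [(1 + alpha) * f - alpha * m
            for f, m in zip(logits_full, logits_masked)]

# Token 1 scores almost as high with the salient region masked, suggesting
# a hallucination driven by the language prior; the contrast suppresses it.
full   = [2.0, 2.5, 0.5]
masked = [0.1, 2.4, 0.2]
print(contrastive_logits(full, masked))  # [3.9, 2.6, 0.8]
```

Token 0 (strongly image-grounded) is boosted past token 1 (prior-driven), which is the whole point of the contrast.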
MLLMs can now reason about road traffic accidents by fusing remote sensing imagery and structured data, unlocking interpretable insights previously inaccessible to traditional methods.
Pinpointing the root causes of supply chain anomalies just got easier: a Shapley value-based attribution mechanism rapidly decomposes simulation outputs into individual input effects.
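Shapley attribution averages each input's marginal contribution over all coalitions of the other inputs. The exact (exponential-time) computation is feasible for a handful of inputs; the supply-chain "simulation" below is a toy additive function, where Shapley values provably recover each input's individual effect.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley attribution of value_fn over all coalitions."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for coalition in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                marginal = value_fn(set(coalition) | {p}) - value_fn(set(coalition))
                phi[p] += weight * marginal
    return phi

# Toy "simulation output": delay contributions of three supply-chain inputs
# (names and numbers are hypothetical, for illustration only).
effects = {"port_congestion": 3.0, "demand_spike": 1.0, "carrier_outage": 2.0}
total_delay = lambda coalition: sum(effects[p] for p in coalition)
print(shapley_values(list(effects), total_delay))
```

For non-additive simulators the attributions also capture interaction effects, which is what makes the decomposition useful for root-cause analysis.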
AI-generated feedback on student portfolios from GPT-4o and Claude-Sonnet-4 shows promise for high-stakes clinical assessments, but careful evaluation is needed to ensure accuracy and educational value.
LLMs evaluating job candidates exhibit significant bias against hedging language, docking candidates by 25.6% on average, even when the content is equivalent.
Achieve up to 39.6% FLOP reduction in LLM inference without retraining or architectural changes using QuickSilver's dynamic token-level optimizations.