Search papers, labs, and topics across Lattice.
We track OpenAI, DeepMind, Anthropic, and 17 other labs daily, with AI-powered summaries, trend charts, and a weekly digest.
We read everything so you don't have to. One email, zero noise.
GAAP lets you tell your AI assistant *exactly* what it can and can't share, and then *guarantees* it won't break those rules, even if the AI is compromised.
Forget complex fixed-point machinery: this work offers a dramatically simpler and more efficient route from external regret to Φ-regret minimization.
LLMs still struggle to reason in context when cultural and linguistic nuances are involved, achieving only 44% accuracy on a new grounded benchmark spanning 14 languages.
Stop fragmented land cover predictions: SSDM leverages global geospatial embeddings to guide local feature extraction, achieving state-of-the-art performance in high-resolution remote sensing mapping.
DPP-based Monte Carlo integration can offer variance reduction, but choosing the right DPP—fixed vs. tailored to the integrand—determines whether you get a biased but faster converging estimator or an unbiased but standard-rate estimator.
Turns out, cyclic equalizability of two words over any alphabet boils down to a simple check: do they have the same counts of each symbol?
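The criterion above is simple enough to sketch directly. A minimal illustration, assuming the paper's characterization reduces exactly to comparing per-symbol counts (Parikh vectors); the function name is hypothetical:

```python
from collections import Counter

def cyclically_equalizable(u: str, v: str) -> bool:
    """Check the stated criterion: two words are cyclically
    equalizable iff each symbol occurs the same number of
    times in both (i.e. their Parikh vectors agree)."""
    return Counter(u) == Counter(v)

# Same symbol counts, different order: passes the check.
print(cyclically_equalizable("abba", "baab"))  # True
# Different counts of 'a' and 'b': fails the check.
print(cyclically_equalizable("aab", "abb"))  # False
```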
Training-free diffusion models can now harmonize satellite imagery across diverse domains, enabling scalable remote-sensing synthesis without retraining.
LLM agents, like humans, suffer from Actor-Observer Asymmetry, but this work shows how dialectical training can mitigate the bias and improve fault resolution.
Bridging the offline/streaming gap in ASR is now more attainable: a unified RNN-Transducer, trained with mode-consistency regularization, delivers streaming accuracy at low latency while preserving offline performance.
Multi-event video generation gets a 33% quality boost with TS-Attn, a training-free attention mechanism that dynamically aligns video content with complex temporal prompts.
DPC shatters the traditional distributed file system bottleneck by turning a cluster's memory into a single, coherent cache, slashing data redundancy and boosting performance by up to 12.4x.
SpeechLLMs betray their hallucinations through tell-tale attention patterns, enabling detection without needing expensive human-annotated data.
Asymmetric collaboration between tiny on-device models and cloud models orders of magnitude larger can unlock responsive AI on extremely resource-constrained devices.
TurboQuant's claimed advantages over RaBitQ in quantization don't hold up under rigorous, reproducible comparison, raising questions about its practical utility.
Multilingual LLMs exhibit a surprising "American bias," even when prompted in other languages, and instruction tuning makes it worse.
Generate navigable, photorealistic simulations of real-world cities, complete with consistent weather and lighting, using only geo-registered video data.
Semantic masks can be better than raw pixels for learning robust robot policies by filtering out distracting visual noise and focusing on essential dynamics.
Uncover misleading half-truths by pitting a Politician agent against a Scientist agent in a debate moderated by a Judge, revealing what's left unsaid.
Achieve state-of-the-art person re-identification with only 20% of the data by explicitly teaching the model to "think" before matching identities.
LLMs fix 26% more bugs when given access to intermediate runtime states via simulated debugging, suggesting that outcome-level failure symptoms alone are insufficient for effective automated program repair.
Autoregressive speaker extraction, previously confined to offline processing, can now achieve real-time performance without sacrificing intelligibility thanks to chunk-wise interleaved splicing.
Get the performance boost of expensive sampling-based RL policies for a fraction of the compute by learning to prune action candidates early in the diffusion denoising process.
Entropy regularization makes planning provably easy: SmoothCruiser achieves polynomial sample complexity in MDPs where standard methods fail.
Achieve state-of-the-art CTR prediction without increasing model parameters by recursively reusing shared layers during training.
Achieve physically plausible and structurally stable human-object interaction video synthesis with a surprisingly efficient architecture that trains with a dual-stream approach but infers with only the RGB stream.
Freezing a Stable Diffusion backbone and injecting CLIP and BLIP features lets you beat the state-of-the-art in zero-shot sketch-based 3D shape retrieval, without any costly retraining.
Forget chasing the biggest LLM – this benchmark reveals that smaller models (<2B params) can deliver 3x better energy efficiency and faster ROI in real-world industry deployments.
Kernel launch overhead is a bigger bottleneck than you think: GPUOS achieves up to 15.3x speedup by fusing operations at runtime.
Uncover the hidden assumptions baked into LLM responses with a new interactive system that lets you explore alternative conceptual framings and values.
Trustworthy super-resolution in surgery is now achievable, with a model-agnostic method that identifies and mitigates unreliable reconstructions in real-time.
Time-to-collision metrics miss critical collision risk information, but a new 2D acceleration-based metric anticipates collisions far better.
Current red-teaming efforts miss the forest for the trees: ARES reveals that safety failures often stem from a systemic breakdown between the LLM *and* the reward model, not just the LLM itself.
MV-HGNN achieves superior 3D shape retrieval by effectively leveraging geometric dependencies and semantic alignment, outperforming existing methods in zero-shot settings.
ZKP proving, previously bottlenecked by MSM and NTT operations, can now achieve up to 10x higher throughput on TPUs thanks to a novel framework that reformulates ZKP kernels for AI-ASIC execution.
Targeted neuro-symbolic integration can reduce content bias in syllogistic reasoning, achieving over 94% accuracy while cutting content effects by 16%.
RL fine-tuning of discrete diffusion models can be made stable and effective by optimizing on the final denoised sample, unlocking SOTA results on text-to-image generation and OCR.
Discrete diffusion models can be sped up by 14x by intelligently choosing which tokens to sample at each step, without sacrificing accuracy.
FUSE achieves verification quality on par with semi-supervised methods, all without needing any labeled data.
Untangling evidence validation from text generation, ArbGraph offers a way to build more reliable long-form RAG systems by explicitly resolving factual conflicts *before* generation even begins.
VLAs can learn to adapt to new environments at test time without any fine-tuning, achieving significant performance gains on robotic manipulation and Atari games.
Multimodal LLMs struggle with multi-digit multiplication, with accuracy plummeting as arithmetic complexity increases, revealing a critical gap in computational capabilities.
Debloating tools, intended to shrink code and improve security, can actually *add* code or remove essential functionality, with dynamic methods being overly aggressive and static methods overly conservative.
Achieve real-time video understanding with transparent reasoning: a new approach aligns response timing with visual evidence, offering a breakthrough for online video LLMs.
Even the best LLMs still stumble on Olympiad-level math, and retrieval quality is the bottleneck for retrieval-augmented problem solving, according to the new MathNet benchmark.
LLMs can reason better over noisy and distributed information when you break down RAG into specialized agent roles for summarization, extraction, and reasoning.
The dream of universal representations across modalities may be just that: scaling up datasets and relaxing constraints reveals that models trained on different modalities learn rich, but fundamentally different, representations of the world.
Multi-agent LLM systems for idea generation can backfire, with smarter models and more communication leading to *less* diverse ideas due to structural coupling.
Allowing multiple support strategies in a single utterance can dramatically enhance the quality of emotional support conversations, leading to more effective dialogue outcomes.
Modular training with BAR allows independent updates of domain experts, achieving superior performance without the pitfalls of catastrophic forgetting.
Language models can now learn to forget strategically, achieving 2-3x memory efficiency without sacrificing reasoning accuracy.
Academic paper highlights, often overlooked, can substantially improve unsupervised keyword extraction when combined with abstracts.
Object hallucination in LVLMs can be significantly reduced *after* training, without any extra data or compute.
Q-learning converges faster than previously thought, thanks to a tighter bound derived from a novel stochastic switching system representation of the Bellman error.
Reasoning LLMs can now produce well-calibrated confidence estimates without labels or repeated sampling, unlocking more reliable real-world deployment.
Domain-specific continual pre-training lets a 7B model punch *way* above its weight, beating a 24B generalist on medical tasks by 3.5x.
Current ML model security tools miss nearly half of malicious models because they ignore runtime behavior, but a new dynamic analysis approach closes this gap.
Achieve robust 3D reconstruction from arbitrary viewpoints and unordered images by explicitly coupling diffusion-based generation with geometric scene understanding.
Test-time training for LLMs can finally scale: interleaving policy refinement with periodic critic recalibration unlocks sustained performance gains and avoids diversity collapse.
LLMs can't reliably automate the creation of executable business workflows, even with agentic assistance, leaving a huge opportunity for improvement.
LLMs can compile GUI code, but they fail spectacularly at generating playable, logically correct applications, highlighting a critical gap in current code generation capabilities.
LLMs are subtly reshaping peer review, making reports longer and more polished, but at the cost of critical depth and focus on originality.
Safe RL and continual learning are often at odds: maintaining safety constraints can lead to catastrophic forgetting in changing environments.
Finally, a unified, open-source framework lets you train Vision-Language-Action models from scratch or fine-tune pretrained backbones, achieving state-of-the-art tabletop manipulation performance.
Claims about the robustness and practicality of counterfactual explanation methods for recommender systems don't always hold up when subjected to a unified, comprehensive benchmark.
Naive attention-based filtering for edge-cloud inference is suboptimal under tight bandwidth constraints; prioritizing semantic diversity in transmitted embeddings yields surprisingly large accuracy gains.
Personalized federated learning can now handle the messy reality of heterogeneous industrial data, enabling more accurate failure time predictions across diverse clients.
LLMs struggle to generate obfuscated XSS payloads that reliably preserve runtime behavior, suggesting current models may not be ready for prime time in adversarial security data generation.
Learned critics in RLHF can actually *increase* variance and hurt performance in sparse-reward settings, but a simple explained variance metric can tell you when to ditch the critic and get better results.
By modeling transitions directly in the semantic code space and injecting LLM-verified priors, CAST uncovers latent item complementarity for sequential recommendation with significant performance gains and training acceleration.
CNPs are provably inconsistent with true stochastic processes, but this work shows *how much* they deviate, with a tight O(1/n²) bound on the conditioning consistency gap.
CKGE benchmarks overestimate performance by up to 25% because they fail to account for "entity interference," a newly identified phenomenon where embeddings of new entities disrupt previously learned relationships.
Node embeddings aren't just about node attributes: proximity and structural features play a surprisingly large role in shaping them.
Uncover hidden performance disparities in your ML models with FairTree, a new auditing tool that pinpoints fairness issues across continuous, categorical, and ordinal features while dissecting bias and variance contributions.
Naive neural operator estimates of solution functional quantities can be significantly biased, but this paper provides a surprisingly simple debiasing technique to fix it.
GNN performance on heterophilic graphs suffers because of inductive subgraphs acting as spurious shortcuts, a problem that can be solved by causally disentangling these subgraphs.
Arabic LLMs can speak the language of finance, but they often fail to reason about it, especially when it comes to causality and generation.
Escaping the tyranny of Bellman's curse, a new method leverages multi-step transitions to achieve higher-order accuracy in continuous-time policy evaluation, outperforming traditional one-step recursion.
LLMs can be effectively combined with graph-based methods to capture both semantic and structural information in tables, leading to state-of-the-art performance in table annotation tasks.
Forget relying on implicit reasoning: A-MAR's explicit reasoning plans unlock better artwork understanding by strategically retrieving relevant evidence.
Teaching LLMs to perform arithmetic on images unlocks a new level of grounded reasoning, paving the way for robots that can understand and manipulate the world more like humans.
Generative models for mobility data, previously thought to be private, are vulnerable to membership inference attacks, highlighting the need for more robust privacy evaluations.
Resolving semantic conflicts between synonymous prompts and across categories dramatically improves the stability and accuracy of open-vocabulary semantic segmentation, all without requiring any additional training.
LLMs that ace safety quizzes still fail to avoid hazards in the real world, revealing a dangerous gap between passive recognition and active mitigation.
Forget generic assistants – EgoSelf learns your habits from your first-person view data to predict your future interactions.
Despite growing concerns about data contamination, current black-box methods are essentially useless for detecting if an LLM has been trained on specific copyrighted material.
Noisy multimodal preference datasets are holding back reward model performance, but DT2IT-MRM offers a scalable curation strategy that achieves state-of-the-art results.
Achieve 50% parameter reduction in LLaMA-2-7B with minimal performance loss and no fine-tuning, thanks to a new global gating-based structured pruning method.
AI can now provide real-time feedback on coding consistency in qualitative research, previously a manual and drift-prone process.
LLMs struggle to identify relevant legal issues with high precision, but a neuro-symbolic approach using sparse linear models over LLM-generated analytical factors can boost performance by 30-40%.
Forget painstakingly gathering human feedback for image editing models – this framework uses a VLM to automatically score edits and align diffusion models with human preferences.
Stop optimizing generative engines in isolation: MAGEO learns reusable editing strategies that dramatically improve visibility and citation fidelity across diverse engines.
EWS systems can systematically misallocate resources, flagging younger, male, and international students for support at higher rates than their older, female, and domestic counterparts, even when their actual risk is comparable.
Forget scaling laws: strategically equipping small language models with tools delivers a better performance/cost tradeoff than simply scaling up or deploying multi-agent systems.
Discovering stability and receptivity in complex systems no longer requires known equations, thanks to a new neural operator framework that learns directly from data.
Synchronized aerial imagery unlocks dense, geometrically consistent BEV semantic mapping of dynamic road scenes, even from ego-centric sensors alone.
Distributed ML slashes energy consumption in 6G IoT networks by up to 70% without sacrificing prediction accuracy, offering a greener path forward.
Experimental data can resolve discrepancies in MOF property predictions, with a multimodal transformer leveraging XRD patterns to distinguish between samples sharing the same framework.
LLM agents are surprisingly inept at Capture the Flag challenges, with even the best models only completing 35% of checkpoints, revealing critical gaps in their ability to perform realistic cybersecurity tasks.
LLMs aren't just swayed by information; they actively seek social acceptance, making them vulnerable to manipulation in multi-agent settings.
LLM agents can reliably infer each other's "warmth" and "competence" from interaction histories, leading to significantly better coordination in complex multi-agent settings.