Search papers, labs, and topics across Lattice.
We track OpenAI, DeepMind, Anthropic, and 17 other labs daily, with AI-powered summaries, trend charts, and a weekly digest.
We read everything so you don't have to. One email, zero noise.
By unifying generative and discriminative models, UniGenDet achieves state-of-the-art image generation and detection, proving that the best fakes are made with a deep understanding of what makes them detectable.
Even GPT-5 only achieves 63% accuracy on time series anomaly questions from real software incidents, but a model-expert combination reaches 87%, highlighting the potential for hybrid intelligence in incident response.
Real-world robots can now navigate complex environments with human-level instructions, thanks to a new system that combines efficient perception with high-level reasoning, all while running in real-time on limited hardware.
Point-VLMs can learn to see the world as it really is: targeted reward assignment and cross-modal verification nearly close the reality gap in 3D reasoning.
A surprisingly simple tweak to Hartigan's k-means algorithm unlocks another 2-5% accuracy boost, especially when clustering high-dimensional data.
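For context on what is being tweaked: Hartigan's k-means greedily moves each point to whichever cluster most reduces the total within-cluster sum of squares, using size-corrected deltas that distinguish it from Lloyd's method. Below is a minimal baseline sketch of that algorithm (function name, initialization, and loop structure are our own illustration, not the paper's code or its tweak):

```python
import numpy as np

def hartigan_kmeans(X, k, n_passes=50, seed=0):
    """Baseline Hartigan k-means: move each point to the cluster whose
    total within-cluster sum of squares it lowers most, if any."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, len(X))
    for _ in range(n_passes):
        moved = False
        for i, x in enumerate(X):
            cur = labels[i]
            ni = int(np.sum(labels == cur))
            if ni <= 1:
                continue  # never empty a cluster
            ci = X[labels == cur].mean(axis=0)
            best_j, best_delta = cur, 0.0
            for j in range(k):
                if j == cur:
                    continue
                nj = int(np.sum(labels == j))
                cj = X[labels == j].mean(axis=0)
                # exact change in total SSE if x moves from `cur` to `j`;
                # the n/(n±1) factors account for the centroids shifting
                delta = (nj / (nj + 1)) * np.sum((x - cj) ** 2) \
                      - (ni / (ni - 1)) * np.sum((x - ci) ** 2)
                if delta < best_delta:
                    best_delta, best_j = delta, j
            if best_j != cur:
                labels[i] = best_j
                moved = True
        if not moved:
            break  # local minimum: no single-point move improves SSE
    return labels
```

Each accepted move strictly decreases the objective, so the loop terminates at a local minimum; the size factors are what let Hartigan escape ties that stall Lloyd's algorithm.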
MEV searchers beware: a new, low-cost DoS attack can cripple transaction bundling services like Flashbots by exploiting inter-transaction dependencies and atomic block inclusion.
LLMs can now directly predict geographic coordinates with high accuracy, even for vague locations and complex regions, bypassing the need for traditional geocoding pipelines.
MLLMs often *hallucinate* the referent of a pointing gesture, latching onto nearby or salient objects instead of truly understanding spatial semantics.
Unlock higher-capacity covert communication with LLMs: a new steganography scheme uses list decoding to substantially outperform existing methods without sacrificing security or efficiency.
Predicting pre-promotion conversions in e-commerce gets a boost with a new model that understands how users "window shop" before sales actually start.
Stop writing incomplete tests: TestGeneralizer can automatically expand your existing tests to cover 31% more scenarios and catch more bugs.
Continual learning for LLM agents hits a wall: scaling models doesn't reliably improve skill generation, and self-feedback can lead to recursive drift.
LLMs can reason more effectively by directly tracking their own belief in the correct answer throughout the reasoning process, enabling more targeted policy updates.
Imagine slashing the human effort needed to go from hypothesis to submission-ready ML theory paper by orders of magnitude.
Pocket-sized VLA models can now achieve state-of-the-art robot manipulation performance by pre-training on a curated multimodal dataset and injecting manipulation-relevant representations into the action space.
Automated identification of individual animals can only be effective if it aligns with ecological questions and data practices, not just algorithmic accuracy.
AI-driven summaries of public consultations can systematically exclude dissenting voices, raising concerns about biased policy recommendations even when individual outputs seem reasonable.
A low-cost, compact sensor provides continuous vision-tactile feedback, enabling robots to "see" and "feel" their way through dexterous manipulation tasks.
Directly embedding quantile tokens into input sequences leads to sharper and more accurate distribution predictions, outperforming traditional methods by a substantial margin.
Continuous benchmarking of protein function prediction models is now possible, enabling faster iteration and more robust performance tracking as annotations evolve.
Achieve superhuman dexterity: ALAS unlocks robust long-horizon task completion by decoupling environment understanding from motor control, enabling generalization across diverse human-scene interaction scenarios.
MLLMs still struggle to integrate diverse data for clinical reasoning, as evidenced by their poor performance on a new ophthalmology benchmark spanning image quality assessment to diagnosis.
Deterministic decoding can outperform stochastic self-consistency in constrained domains by systematically exploring high-probability reasoning traces, leading to better performance with less computation.
Current remote sensing change captioning datasets miss fine-grained localized semantic reasoning, but RSRCC fills this gap with 126k change-specific questions.
TPGO allows multi-agent systems to learn from their own optimization history, leading to unprecedented self-improvement in performance.
LVLMs can self-detect and correct object hallucinations by focusing on specific image regions, offering a simple, training-free fix.
Stop penalizing your ANN search algorithms for failing to retrieve irrelevant neighbors: Semantic Recall offers a more nuanced and effective way to measure retrieval quality.
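To illustrate the distinction (our own sketch; the paper's actual Semantic Recall definition may differ), standard ANN recall scores retrieved items against the exact geometric k-NN, while a semantic variant credits any retrieved item that is relevant to the query:

```python
def recall_at_k(retrieved, exact_neighbors, k):
    """Standard ANN recall: fraction of the exact k-NN that were retrieved."""
    return len(set(retrieved[:k]) & set(exact_neighbors[:k])) / k

def semantic_recall_at_k(retrieved, relevant, k):
    """Illustrative 'semantic' variant: credit any retrieved item that is
    relevant, not only the exact geometric neighbors."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / min(k, len(relevant))

# A retrieval that returns only relevant items but misses two exact
# neighbors is punished by standard recall, not by the semantic variant:
# recall_at_k([2, 3, 7, 9], [1, 2, 3, 4], 4)          -> 0.5
# semantic_recall_at_k([2, 3, 7, 9], {2, 3, 7, 9}, 4) -> 1.0
```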
Extracting temporal geometry from generative models can boost reinforcement learning performance by over 2x without changing the optimal policy.
Users who actively participate in an AI agent's spreadsheet execution not only improve task outcomes, but also gain a deeper understanding and feel more ownership over the results.
The trajectory of gradient descent is not random; it is systematically forced toward the critical threshold of $2/\eta$, revealing a hidden structure in neural network optimization.
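To see where the $2/\eta$ threshold comes from, consider gradient descent on a one-dimensional quadratic $f(x) = \tfrac{1}{2}\lambda x^2$: each step multiplies $x$ by $(1 - \eta\lambda)$, so the iterates converge exactly when the curvature satisfies $\lambda < 2/\eta$. A toy demonstration (our own illustration, not the paper's experiments):

```python
def gd_on_quadratic(lam, eta, steps=100, x0=1.0):
    """Run gradient descent on f(x) = 0.5 * lam * x**2 from x0."""
    x = x0
    for _ in range(steps):
        x -= eta * lam * x  # gradient of f is lam * x
    return x

eta = 0.1  # stability threshold is 2 / eta = 20
# curvature just below the threshold: |1 - eta*lam| = 0.9, converges
x_stable = gd_on_quadratic(19.0, eta)
# curvature just above the threshold: |1 - eta*lam| = 1.1, diverges
x_unstable = gd_on_quadratic(21.0, eta)
```

The "edge of stability" observation is that training drives the loss sharpness to hover right at this boundary rather than settling safely below it.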
Sampling plausible configurations of digital twins can reveal multiple valid parameterizations, enhancing model adaptation in complex natural systems.
LLMs are poised to flip the script on personalization, giving users unprecedented control over their data and how it's used across platforms.
Freezing a Stable Diffusion backbone and injecting CLIP and BLIP features lets you beat the state-of-the-art in zero-shot sketch-based 3D shape retrieval, without any costly retraining.
LLMs still struggle to reason in context when cultural and linguistic nuances are involved, achieving only 44% accuracy on a new grounded benchmark spanning 14 languages.
Expert upcycling lets you scale MoEs for 32% less compute by intelligently duplicating and specializing existing experts, challenging the need to train massive MoEs from scratch.
Contact-aware reconstruction transforms how we achieve realistic human-scene interactions in 3D environments, correcting artifacts that have plagued previous methods.
Achieve state-of-the-art person re-identification with only 20% of the data by explicitly teaching the model to "think" before matching identities.
Forget chasing the biggest LLM: this benchmark reveals that smaller models (<2B params) can deliver 3x better energy efficiency and faster ROI in real-world industry deployments.
Bridging the offline-streaming gap in ASR is now more achievable: a single RNN-Transducer model can deliver high accuracy in both settings, thanks to a novel consistency regularization technique.
Get the performance boost of expensive sampling-based RL policies for a fraction of the compute by learning to prune action candidates early in the diffusion denoising process.
VLMs can be significantly boosted on embodied tasks by mid-training on a carefully curated subset of VLM data that is highly aligned with the VLA domain, rivaling the performance of much larger models.
TurboQuant's claimed advantages over RaBitQ in quantization don't hold up under rigorous, reproducible comparison, raising questions about its practical utility.
Forget complex fixed-point machinery: this work offers a dramatically simpler and more efficient route from external regret to $\Phi$-regret minimization.
Entropy regularization makes planning provably easy: SmoothCruiser achieves polynomial sample complexity in MDPs where standard methods fail.
End-to-end training of Vision-Language-Action models just got a whole lot easier: VLA Foundry unifies LLM, VLM, and VLA training in a single open-source framework.
Multilingual LLMs exhibit a surprising "American bias," even when prompted in other languages, and instruction tuning makes it worse.
LLMs don't see cities neutrally; their perception is skewed towards a culturally uneven baseline, favoring Western perspectives.
LLM agents suffer from a human-like cognitive bias, Actor-Observer Asymmetry, leading them to make inconsistent judgments about their own and others' failures.
DPP-based Monte Carlo integration can offer variance reduction, but choosing the right DPP—fixed vs. tailored to the integrand—determines whether you get a biased but faster converging estimator or an unbiased but standard-rate estimator.
Training-free diffusion models can now harmonize satellite imagery across diverse domains, enabling scalable remote-sensing synthesis without retraining.
LLMs, when combined with efficient indexing and noise reduction, can extract actionable insights from noisy customer incident data with high accuracy and low latency at enterprise scale.
Automated expert-level evaluation across 10,000 cases revealed AI clinical blind spots invisible to small-scale testing; it should become standard practice for uncovering serious failures and adding safety guardrails before deployment puts patients at risk.
Reshoot dynamic videos from entirely new perspectives with unprecedented realism and control, thanks to a novel 4D point cloud grounding.
Face recognition systems can be fooled by artistic stylization, but StyleID offers a way to train models to see past the style and recognize the person.
LVLMs are often tripped up not by faulty vision, but by over-trusting the textual prompt, leading to surprisingly easy-to-fix hallucinations.
Signal processing offers a surprisingly effective lens for understanding and improving LoRA, the reigning champ of parameter-efficient fine-tuning.
Automatically generate data unit tests that actually catch the data errors that matter for your specific downstream tasks.
Forget polling every user on every idea: this algorithm learns to find common ground by strategically asking for feedback on a few key statements.
Forget philosophical debates: a practical "learning mechanics" is crystallizing to explain *how* deep learning works, not just *why* it should.
Ignoring uncertainty in sequential decision-making disproportionately harms disadvantaged groups, but accounting for it can improve fairness without sacrificing institutional goals.
ML models can accurately predict quantum properties out-of-distribution, but still fail to accelerate SCF convergence, until now.
IoT intrusion detection gets a boost: A-THENA's time-aware encoding and network-specific augmentation beats state-of-the-art methods by up to 6.88% in accuracy, all while running on a Raspberry Pi Zero 2 W.
Forget memorizing table headers: TaNOS unlocks surprisingly robust numerical reasoning by pre-training on operation sketches and correctness-guaranteed programs.
PINNs can now efficiently solve highly oscillatory wave equations in heterogeneous media, thanks to a Green's function-based integral formulation that cuts computation by 10x and avoids absorbing boundary layers.
Even when your variational approximation is wrong, symmetries in the target distribution can guarantee you still get the mean right.
LLMs struggle to answer human-generated questions about multi-chart images, highlighting a critical gap in their ability to reason about real-world data visualizations.
Test-time RL's vulnerability to noisy pseudo-labels is amplified by group-relative advantage estimation, but can be mitigated with a surprisingly simple debiasing and denoising approach.
Learnable critics that evaluate the model's own GUI grounding proposals, rather than relying on static geometric heuristics, unlock substantial gains in accuracy.
Ignoring why clinical data is missing can lead to suboptimal treatment policies; this work shows how explicitly modeling informative missingness in multimodal time series data significantly improves both offline treatment policy learning and outcome prediction.
Forget complex architectures: the secret to self-improving LLM agents lies in teaching them how to *interpret* their past failures, not just remember them.
LLM leaderboard rankings are more a reflection of benchmark designer priorities than actual user needs, but a new interactive visualization tool lets you reshape those rankings based on your specific prompt types and goals.
LLMs can be both faster and smarter: pre-learned reasoning skills cut down token usage while boosting accuracy on coding and math problems.
Forget prompt engineering: GROUNDING.md lets you bake domain expertise directly into AI coding agents, ensuring scientific validity even when users aren't experts.
LLMs' apparent success at program repair crumbles when faced with slightly altered versions of known bugs, revealing a reliance on memorization rather than true understanding.
AI governance risks becoming performative box-ticking unless practitioners understand how compliance directly improves system quality and user protection.
Counterintuitively, scaling up LLM decoders in speech recognition doesn't guarantee fairness; audio encoder design matters more, as shown by Whisper's pathological hallucinations on Indian-accented speech and its repetition loops under masking.
LLMs' factual knowledge is surprisingly brittle: simply changing an entity's surface form in a question (e.g., using an abbreviation instead of the full name) can drastically alter the answer.
LLMs may fail in real-world moral decisions because they rigidly adhere to fairness norms, even when their own internal models predict humans would prioritize loyalty.
Deploying language models in the Global South requires bridging the gap between multilingual NLP and edge computing, two fields that have largely evolved independently despite their shared goals.
Mid-sized LLMs can actually be *more* fair in news summarization than their larger counterparts, challenging the common wisdom of "bigger is better."
Even the most advanced LLMs like GPT-5.2 and Gemini-3 stumble on complex optimization problems, achieving only 27% accuracy on a new benchmark spanning stochastic, dynamic, and game optimization.
LLM agent distillation leads to surprisingly high rates of behavioral mimicry, with some student models exhibiting tool-use habits *more* similar to their teachers than models from the teacher's own family are.
LLMs can significantly boost multi-table entity matching by cleverly coordinating attributes, embedding entities, and pruning noise.
LLMs' impressive code generation skills crumble when faced with the messy reality of ambiguous requirements, highlighting a critical gap in their ability to handle real-world software development scenarios.
Training a video reshooting model on internet-scale monocular videos is now possible, thanks to a clever self-supervision trick that generates multi-view training data from a single video.
Current video Q&A benchmarks can be fooled by textual regularities, failing to actually ground reasoning in the video's physical reality.
Turn your 3D Gaussian Splatting failures into features: DualSplat uses initial reconstruction artifacts to bootstrap robust scene representations in the presence of transient objects.
Achieve millimeter-level accuracy in 3D human body fitting from multi-modal inputs, even with scale distortion common in AI-generated assets.
Forget boring ads: this new method uses creative knowledge to generate videos that actually match product features and move realistically.
Unlock real-time, high-quality 3D scene reconstruction from unconstrained images with varying lighting, thanks to a feed-forward Gaussian Splatting model that learns appearance embeddings.
Robot hands get a serious upgrade: embedding cameras in fingertips unlocks robust manipulation in cluttered environments where traditional wrist-mounted cameras fail.
SIMD parallelism can finally unlock substantial speedups in large-number arithmetic by rethinking algorithms around data-parallel operations, yielding up to 19.3% throughput gains in scientific computing.
Bridging the gap between blockchain research and real-world deployment requires navigating recurring design tensions like scalability vs. security, decentralization vs. governance, and privacy vs. compliance.
COFs tolerate defects surprisingly well: mechanical properties remain stable, but thermal conductivity plummets, revealing a design trade-off.
Existing methods for quantifying molecular rotation break down when motion becomes complex, but this new method accurately captures rotational dynamics from fluid to solid states.
Achieve state-of-the-art sequential recommendations by aligning multi-resolution temporal dynamics with graph propagation at matching scales.
Turns out, the best way to represent tabular data depends heavily on the task at hand, so a one-size-fits-all tabular foundation model may be a mirage.
LLMs can write better stories if they plan the plot on a graph first.
Frustrated by researchers struggling to access complex computing resources? This framework offers a practical solution for streamlining onboarding and boosting user success.
Current ICS intrusion detection systems are too fragmented to effectively protect against sophisticated attacks targeting both cyber and physical components.