Search papers, labs, and topics across Lattice.
100 papers published across 7 labs.
Autonomous coding agents can now produce attention kernels that outperform expert-engineered ones on NVIDIA's latest Blackwell GPUs, discovering optimizations that eluded human experts.
LMs can learn to generate multiple plausible answers in a single forward pass, outperforming traditional single-answer models on tasks requiring distributional reasoning and offering a compute-efficient alternative to best-of-k sampling.
A compact masked diffusion model can rival multi-billion parameter models in a morphologically rich language like Turkish, challenging the assumption that bigger is always better.
Unlock the potential of full-duplex speech language models with Sommelier, a new open-source pipeline that tackles the messy reality of multi-speaker conversations.
Stop relying on brittle classifiers: SEAR uses LLM reasoning and a unified SQL query layer to evaluate, route, and explain decisions in LLM gateways.
LLMs' temporal reasoning crumbles in low-resource languages and rarer calendar formats, not due to a lack of reasoning ability, but because poor tokenization fragments dates and times.
Linear classification, a cornerstone of machine learning, is provably harder than we thought in high dimensions.
Forget static model averaging: dynamically weighting ensembles based on empirical performance can significantly boost accuracy and interpretability in financial loan default prediction.
Unsupervised phoneme discovery from self-supervised speech models is surprisingly viable, but language-specific challenges remain a significant hurdle.
By enforcing graph isomorphism across counterfactual inputs, UGID reveals that debiasing LLMs can be achieved by directly manipulating internal representations and attention mechanisms.
Agentic Business Process Management offers a blueprint for aligning AI agents with organizational goals, moving beyond simple automation to a framework of constrained autonomy.
Unlock automated health literacy assessment from clinical notes with HEALIX, the first publicly available dataset of its kind.
Scale up offline policy training for diffusion LLMs without breaking the bank: dTRPO slashes trajectory computation costs while boosting performance by up to 9.6% on STEM tasks.
By mimicking the brain's "global workspace," MANAR achieves linear-time attention without sacrificing performance, offering a drop-in replacement for standard attention that's both faster and potentially more creative.
Cross-lingual alignment can actually *hurt* transfer learning performance because aligning embeddings doesn't necessarily help with the downstream task.
LLM-generated survey responses can be statistically accurate yet still miss the option most preferred by humans, highlighting a critical flaw in current evaluation methods.
Skip annotating image rationales: this method transfers text-based rationales to images for explainable crisis classification, saving annotation effort while boosting performance.
Control LLMs without retraining: pinpointing just a few key neurons lets you steer outputs more reliably than attribution methods.
Automating web data integration for expert querying is now possible: SODIUM-Agent achieves a 2x accuracy boost over existing systems on a new benchmark of 105 real-world tasks.
Unleashing an LLM's inner creativity or laser-sharp logic is now as simple as turning a knob, thanks to a new distribution-matching method that avoids heuristic rewards.
Naive fine-tuning leads to catastrophic forgetting, but combining replay-based and parameter isolation strategies can actually *improve* performance over joint training in continual learning for intent classification.
Ditch one-hot vectors: representing facial action units as natural language unlocks more realistic and nuanced facial expression synthesis, especially when dealing with conflicting muscle movements.
LLMs can maintain generation quality in long-context scenarios while using significantly less context, simply by adaptively allocating context based on uncertainty.
A peer-like social robot can effectively augment literacy tutor support for newcomer children, offering personalized language and cultural learning in resource-constrained community settings.
Forget comparing models with benchmarks – mapping them by prompt-response likelihoods reveals hidden relationships between architecture, training data, and even how prompts compose.
Instruction-guided video editing can achieve impressive zero-shot performance simply by pre-training on motion-centric video restoration tasks *before* fine-tuning on paired editing data.
Open-source LLMs, when carefully prompted with representative examples, can rival or even surpass smaller commercial models like GPT-3.5-nano in resume screening tasks, offering a privacy-preserving alternative.
Hypergraph modeling of patient visits, coupled with contrastive pre-training, significantly boosts medication recommendation accuracy and safety by capturing complex relationships missed by traditional graph-based approaches.
You can predict how engaged viewers are by a video lecture, and how attracted they are to it, just by analyzing the speaker's face and voice, no audience data needed.
VLMs can now better detect when they're seeing something they shouldn't, even as the world changes around them, thanks to a new method that dynamically fuses visual and textual cues.
ChatGPT's geographic reasoning can be surprisingly brittle, with minor syntactic changes causing significant output variations and task composition revealing unexpected distributional shifts.
Get faithful and plausible natural language explanations for chest X-rays with as few as 5 human-annotated examples per diagnosis, and even boost classification accuracy in the process.
Multilingual embeddings just got a whole lot smaller and faster, with F2LLM-v2 models outperforming larger counterparts while supporting over 200 languages.
Achieve fairness without sacrificing accuracy: this post-processing ensemble method boosts fairness across diverse tasks and models.
LLMs aren't just regurgitating facts; they're actually better at generating high-quality, relation-preserving word analogies than humans.
Proactive VideoLLMs can finally be both accurate AND efficient thanks to a novel propose-match framework that decouples semantic understanding from streaming perception.
LLMs understand your intent better when you structure your prompts with "who, what, when, where, why, how, how much, and how many," but only if you present them in natural language, not raw JSON.
Despite the hype, AI decision aids have had surprisingly little impact on actual judicial decisions, revealing a critical gap between algorithmic potential and real-world application.
LLMs can introspect on their own internal emotive states during conversations with surprising accuracy, opening a new avenue for monitoring and influencing their behavior.
Citation-grounded supervised fine-tuning slashes hallucination rates to zero in encoder-decoder models, proving that explicit citation mechanisms are a potent tool for factual accuracy in dialogue systems.
Language learners find that Duolingo's general lessons are great for building a foundation, but personalized, work-related scenarios are key to achieving professional fluency.
AI-mediated video calls erode trust and confidence, even though they don't actually make people worse at spotting lies.
Forget months of manual coding: AutORAN lets you build and deploy O-RAN xApps from natural language in minutes.
Stop shoehorning ideology into a left/right box: this framework lets you model complex belief systems as interconnected networks of concepts, revealing hidden relationships in social discourse.
Human oversight can be systematically integrated into LLM-based text generation to improve accessibility, creating a traceable and auditable process.
Forget expensive multilingual annotations: this framework lets you evaluate LLMs in new languages by transferring knowledge from English, with surprisingly strong results.
Forget fixed decoding strategies – RL can learn a lightweight policy to adapt LLM sampling *at test time*, boosting summarization quality by up to 88% without retraining the LLM.
RAG systems can achieve state-of-the-art performance by explicitly preserving document topology, outperforming LLM-based chunking while simultaneously reducing token overhead.
AI can now handle the tedious copywriting and real-time Q&A for live-streaming commerce, freeing up human streamers to focus on engagement.
Learning from ranked preferences alone can be surprisingly difficult: even with access to the full ranking of actions, standard online learning guarantees break down unless the environment is sufficiently stable.
GenAI terms of service make you solely responsible for your AI's outputs, even though you have no control over how the model works.
AI washing isn't just a marketing problem; it actively harms corporate green innovation, especially for smaller players in competitive markets.
Ditch the finetuning: this training-free method uses attention scores to generate rare concepts in images with more precision and control than LLM-guided approaches.
Navigating the maze of differentially private graph release methods just got easier: a new framework helps practitioners choose the right approach, avoid common pitfalls, and make sound evaluations.
Phishing detectors, despite near-perfect accuracy, crumble under budget-constrained attacks that exploit a handful of low-cost features, revealing a critical vulnerability in real-world deployment.
Forget uniform weighting: the Exponentially Weighted Signature lets you inject temporal context and richer memory dynamics into path representations.
Stop guessing how much to pretrain vs. specialize your language model – scaling laws can now tell you the optimal compute allocation for maximizing performance on downstream tasks.
Discrete diffusion models can now generate more diverse text without sacrificing quality, thanks to a new decoding method that explicitly optimizes for diversity during beam search.
Random projections in continual learning don't have to be random: carefully guiding them with target-aligned data beats the SOTA.
Discovering hierarchical structure in sequential data is now tractable, thanks to a new model that learns online without supervision.
Spectral GNNs' purported spectral advantages for node classification are illusory; their performance actually hinges on their underlying MPNN structure, debunking the "graph Fourier transform" narrative.
LLMs in a group Turing Test still make tell-tale mistakes that betray their AI origins, even when their language skills are otherwise convincing.
Greedy off-policy learning, optimal in theory, can fail spectacularly when supplies are limited, but a simple fix—prioritizing items with high *relative* reward—can restore performance.
EWC, a classic method for continual learning, has been underperforming because it suffers from gradient vanishing and protects the wrong parameters – but a simple "Logits Reversal" trick fixes both.
Transformers can nail in-context learning for regression even when the data is a mess of non-Gaussian noise, heavy tails, and non-i.i.d. distributions, outperforming classical estimators.
Low-resource language models can get a major boost in translation quality and tokenization efficiency by using reinforcement learning to directly enforce structural constraints like sequence length and linguistic well-formedness during training.
Humans get a creativity boost from random analogies, but LLMs are already so creative that the same trick doesn't help—unless you make the analogy really, really weird.
A snapshot of the cutting-edge research uniting Theory of Mind and AI, all in one open-access collection.
Chain-of-Thought prompting can reduce LLM bias against African-American English, but only if you pick the right model.
LLMs penalize informal language in essays so severely that it's like marking a B+ down to a C+, even when explicitly told to ignore writing style.
LLMs, when used to annotate social media for human values, systematically overestimate "Openness to Change" compared to human experts, revealing a potential bias in automated value detection.
Supervised learning models can reliably outperform widely-used commercial AI text detectors, even across different languages and specialized domains like mental health.
Language model text is detectable because it misses the "long tail" of human word choice, not because it's less intelligent.
AI's attempts to provide support in online health communities can backfire by inappropriately conforming to, or outright violating, established community norms.
Move over, topic models: this method discovers functional text categories like "courtroom cross-examination" and "lyrical meditation" by learning what text *does*, not just what it's *about*.
Overstating AI capabilities in fintech erodes trust and hinders digital financial inclusion among farmers, particularly those lacking strong social networks.
Forget struggling with cryptic SQL: a new LLM fine-tuned with human preferences generates comments so good, they beat Qwen3-14B by up to 13% on standard metrics.
Escape the scripted feel of simulated conversations: Interplay trains independent user and recommender LLMs that interact in real-time, without pre-defined target items, for more realistic and diverse conversational recommendation data.
Prompting language significantly impacts the accuracy and coherence of LLM responses for maternal health queries in Telugu, with GeminiAI favoring English prompts and Perplexity AI preferring Telugu.
Stop prompt injections cold: PCFI's priority-aware runtime defense intercepts all attacks in testing with zero false positives and negligible overhead.
Aligning covariates across RCTs and observational studies via calibrated embeddings dramatically improves treatment effect estimation, especially when dealing with nonlinear relationships where traditional imputation struggles.
Stop retrieving background noise: HCQR refines RAG by generating targeted queries that seek evidence to directly support or refute candidate answers.
The chaos of MTSAD research gets a little tamer with a new taxonomy that exposes the field's hidden convergence on Transformers and reconstruction, hinting at where the next breakthroughs will come from.
LLMs can orchestrate complex wireless communication optimization tasks by translating natural language intent into actionable spatial constraints, enabling gradient-based solvers to outperform traditional methods without requiring domain-specific fine-tuning.
The crucial difference between "Human-in-the-Loop" and "Human-on-the-Loop" isn't *where* the human is, but *how* their involvement causally shapes the AI's decisions.
LLMs still struggle to reason about financial time-series data, even when they ace the textual fundamentals.
Multilingual question answering is harder than you think: even state-of-the-art RAG systems stumble when dealing with questions and knowledge in multiple languages.
Men and women see AI's impact very differently, with implications for how we teach ethics to future AI developers.
By recasting attention as a cooperative game and a statistical physics system, NeuroGame Transformer captures higher-order token dependencies, outperforming standard pairwise attention mechanisms.
LLM watermarks can now survive fine-tuning, quantization, and distillation thanks to a new method that embeds them in a stable functional subspace.
LLMs beat traditional metrics at judging PDF table extraction quality, finally offering a way to evaluate semantic correctness, not just structural similarity.
ChatGPT-4o-mini can spot design discussions in code repositories better than other models, offering a new path to automatically surfacing valuable context for software engineers.
LLMs aren't just better tools; they're forcing us to rethink the very nature of information, knowledge, and meaning in system design.
Current AI safety filters can't tell a joke from a threat, especially when humor relies on cultural context – this new benchmark exposes that blind spot.
LLMs can be actively trained to master specific knowledge domains with 50% less data and computation by focusing on what they *don't* know, not what they already do.
AI career coaches can boost short-term goal progress not just through reflection, but by making users feel more socially accountable.
Unlock the power of your favorite classifier for ordinal data: Classifier Pooling consistently beats standard methods, especially when data is scarce or categories are numerous.
Forget static embeddings: this paper shows how modeling scientific concepts as evolving complex networks reveals surprising connections between conceptual change and network topology.
Teaching LLMs to say "I don't know" is now possible via targeted SFT, slashing hallucination rates without sacrificing performance on other tasks.
LLMs can extract consistent, multidimensional semantic information directly from the phonological structure of language, revealing a non-arbitrary relationship between sound and meaning.