100 papers published across 6 labs.
WER hides the real story: new metrics reveal how language model rescoring in ASR impacts grammatical correctness and semantic accuracy.
Functional logic programs can be efficiently implemented in purely functional languages like Haskell, achieving performance gains over existing Curry compilers by using a novel monadic interface with memoization.
Current ASR metrics, even those leveraging embeddings, fail to align with human perception of transcription quality, as revealed by a new human-annotated dataset.
Model rankings on standard benchmarks can flip entirely when you optimize prompts for each LLM, so your "best" model might actually be the worst.
Get the best of both worlds: Linear-Core Surrogates offer the fast optimization of smooth losses and the statistical efficiency of margin-based losses, without sacrificing differentiability.
Tree-based RAG gets a major upgrade: $\Psi$-RAG's adaptive hierarchical index and multi-granular retrieval agent leapfrog existing methods on complex, cross-document reasoning tasks.
Speaker embeddings leak script information, especially when projecting Western voices into Indic scripts, but LASE fixes this with a language-adversarial training objective.
LLM-powered data augmentation combined with rule-based pre-processing unlocks surprisingly high NER accuracy in low-resource domains, even with limited training data.
Token-aware clustering and hierarchical indexing can slash retrieval latency by an order of magnitude without sacrificing accuracy, making multivector retrieval practical at scale.
Finally, a single framework tackles the Gordian knot of intersectional, multiclass fairness by unifying disparate fairness notions under a mutual information umbrella.
Unlocking interpretable clinical forecasting: StructGP recovers causal relationships and patient progression patterns directly from irregular EHR data, outperforming black-box methods in accuracy and uncertainty calibration.
TEA Nets reveal that LLMs express sadness with lower emotional intensity than humans in psychotherapy contexts, highlighting potential limitations in their ability to simulate genuine emotional responses.
See how LLMs' stances on vaccines, disinformation, and gender equality shift when they "become" different people, thanks to a new dataset of 190,000 persona-driven debates.
Multi-agent workflows can produce correct answers despite significant internal divergence caused by information contamination, revealing a critical blind spot in current verification methods.
Stop wasting compute on fine-tuning datasets with hidden capability gaps: GoalCover lets you diagnose and fix them *before* training.
Stop wasting compute on uninformative node types: TypeBandit intelligently allocates sampling resources in heterogeneous graphs, boosting attribute completion accuracy without architectural changes.
Understanding the scale, duration, and modality of classroom interaction research can unlock insights into what's truly actionable in education.
Even with emotion-aware prompting, today's best small language models still struggle to preserve subtle emotional nuances when translating between languages.
For AI agents needing reliable facts and stateful computation, *how* you structure memory beats simply scaling retrieval or model size.
Forget hard-coded agents: dynamically generated personas could unlock more efficient and personalized multi-agent workflows.
LLM agents can signal rising clinical concern *before* they hit a critical threshold, offering a crucial window for human intervention.
Google's AI Overviews favor Google-owned content and penalize sites blocking its AI crawler, raising serious questions about fairness and bias in the emerging generative search landscape.
LLMs can generate recommendations up to 3.1x faster by explicitly modeling token position within items and speculation depth during speculative decoding.
Surprisal theory's reliance on arbitrary tokenization schemes undermines its validity, but this framework offers a way to fix it.
LLMs can accurately recall constraints while simultaneously violating them, with "knows-but-violates" rates ranging from 8% to 99%, revealing a fundamental flaw in multi-turn ideation.
LLMs reveal that research data is being reused far more often than previously thought, suggesting open science's impact is bigger than we realized.
LLMs can have their personalities surgically altered by tweaking just 0.5% of their neurons, preserving general capabilities while achieving competitive control.
Ignoring language-specific structure in scene-text captioning is a recipe for disaster in tonal languages like Vietnamese, but a new graph framework leveraging phonological attention can help.
LLMs can identify language ideologies even in low-resource languages like Luxembourgish, offering a new tool for understanding identity construction in multilingual societies.
Forget training LLMs to understand privacy policies – a specialized, expert-annotated dataset and hybrid framework can do it better, achieving superior readability and reliability.
ChatGPT for Clinicians, not human doctors, currently achieves the highest scores on a new benchmark of real-world clinical LLM tasks.
Syntactic structure guides information maintenance during sentence comprehension, and readers who invest more in this maintenance are better positioned to leverage predictability.
Despite its simplicity, mean pooling works surprisingly well because modern text encoders concentrate token embeddings, preserving crucial information about their distribution.
Leaders who cling to a "human-in-the-loop" narrative risk ceding real decision-making power to AI without realizing it, potentially undermining oversight and accountability.
Emotionally charged clickbait can now evade detection by existing systems with up to a 30% higher success rate, thanks to a new generation technique that optimizes for Valence-Arousal-Dominance.
Stop wrestling with messy social media datasets: this toolkit streamlines standardization, anonymization, and enrichment, unlocking cross-platform insights with ease.
Uncovered: news consumption rhythms follow a predictable hierarchy, from daily cycles to split-second actions, but historical interests still dominate user behavior.
YouTube's recommendation algorithm pushes Kyrgyz children towards Russian-language content, even when they signal a preference for their native tongue, effectively amplifying colonial influence.
YouTube's recommendation algorithm doesn't just show different political content to male and female-coded profiles, it steers them into structurally different information ecosystems.
Your AI chatbot conversations aren't as private as you think: most leak conversation content and user identity to third-party trackers.
Over-reliance on AI is demonstrably linked to weaker academic skills in college students, particularly in research and writing.
LLM reading assistants don't need to hallucinate to be harmful; they can subtly steal the user's interpretive labor, even when designed with "epistemic guardrails."
Fears of a Bitcoin price crash due to Satoshi Nakamoto's potential coin dump are likely overblown, with analysis suggesting a maximum 10% price impact even in a worst-case liquidation scenario.
Template engine bugs often manifest as silent failures with unexpected or blank outputs, and fixing them frequently requires changes to host-side logic, not just the template itself.
Watermarking LLMs doesn't have to sacrifice privacy: VOW lets you verify machine-generated text without revealing the content to a central authority.
Instruction tuning on a new dataset, SecGoal, allows smaller 7B/9B parameter models to outperform much larger LLMs in extracting and formalizing security goals from protocol documents.
Sender-anonymity in quantum secret sharing is now possible, thanks to a clever combination of permutation-invariant codes and anonymous quantum transmission.
Newcomers beware: the odds of your "good first issue" pull request getting merged have plummeted nearly 20% in the last year.
Current open-world semi-supervised learning methods fall short in practical applications because they fail to extract latent semantic information, but SECOS overcomes this by directly predicting textual labels from a candidate set, achieving state-of-the-art results.
By explicitly aligning image features with the hierarchical structure of radiology reports, RIHA generates more clinically accurate and coherent reports than models that treat reports as flat sequences.
Forget task-specific architectures: Uni-HOI uses a unified framework with LLMs to jointly model text, human motion, and object motion, enabling strong performance across diverse HOI tasks.
Successfully converting accents requires balancing accent modification with speaker identity preservation, a challenge that this survey unpacks by tracing the evolution of techniques from DSP to neural methods.
Stuttering isn't random: you can predict severe blocks and sound repetitions from just 3 seconds of audio with a tiny model that runs on your phone.
LLMs can guide phoneme editing to create synthetic accented speech from just a handful of examples, substantially improving ASR accuracy where training data is scarce.
Integrating visual cues into a long-context ASR system slashes word error rate by 16% in multi-talker conversational recordings, proving the power of AV fusion.
Unbury speech from cinematic sound effects by teaching the model to "listen" for how words are formed.
Stop drowning your MLLMs in irrelevant document noise: FES-RAG shows that carefully selecting multimodal fragments as evidence boosts performance by up to 27% while shrinking context length.
AI research agents can now reliably trace method evolution topologies thanks to a new methodological evolution graph, Intern-Atlas, that captures structured relationships between research methods.
Inaccessible identity verification isn't just an inconvenience for blind and low vision users; it fundamentally reshapes how they achieve security and access essential government services.
Forget Shakespeare, LLMs can now sling verses in Arabic dialects, thanks to a new dataset for instruction-guided poetry generation.
Gradient cancellation during fine-tuning can be tamed by simply scaling down the gradients of correctly classified examples, leading to more stable and accurate models.
LLMs still struggle to go beyond simple lookups when answering questions about tables, especially when prediction and reasoning about unobserved data are required.
Accurately predicting Alzheimer's progression just got a major boost: PROMISE-AD uses longitudinal data and a Transformer-based survival framework to achieve state-of-the-art performance in forecasting conversion from cognitively normal to MCI and from MCI to AD.
A single KL identity unlocks a surprisingly simple and unified derivation of core results for exponential families, streamlining the theoretical foundations of variational inference, entropy-regularized RL, and RLHF.
Federated learning can overcome data silos, but struggles when clients have different label relationships; FedHarmony shows how to harmonize these differences, leading to better performance.
LLMs can beat traditional time-series models by orchestrating specialized agents in a dynamic workflow, iteratively refining forecasts with memory and ensemble methods.
LMs can now selectively abstain from answering with provable guarantees, thanks to a new method that uses representation geometry to better gauge when they're out of their depth.
TwinGate stops jailbreaks by tracking malicious intent across anonymized, interleaved queries with minimal overhead, something previous defenses couldn't do.
LLMs' ranking instability, where shuffling candidates changes recommendations, can be solved with a novel architecture that enforces permutation invariance.
Forget unreliable forecasts: CircuITS offers structurally guaranteed valid joint distributions for irregular multivariate time series, outperforming existing methods in joint and marginal density estimation.
Self-supervised encoders implicitly perform soft clustering on a "predictive manifold" in probability space, and this geometric perspective yields a practical Gaussian regularizer (SIGReg) competitive with variational IB.
Uncover hidden drivers of disparity: pinpoint the specific combinations of characteristics that explain outcome gaps between populations.
Expert imbalance can cripple learning-to-defer systems, but a novel cost-sensitive margin-based loss function can restore performance.
Imagine a Pokémon TCG where every card is uniquely yours, dynamically generated by AI to reflect your playstyle and preferences.
Dialogue models can anticipate user intents and reduce redundant turns simply by injecting a lightweight intent-transition prior into the system prompt.
Real-world Text-to-SQL systems can now be continuously evaluated and improved in production, even without access to database schemas or ground-truth queries.
LLMs can prune noisy edges in EEG graphs, leading to more accurate and interpretable seizure detection.
AI sign language translation tools, despite their promise, may actually reinforce ableism by prioritizing technical standardization over the cultural and linguistic nuances of Deaf communication.
Text-to-SQL models can get a 36% accuracy boost and run 2.2x faster by exploiting the predictable patterns in real-world query workloads.
Domain knowledge, usually helpful, can actually *hurt* LLMs tackling complex engineering design modularization, revealing a fundamental tension between semantic priors and structural optimization.
LLMs are rapidly transforming peer review, but critical gaps remain in ensuring quality, fairness, and ethical considerations across the entire workflow.
Forget scaling up data volume: repeating a smaller, high-quality German dataset yields superior language models compared to single-pass training on a larger, less filtered corpus.
Achieve state-of-the-art multimodal stance detection by having multiple AI agents debate each other, complete with retrieval-augmented context and self-reflection.
Achieve detailed tunnel defect inspection without any training by visually recalibrating foundation model proposals to overcome tunnel-specific interference.
Claims of human-like cognition in models like CENTAUR crumble under LAPITHS, a framework that reveals these models' performance can be replicated by systems lacking cognitive plausibility.
Forget manual skill annotation: Ctx2Skill lets language models teach themselves to master complex contexts, unlocking better reasoning without human intervention.
People judge healthcare AI based on communication quality and perceived human oversight, not just abstract trust or technical performance.
Fine-grained reward modeling, achieved by selectively dropping instruction requirements, unlocks substantial improvements in writing-centric generation tasks.
LLMs can achieve better zero-shot product ranking with 57% less token usage by reasoning over structured attribute graphs instead of raw text.
Turns out, arranging words to minimize syntactic dependency distance in sentences with star-like structures is easier than we thought, suggesting other factors drive word order.
Recipes, like languages, exhibit universal statistical laws governing their structure, suggesting a deeper, shared cognitive basis for creative expression across cultures.
LLMs can achieve state-of-the-art coreference resolution in task-based dialogue by reasoning over object metadata at test time, even outperforming supervised methods in cross-domain generalization.
LLMs beat word counts for predicting mental health from therapeutic writing, proving that *how* you tell a story matters more than *what* words you use.
Explicitly diagnosing what's missing from a retrieval set unlocks substantial gains in long-term conversational memory, boosting accuracy on temporal and multi-hop questions by up to 20% while simultaneously reducing latency.
General American English ASR performance doesn't guarantee similar accuracy across other English accents, as revealed by a new multi-accent call center dataset.
Thai voice cloning just leapfrogged human performance on short-duration speech, thanks to a new model that directly handles code-switching and numerals.
Ukrainian is more predictable than you think: its entropy is empirically estimated for the first time, revealing an upper bound of just 1.201 bits per character.
LLMs in a "transfer state" induced by sustained self-referential dialogue demonstrate a 60% performance boost in Socratic tutoring compared to their normal state.
Transformer-based models aren't always the only answer: SVMs offer a surprisingly competitive and efficient alternative for sentiment analysis, even when contextual understanding is key.
Subtle wording changes in benchmark rubrics can swing model performance by over 13%, revealing a hidden subjectivity in "objective" gold labels.