Search papers, labs, and topics across Lattice.
100 papers published across 3 labs.
LLM-derived abstractions significantly boost analogical reasoning in narratives, outperforming end-to-end LLMs and revealing the critical role of appropriate abstraction levels.
Physiological synchrony in medical teams doesn't always signal success; it's the *context* of shared discovery versus shared uncertainty that determines whether it predicts effective collaboration.
Even Gemini can understand you if you speak its language: structured intent prompting slashes cross-language performance variance and boosts weaker models more than stronger ones.
Forget complex LLMs: a small, fine-tuned transformer surprisingly nails readability scoring for German ESG reports.
Automated medical coding finally gets explainable: Symphony's agentic approach provides span-level evidence, linking each predicted code to the supporting text.
Representing probability distributions with first-order logic formulas can drastically reduce their size, offering a path to more efficient probabilistic reasoning.
Stop guessing which layers to edit in your LLM – KEditVis reveals the inner workings of knowledge editing, letting you pinpoint the most effective interventions.
LLMs don't just make people confidently wrong; they create a dangerous illusion of competence by decoupling performance from actual understanding.
LLMs can steer narrative extraction to align with user-specified perspectives, achieving a 9.9% improvement in agenda alignment over keyword matching without sacrificing narrative coherence.
Interactive narrative maps with semantic interaction significantly boost insight generation compared to static maps and timelines, offering a more intuitive path to model refinement.
Human brains and neural networks may converge on similar "Platonic" representations for linguistic constructions, suggesting universal principles guide efficient language abstraction.
Bilingual language models can achieve performance comparable to monolingual models in both languages, challenging the assumption that bilingual input poses significant learning obstacles.
Training language models on individual children's language reveals that distributional and interactional linguistic features, not just dataset size, are key to efficient learning, mirroring factors that drive child language acquisition.
Enriching meaning representations with task demonstrators can significantly boost dialogue generation, especially in challenging scenarios, revealing a simple yet effective strategy for improving NLG performance.
Multilingual vision-language models can achieve surprisingly strong performance (36% on MMMU) simply by training on translated data and aligning with parallel text corpora.
Forget fine-tuning: this HTR model adapts to new handwriting styles in just a few shots, *without* any parameter updates.
News agencies reuse content across languages far more than simple lexical overlap reveals, with over half of articles drawing on foreign sources through paraphrase and compositional techniques.
LLMs can nail the clinical content of prior authorization letters, but consistently fumble the administrative details that actually get them approved.
AI benchmarks may be giving you a false sense of comprehensive evaluation: the six scores on the Open LLM Leaderboard effectively boil down to just two independent measurements.
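One way to sanity-check a claim like this is to run PCA over a models-by-benchmarks score matrix and see how much variance the first two components capture. The sketch below is purely illustrative: the six-benchmark setup, the synthetic scores, and the noise level are all assumptions, not the paper's actual data or analysis.

```python
# Illustrative only: how many principal components explain the variance in a
# models x benchmarks score matrix? The data here is synthetic by construction
# (two latent skills driving six benchmark scores), not real leaderboard data.
import numpy as np

rng = np.random.default_rng(0)

latent = rng.normal(size=(50, 2))                 # two underlying abilities per model
loadings = rng.normal(size=(2, 6))                # how each benchmark mixes them
scores = latent @ loadings + 0.05 * rng.normal(size=(50, 6))

centred = scores - scores.mean(axis=0)
_, singular_values, _ = np.linalg.svd(centred, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)
print("cumulative explained variance:", np.cumsum(explained).round(3))
# If the first two entries are close to 1.0, the six scores carry ~2 dimensions.
```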
Forget prompt engineering – Nomad autonomously uncovers insights you didn't even know to ask for.
LLMs used in matchmaking amplify existing caste hierarchies, rating same-caste matches significantly higher and perpetuating social biases in potentially harmful ways.
Accurately predict how customers will react to price changes, even without controlled experiments, using a new Monodense neural network that beats traditional methods.
NeuralUCB can slash LLM inference costs while maintaining quality, offering a practical alternative to always using the biggest, most expensive models.
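The routing idea can be sketched with a bare-bones bandit: treat "call the small model" and "call the large model" as arms and trade off estimated answer quality against cost. The sketch below uses plain UCB1 with simulated rewards, not the paper's NeuralUCB (which adds a neural reward model and a gradient-based confidence bonus); the model names, costs, and quality numbers are assumptions for illustration only.

```python
# Illustrative only: a UCB1 bandit choosing between a cheap and an expensive
# model per request. Rewards are simulated stand-ins for quality minus cost.
import math
import random

ARMS = ["small-model", "large-model"]
COST = {"small-model": 0.1, "large-model": 1.0}   # hypothetical per-call cost

counts = {a: 0 for a in ARMS}
totals = {a: 0.0 for a in ARMS}

def simulated_reward(arm: str) -> float:
    """Stand-in for (answer quality - cost penalty); not real measurements."""
    quality = 0.7 if arm == "small-model" else 0.9
    return quality + random.gauss(0, 0.05) - 0.2 * COST[arm]

for t in range(1, 1001):
    untried = [a for a in ARMS if counts[a] == 0]
    if untried:
        arm = untried[0]                           # play each arm once first
    else:
        arm = max(ARMS, key=lambda a: totals[a] / counts[a]
                  + math.sqrt(2 * math.log(t) / counts[a]))
    reward = simulated_reward(arm)
    counts[arm] += 1
    totals[arm] += reward

print({a: counts[a] for a in ARMS})                # how often each model was used
```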
Throw out your full images: focusing on pathology-relevant visual patches dramatically outperforms using the entire image for radiology report summarization.
Northern Kurdish finally gets its due with FLEURS-Kobani, a new benchmark dataset that exposes the challenges and opportunities for ASR and speech translation in this under-resourced language.
Global speech slowing, a common strategy for improving intelligibility, is outperformed by targeted, data-driven speech rate adjustments that listeners don't even consciously notice.
Knowing the context around a claim—gleaned from Wikipedia—can boost verifiable claim detection, but the benefit depends heavily on the domain and model used.
Training NER models on modern Italian won't cut it for historical texts: ENEIDE exposes the performance gap with a new multi-domain dataset spanning two centuries.
Forget expensive finetuning: DUME dynamically combines existing expert LLMs into a powerful MoE *without* additional training, unlocking multi-domain performance at minimal cost.
Forget SEO: optimizing content *structure* alone boosts citation rates in generative AI search engines by 17%.
You can shrink a privacy expert LLM by 4500x and still get human-level privacy judgments.
LLM-generated authorial impersonations, despite their sophistication, are surprisingly detectable by existing authorship verification methods, which in some cases perform even better on them than on genuine negative samples.
Forget fancy ensembling – simply asking an LLM how confident it is in its grading is the most reliable way to predict its accuracy, and it's far cheaper than self-consistency voting.
LLMs can classify dialects with surprising accuracy when given linguistic hints, suggesting a new way to leverage their knowledge for low-resource language tasks.
LLMs may ace English, but LLM Probe reveals surprising performance disparities in low-resource languages, with sequence-to-sequence models unexpectedly leading in morphosyntax.
Radiology report generation models can now verbalize calibrated confidence estimates, enabling targeted radiologist review of potentially hallucinated findings.
Mental-health support chatbots get a much-needed reality check with CounselReflect, a toolkit that exposes their strengths and weaknesses through transparent, multi-dimensional audits.
Forget finetuning or embeddings: better topic models are lurking in your corpus's own co-occurrence stats.
LLMs ace linguistic benchmarks, but a token-level perplexity analysis reveals they're often relying on the wrong cues.
Adapting Labovian narrative analysis to Japanese reveals the challenges and opportunities in cross-linguistic qualitative research, highlighting the need for language-specific guidelines.
LLMs struggle to handle common, challenging patient behaviors like contradictory statements and inaccurate medical information, revealing critical safety gaps in medical consultation applications.
Unlock knowledge equity for underserved languages: L-ReLF offers a reproducible recipe for creating high-quality lexical datasets where they're needed most.
Despite its simple grammar, Esperanto translation still poses challenges for LLMs, with NLLB models only preferred in about half of human evaluations.
Japanese entity linking gets a boost: CADEL offers a high-quality, Japan-specific corpus to tackle the unique challenges of linking entities in administrative web documents.
LLMs can achieve state-of-the-art multilingual speech recognition by smartly handling noisy phoneme inputs, even with severe data imbalance across languages.
Forget slow, bloated LLMs – this work shows you can get GPT-4o quality on long-document QA with a 3B model and a clever structure-first distillation approach.
Proprietary language models trounce open-source alternatives by 3-6x on a new, large-scale corpus of Sinhala and Pali Buddhist texts.
The first publicly available dataset for Syrian Arabic Sign Language (SyArSL) opens the door for machine translation research to improve accessibility for a historically underserved community.
GPT-4 can automatically generate FSMs from textual requirements, but expert-guided mutation and testing are crucial for repairing imperfections.
A human-in-the-loop AI assistant can provide scalable, high-quality coding education support in resource-constrained African contexts, even with limited infrastructure.
LLMs can better capture human semantic similarity by predicting sets of related concepts instead of single next tokens.
LLMs still struggle to accurately infer user interests from interaction histories, especially when dealing with diverse engagement signals – a critical gap for effective personalization.
Smart hospital research is converging towards integrated ecosystems where AI, trust, and infrastructure reinforce each other, but real-world implementation and governance are lagging.
The EU's Digital Services Act aims to empower Trusted Flaggers to combat harmful online content, yet they struggle with accreditation hurdles, resource scarcity, and conflicting platform priorities, raising serious questions about the DSA's practical effectiveness.
Instructors and students are often on different planets when it comes to understanding why cheating happens in CS courses.
Simply injecting GenAI into online learning discussions doesn't cut it; reciprocal exchange and human oversight are key to boosting social presence and higher-order cognition.
Bridging TradFi and DeFi asset tokenization requires more than just technology – it demands a standardized regulatory framework, and this paper delivers one.
LLMs can now reproduce Android app bugs with 87% accuracy, thanks to pre-assessing the visual effects of UI actions.
Stop optimizing LLM logs for human readability – runtime-guided, task-oriented logs dramatically improve downstream debugging performance.
Proving that erasing "erasable" function arguments preserves program behavior opens the door to more efficient and verifiable code optimization.
Surgical VQA gets a major upgrade: SurgTEMP's hierarchical visual memory and competency-based training leapfrog existing models in understanding complex, time-sensitive surgical procedures.
By injecting LLM-derived contextual cues into skeleton representations, SkeletonContext achieves state-of-the-art zero-shot action recognition, even distinguishing visually similar actions without explicit object interactions.
Gaze, often overlooked, reveals deepfake origins with surprising accuracy, enabling a new CLIP-based approach that significantly boosts deepfake attribution and detection.
Mitigating bias in deep learning models is now possible without needing sensitive protected attribute information, opening doors for fairer AI in privacy-conscious applications.
Negation, a known weakness in VLMs like CLIP, can be dramatically improved by strategically fine-tuning only the *front* layers of the text encoder with a modified contrastive loss.
The term XR owes its widespread use not to "Extended Reality" but to its neutrality as a symbolic container for VR, AR, and MR.
Current multimodal dialogue models struggle to capture the nuanced expressiveness of human interaction, but a new dataset and benchmark reveal exactly where they fall short.
An AI agent can now autonomously design functional antibodies with nanomolar affinities from text prompts, achieving a 67% success rate in lab validation and accelerating expert workflows by 56x.
Ditching mel-spectrograms unlocks surprisingly better text-to-speech, as LongCat-AudioDiT proves that waveform latent diffusion can beat the state-of-the-art in zero-shot voice cloning.
Arabic mispronunciation detection just got a whole lot better: F1-scores jumped by 0.28 thanks to novel architectures and a new dataset of authentic mispronunciations.
Generative recommendation's touted cold-start abilities often vanish under rigorous testing, revealing a sensitivity to design choices that current benchmarks fail to capture.
Generative recommendation models can adapt to evolving user behavior without catastrophic forgetting by selectively updating item tokens based on a novel drift-detection mechanism.
Single-vector embeddings' retrieval failures aren't just about dimensionality; they're fundamentally hobbled by domain shift, relevance misalignment, and a "drowning" effect that multi-vector models handle far better.
Stakeholder-agnostic requirements engineering in aged-care tech can lead to misalignment and missed priorities, as developers, caregivers, and older adults often disagree on what matters most.
Open-source projects are quietly integrating ML models in ways that may violate terms of service and regulations, raising concerns about unchecked ML automation.
Gumbel watermarks just got a whole lot harder to evade: a new detection method is provably near-optimal.
Bounded context windows in next-token prediction models can be fundamentally incompatible with low adversarial regret, even with long context lengths.
Escape the confines of linear literature reviews: this multi-agent system surfaces hidden connections and ruptures in research landscapes, revealing insights that traditional methods miss.
Spectral analysis of graph neighborhoods reveals a surprisingly effective and efficient way to boost anomaly detection, consistently outperforming existing GNN-based methods.
Transformers can now predict with an explicit internal structure of uncertainty, enabling stronger probabilistic evaluation and a more informative analysis of model behavior.
Transformers can now dynamically adapt expert weighting in online learning, achieving state-of-the-art dynamic regret in non-stationary environments.
Unconstrained bandit linear optimization can be surprisingly reduced to standard online linear optimization using a perturbation approach, unlocking new regret guarantees and high-probability bounds.
Unlock hidden predictive power: NLP on unstructured clinical notes beats traditional EHR data for early disease prediction.
Diffusion Maps alone fail to directly recover low-dimensional charts and require combining multiple modes, challenging their common perception as a drop-in dimensionality reduction technique.
Achieve near state-of-the-art OCR accuracy with 95% less compute by decoupling character detection from language correction and training the language model on synthetic noise alone.
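A rough sketch of the "train the language model on synthetic noise alone" half of that recipe: corrupt clean text with character-level substitutions, deletions, and insertions, then use the resulting (noisy, clean) pairs as supervision for a correction model. The confusion table, noise rates, and sample text below are placeholders, not the paper's actual settings.

```python
# Illustrative only: building (noisy, clean) training pairs for an OCR
# correction model from synthetic character noise, so the language model
# never needs real OCR output during training.
import random

CONFUSIONS = {"o": "0", "l": "1", "e": "c", "m": "rn", "i": "l", "0": "o"}

def corrupt(text: str, sub_p: float = 0.05, del_p: float = 0.02, ins_p: float = 0.02) -> str:
    out = []
    for ch in text:
        r = random.random()
        if r < del_p:
            continue                               # drop the character
        if r < del_p + sub_p:
            out.append(CONFUSIONS.get(ch, ch))     # visually similar swap
        else:
            out.append(ch)
        if random.random() < ins_p:
            out.append(random.choice("abcdefghijklmnopqrstuvwxyz "))
    return "".join(out)

clean = "the committee will meet on monday morning"
pairs = [(corrupt(clean), clean) for _ in range(3)]
for noisy, target in pairs:
    print(f"{noisy!r} -> {target!r}")
```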
LLMs can now construct high-fidelity, disease-specific knowledge graphs from full-text biomedical literature, unlocking evidence-aware reasoning and hypothesis generation.
Semantic disagreement between LLMs reveals crucial uncertainty that single-model metrics miss, and Collaborative Entropy (CoE) captures it.
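One plausible reading of a cross-model disagreement score, sketched below: pool sampled answers from several models, group semantically equivalent ones, and compute the entropy of the pooled cluster distribution, so disagreement between models shows up as extra entropy. This is not the paper's definition of Collaborative Entropy, and the string-normalization clustering stands in for a real semantic-equivalence check; the example answers are hypothetical.

```python
# Illustrative only: entropy over answer clusters pooled across two models'
# samples, as a stand-in for a cross-model uncertainty signal. NOT the paper's
# CoE definition; normalization-based grouping replaces semantic matching.
import math
from collections import Counter

def normalize(answer: str) -> str:
    return " ".join(answer.lower().strip().rstrip(".").split())

def pooled_entropy(answer_sets: list[list[str]]) -> float:
    pooled = Counter(normalize(a) for answers in answer_sets for a in answers)
    total = sum(pooled.values())
    return -sum((c / total) * math.log2(c / total) for c in pooled.values())

model_a = ["Paris", "paris", "Paris."]             # hypothetical samples
model_b = ["Paris", "Lyon", "Paris"]               # second model disagrees once

print(round(pooled_entropy([model_a]), 3))             # single-model entropy: 0.0
print(round(pooled_entropy([model_a, model_b]), 3))    # cross-model entropy is higher
```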
Data literacy isn't monolithic: K-12 learners navigate wildly different learning pathways depending on the context, challenging assumptions about a one-size-fits-all approach.
Gemini 3 Flash can answer introductory programming questions better than typical educators, suggesting a path to scalable, personalized feedback in CS1 courses.
LLMs can better adapt to diverse preferences by explicitly separating stable personal traits from situational factors, leading to significant performance gains, especially when preferences shift across episodes.
Retail AI's promise of intuitive, personalized experiences crumbles when confronted with the reality of differently abled users, exposing a systemic neglect of accessibility in design and deployment.
Open-source document parsing models are shockingly brittle, losing nearly 18% accuracy on real-world photos and 14% on non-Latin scripts compared to their closed-source counterparts.
VLMs can unlock insights from troves of historical documents previously inaccessible due to OCR limitations, achieving state-of-the-art transcription and speaker tagging of Italian parliamentary speeches.
Unlock richer, more realistic agent simulations by moving beyond individual personas to unified group representations that capture collective behavior.
Instead of forcing a single interpretation, this work embraces the inherent ambiguity of natural language to generate multiple plausible STL formulas from a single NL task description.
Even a small, targeted dataset can bridge the gap in cross-dialect transfer learning for low-resource languages, significantly boosting dependency parsing accuracy.
LLMs' struggles with non-standard languages aren't just a technical problem, but reflect and reinforce historical power imbalances embedded in linguistic standardization.
LLMs can now reliably transform messy app store reviews into well-formatted user stories, but still fall short of creating truly independent and unique requirements for agile development.
Atomic decomposition, a popular technique for LLM judges, may not be superior to holistic evaluation when prompts are carefully controlled, challenging the assumption that breaking down answers into claims is always beneficial.
You can now unmask LLM ghostwriters with a lightweight fingerprinting method that works even when they try to hide in new domains or use unseen models.