Text understanding, generation, summarization, translation, information extraction, and linguistic analysis.
LLM-derived abstractions significantly boost analogical reasoning in narratives, outperforming end-to-end LLMs and revealing the critical role of appropriate abstraction levels.
Physiological synchrony in medical teams doesn't always signal success; it's the *context* of shared discovery versus shared uncertainty that determines whether it predicts effective collaboration.
Even Gemini can understand you if you speak its language: structured intent prompting slashes cross-language performance variance and boosts weaker models more than stronger ones.
Forget complex LLMs: a small, fine-tuned transformer surprisingly nails readability scoring for German ESG reports.
Automated medical coding finally gets explainable: Symphony's agentic approach provides span-level evidence, linking each predicted code to the supporting text.
Representing probability distributions with first-order logic formulas can drastically reduce their size, offering a path to more efficient probabilistic reasoning.
Stop guessing which layers to edit in your LLM – KEditVis reveals the inner workings of knowledge editing, letting you pinpoint the most effective interventions.
LLMs don't just make people confidently wrong; they create a dangerous illusion of competence by decoupling performance from actual understanding.
LLMs can steer narrative extraction to align with user-specified perspectives, achieving a 9.9% improvement in agenda alignment over keyword matching without sacrificing narrative coherence.
Interactive narrative maps with semantic interaction significantly boost insight generation compared to static maps and timelines, offering a more intuitive path to model refinement.
Human brains and neural networks may converge on similar "Platonic" representations for linguistic constructions, suggesting universal principles guide efficient language abstraction.
Bilingual language models can achieve performance comparable to monolingual models in both languages, challenging the assumption that bilingual input poses significant learning obstacles.
Training language models on individual children's language reveals that distributional and interactional linguistic features, not just dataset size, are key to efficient learning, mirroring factors that drive child language acquisition.
Enriching meaning representations with task demonstrators can significantly boost dialogue generation, especially in challenging scenarios, revealing a simple yet effective strategy for improving NLG performance.
Multilingual vision-language models can achieve surprisingly strong performance (36% on MMMU) simply by training on translated data and aligning with parallel text corpora.
Forget fine-tuning: this HTR model adapts to new handwriting styles in just a few shots, *without* any parameter updates.
News agencies reuse content across languages far more than simple lexical overlap reveals, with over half of articles drawing on foreign sources through paraphrase and compositional techniques.
LLMs can nail the clinical content of prior authorization letters, but consistently fumble the administrative details that actually get them approved.
AI benchmarks may be giving you a false sense of comprehensive evaluation: the six scores on the Open LLM Leaderboard effectively boil down to just two independent measurements.
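For intuition, this kind of redundancy can be diagnosed with a PCA over a models-by-benchmarks score matrix. The sketch below uses synthetic scores (six benchmarks generated from two latent abilities), not actual Leaderboard data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 200 "models", six benchmark scores that are
# really noisy mixtures of just two latent abilities.
latent = rng.normal(size=(200, 2))       # two independent abilities
mixing = rng.normal(size=(2, 6))         # how each benchmark weights them
scores = latent @ mixing + 0.01 * rng.normal(size=(200, 6))

# PCA via SVD on the centered score matrix.
centered = scores - scores.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)

# The first two components account for nearly all of the variance,
# so the six scores carry only two independent measurements.
assert explained[:2].sum() > 0.99
```

The same diagnostic applied to real leaderboard data is what yields the paper's two-dimensional finding.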
Forget prompt engineering – Nomad autonomously uncovers insights you didn't even know to ask for.
LLMs used in matchmaking amplify existing caste hierarchies, rating same-caste matches significantly higher and perpetuating social biases in potentially harmful ways.
Accurately predict how customers will react to price changes, even without controlled experiments, using a new Monodense neural network that beats traditional methods.
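One common way to bake this economic prior into a network, shown purely as an illustration (the paper's actual Monodense architecture may differ), is to constrain weights to be nonnegative so the predicted demand is provably monotone in price:

```python
import numpy as np

# Illustrative monotonicity-constrained MLP (hypothetical; not
# necessarily the Monodense design): exp(.) keeps every weight
# positive, tanh is increasing, and negating the price input makes
# predicted demand monotone-decreasing in price by construction.
rng = np.random.default_rng(0)
W1 = np.exp(rng.normal(size=16))  # positive input weights
b1 = rng.normal(size=16)
W2 = np.exp(rng.normal(size=16))  # positive output weights
b2 = float(rng.normal())

def predicted_demand(price):
    h = np.tanh(-price * W1 + b1)  # decreasing in price (W1 > 0)
    return h @ W2 + b2             # positive mixing preserves monotonicity

prices = np.linspace(0.0, 10.0, 50)
demand = [predicted_demand(p) for p in prices]
assert all(d1 >= d2 for d1, d2 in zip(demand, demand[1:]))
```

Training such a network on observational sales data keeps price-response curves economically sensible even without A/B experiments.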
NeuralUCB can slash LLM inference costs while maintaining quality, offering a practical alternative to always using the biggest, most expensive models.
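The routing idea can be sketched as a contextual bandit; the toy below substitutes LinUCB for the paper's NeuralUCB (a deliberate simplification), with each arm a model tier and the reward answer quality minus inference cost:

```python
import numpy as np

class LinUCBRouter:
    """Toy LinUCB stand-in for NeuralUCB-style model routing."""

    def __init__(self, n_arms, dim, alpha=0.5):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrix
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vector

    def choose(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                # ridge estimate of expected reward
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))        # optimism under uncertainty

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

rng = np.random.default_rng(0)
router = LinUCBRouter(n_arms=2, dim=3)
for _ in range(2000):
    x = np.array([1.0, *rng.normal(size=2)])  # bias term + toy query features
    arm = router.choose(x)
    # Toy environment: the small model (arm 0) is only good on "easy"
    # queries (large x[1]); the big model (arm 1) is reliable but costly.
    reward = 0.2 + 0.3 * x[1] if arm == 0 else 0.6
    router.update(arm, x, reward + 0.05 * rng.normal())
```

After training, the router sends easy queries to the cheap model and hard ones to the expensive model, which is the cost-quality trade the paper exploits.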
Throw out your full images: focusing on pathology-relevant visual patches in radiology reports dramatically outperforms using the entire image for summarization.
Northern Kurdish finally gets its due with FLEURS-Kobani, a new benchmark dataset that exposes the challenges and opportunities for ASR and speech translation in this under-resourced language.
Global speech slowing, a common strategy for improving intelligibility, is outperformed by targeted, data-driven speech rate adjustments that listeners don't even consciously notice.
Knowing the context around a claim—gleaned from Wikipedia—can boost verifiable claim detection, but the benefit depends heavily on the domain and model used.
Training NERL models on modern Italian won't cut it for historical texts: ENEIDE exposes the performance gap with a new multi-domain dataset spanning two centuries.
Forget expensive finetuning: DUME dynamically combines existing expert LLMs into a powerful MoE *without* additional training, unlocking multi-domain performance at minimal cost.
Forget SEO, optimizing content *structure* alone boosts citation rates in generative AI search engines by 17%.
You can shrink a privacy expert LLM by 4500x and still get human-level privacy judgments.

LLM-generated authorial impersonations, despite their sophistication, are surprisingly detectable: existing authorship verification methods catch them, in some cases even more reliably than genuine negative samples.
Forget fancy ensembling – simply asking an LLM how confident it is in its grading is the most reliable way to predict its accuracy, and it's far cheaper than self-consistency voting.
LLMs can classify dialects with surprising accuracy when given linguistic hints, suggesting a new way to leverage their knowledge for low-resource language tasks.
LLMs may ace English, but LLM Probe reveals surprising performance disparities in low-resource languages, with sequence-to-sequence models unexpectedly leading in morphosyntax.
Radiology report generation models can now verbalize calibrated confidence estimates, enabling targeted radiologist review of potentially hallucinated findings.
Mental-health support chatbots get a much-needed reality check with CounselReflect, a toolkit that exposes their strengths and weaknesses through transparent, multi-dimensional audits.
Forget finetuning or embeddings: better topic models are lurking in your corpus's own co-occurrence stats.
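As a toy illustration of the general idea (topics from co-occurrence statistics alone, with no embeddings or finetuning), the sketch below uses PPMI plus truncated SVD, which is not necessarily the paper's exact method:

```python
import numpy as np
from itertools import combinations

# Toy corpus with two obvious themes: cooking vs. computing.
docs = [
    "bake bread oven flour", "oven bread butter flour", "bake cake oven butter",
    "cpu code compiler bug", "code bug debugger cpu", "compiler code cpu cache",
]
tokens = [d.split() for d in docs]
vocab = sorted({w for t in tokens for w in t})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric within-document co-occurrence counts.
C = np.zeros((len(vocab), len(vocab)))
for t in tokens:
    for a, b in combinations(t, 2):
        C[idx[a], idx[b]] += 1
        C[idx[b], idx[a]] += 1

# Positive PMI: log(P(a,b) / (P(a) P(b))), clipped at zero.
total = C.sum()
Pa = C.sum(axis=1) / total
with np.errstate(divide="ignore"):
    pmi = np.log((C / total) / np.outer(Pa, Pa))
ppmi = np.maximum(pmi, 0.0)

# Truncated SVD: each latent dimension groups co-occurring words,
# i.e. a crude "topic" extracted from the corpus statistics alone.
U, S, _ = np.linalg.svd(ppmi)
embedding = U[:, :2] * S[:2]
```

Words from the same theme end up with overlapping PPMI rows and nearby latent coordinates, while cross-theme words are orthogonal, which is the signal the paper mines for topic modeling.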
LLMs ace linguistic benchmarks, but a token-level perplexity analysis reveals they're often relying on the wrong cues.
Adapting Labovian narrative analysis to Japanese reveals the challenges and opportunities in cross-linguistic qualitative research, highlighting the need for language-specific guidelines.
LLMs struggle to handle common, challenging patient behaviors like contradictory statements and inaccurate medical information, revealing critical safety gaps in medical consultation applications.
Unlock knowledge equity for underserved languages: L-ReLF offers a reproducible recipe for creating high-quality lexical datasets where they're needed most.
Despite its simple grammar, Esperanto translation still poses challenges for LLMs, with NLLB models preferred in only about half of human evaluations.
Japanese entity linking gets a boost: CADEL offers a high-quality, Japan-specific corpus to tackle the unique challenges of linking entities in administrative web documents.
LLMs can achieve state-of-the-art multilingual speech recognition by smartly handling noisy phoneme inputs, even with severe data imbalance across languages.
Forget slow, bloated LLMs – this work shows you can get GPT-4o quality on long-document QA with a 3B model and a clever structure-first distillation approach.
Proprietary language models trounce open-source alternatives by 3-6x on a new, large-scale corpus of Sinhala and Pali Buddhist texts.
The first publicly available dataset for Syrian Arabic Sign Language (SyArSL) opens the door for machine translation research to improve accessibility for a historically underserved community.
GPT-4 can automatically generate FSMs from textual requirements, but expert-guided mutation and testing are crucial for repairing imperfections.
A human-in-the-loop AI assistant can provide scalable, high-quality coding education support in resource-constrained African contexts, even with limited infrastructure.