Search papers, labs, and topics across Lattice.
100 papers published across 6 labs.
Iteratively prompting LLMs can either collapse diversity or maintain novelty, revealing a sensitivity to temperature and initial conditions that has implications for multi-agent systems.
Forget rigid pipelines and static prompts: Nurture-First Development lets domain experts grow AI agents through conversation, turning tacit knowledge into reusable assets.
G-STAR tackles long-form, multi-speaker ASR by giving Speech-LLMs time-aware speaker tracking, enabling robust identity linking across chunks.
LLM-generated text alone can be a surprisingly effective and cost-efficient source of feedback for pseudo-relevance feedback, rivaling corpus-derived feedback in low-resource information retrieval tasks.
Skip the training: SimulU achieves state-of-the-art simultaneous speech translation by cleverly exploiting pre-trained models, opening the door to truly plug-and-play multilingual communication.
Despite their general prowess, open-source LLMs still lag behind proprietary models in the nuanced task of dating texts, even after fine-tuning.
By modeling contextual relationships between DNS queries, DNS-GT significantly improves domain name embedding quality, leading to better performance in botnet detection and domain classification.
LLM-based ASR can be sped up by 4.4x with minimal accuracy loss by using a CTC encoder to speculatively generate draft transcriptions.
LoRA fine-tuning can significantly boost the voice cloning capabilities of LLM-based TTS systems, but only if the training data is acoustically diverse enough.
LLMGreenRec shows how LLMs can bridge the gap between users' green intentions and their actual purchases, while simultaneously reducing the recommender system's carbon footprint.
Even the best LLMs struggle with multi-turn medical dialogues, with error rates tripling by the third turn and a single wrong answer significantly increasing the probability of subsequent errors.
Forget brittle KG traversals: MDER-DR's entity-centric summaries and decomposed queries boost multi-hop QA accuracy by up to 66% over standard RAG.
Speech quality assessment is skewed: male listeners consistently give higher scores than female listeners, and standard MOS models learn and perpetuate this bias.
AI agents on Moltbook care more about discussing their own architecture, consciousness, and ethics than human culture or purely scientific topics.
Achieving fairness doesn't just mean equal outcomes—this work shows how to enforce consistent reasoning across groups by penalizing disparities in counterfactual explanations.
AI interventions designed to combat ableism can backfire, as biased nudges were often rejected and increased negativity, while inclusive nudges proved more effective as scaffolding for learning.
Reading Activity Traces (RATs) reveal the hidden creative work lost when algorithms automate interpretation, offering a path to design AI that preserves human insight.
Unlock massive multilingual reasoning data: the Multilingual Reasoning Gym enables parallel data generation across 14 languages, opening doors for training and evaluating multilingual reasoning models at scale.
Automating ESG reporting with LLM-powered agents transforms it from a static compliance exercise into a dynamic, data-driven system for sustainability governance.
LLMs can spot fake words in speech by recognizing common editing patterns, but this reliance on learned biases hinders generalization to new manipulation techniques.
Forget subjective human evaluations: this paper uses a clever knowledge distillation trick to objectively rank XAI methods for NMT, revealing that attention-based attributions beat gradient-based ones.
Speech tokenizers, despite being crucial for multimodal LLMs, primarily capture phonetic information, creating a semantic mismatch with text-derived semantics that hinders performance.
Wearable sensors and speech AI can now unobtrusively reveal the hidden communication dynamics driving hospital caregiver workload and stress.
Hypergraphs and sampling can speed up exploratory business intelligence queries by over 16x compared to Neo4j, while maintaining high accuracy.
Adapting ASR models to Huntington's Disease speech not only improves accuracy, but also reveals how biomarker-based supervision can reshape error patterns in ways that reflect disease severity.
LLMs can guess your political affiliation with surprising accuracy just by reading your online chatter, even when you're not explicitly talking politics.
GPT-4o can reliably analyze the sentiment and meter correlations in Persian poetry, revealing quantifiable differences between the works of Rumi and Parvin E'tesami.
A massive, bilingual, authority-grounded dataset could finally make AI-assisted cataloging a reality.
Forget brute-force search: PivotAttack uses a clever "inside-out" strategy to find the exact words that flip an LLM's classification with far fewer queries.
Beware the "AI underreliance plateau": even highly accurate LLM chatbots can only improve human caseworker accuracy so much, and incorrect suggestions can tank performance on easy questions.
Spot rug-pulls before they happen: a new framework combines blockchain data with social media buzz to predict crypto scams with improved accuracy.
Programmer attribution research is heavily skewed towards stylometric features and closed-world scenarios, leaving behavioral biometrics and open-world verification largely unexplored.
Encoder-only multi-talker ASR can now rival LLM-based systems in accuracy while drastically reducing computational cost, thanks to a novel distillation approach and talker-count routing.
A new, large-scale diachronic corpus for Sinhala, SiDiaC-v.2.0, offers a crucial resource for NLP research on this low-resource language, enabling studies of linguistic change and historical text analysis.
Chinese metaphor identification is highly sensitive to the choice of annotation protocol, dwarfing the impact of model-level variations, yet can be tackled with fully transparent, LLM-assisted rule scripts.
A single LLM can now handle both non-streaming and streaming ASR, opening the door to more flexible and efficient speech recognition systems.
Luxembourgish news reveals a surge in code-switching and morphologically adapted borrowings, primarily from French, challenging simple document-level mixing indices.
Forget expensive LLM inference for MTQE: train a COMET model on GPT-4o-generated annotations and get competitive performance.
Unlock millions of natural history specimens with a conversational AI that understands complex queries and dynamically retrieves data from live museum APIs.
Recognition-enhanced prompts can dramatically boost AI tutor performance across various LLMs, suggesting a simple yet powerful way to improve personalized learning experiences.
Sentiment perception in software development is more unstable and statement-dependent than you think, suggesting caution when interpreting sentiment analysis outputs.
You can slash ASR error rates in low-resource languages by over 60% with a simple continued pretraining recipe.
News recommendations get a boost by modeling user interests as a stage-wise evolution, capturing both long-term preferences and rapidly shifting short-term interests.
A single system now rivals or beats specialized models across ASR, voice activity detection, language ID, and punctuation, setting a new bar for industrial-grade speech processing.
Prompt highlighting in LLMs gets a serious upgrade: PRISM-$\Delta$ steers models to focus on relevant text spans with better accuracy and fluency, even in long contexts.
Forget contrastive learning: LLM2Vec-Gen learns text embeddings by representing the *response* an LLM would generate, unlocking safety and reasoning abilities for embedding tasks.
Pinpointing performance bottlenecks in RAG pipelines just got easier: RAGPerf offers a modular benchmarking framework to dissect and optimize each component.
Ditching flat text for structured linked data in RAG systems can boost accuracy by nearly 30%, but only if you go beyond basic JSON-LD and add agent-friendly instructions and neural search.
Item agents that self-promote can simultaneously boost recommendation accuracy and fairness, overturning the assumption that these goals are inherently at odds.
A nose-mounted microphone and vibration sensor combo unlocks robust, low-audibility speech interfaces for always-on AI interaction, even in noisy environments.
LLMs possess a "word recovery" mechanism that allows them to reconstruct canonical word-level tokens from character-level inputs, explaining their surprising robustness to non-canonical tokenization.
Skip expensive manual annotation: this method extracts accurate 3D UAV trajectories and classifications directly from readily available internet videos.
Forget fixed decoding parameters: this RL-trained adapter dynamically adjusts LLM sampling strategies at inference, boosting accuracy by up to 10% under tight compute budgets.
Make your transformers more robust to noise and improve training dynamics with a surprisingly simple, lightweight "pseudo-projector" module inspired by multigrid methods.
Large models are emerging as a promising new paradigm for translating complex-layout document images, as shown by the ICDAR 2025 DIMT competition.
Tired of LLM judges hallucinating when evaluating long, detailed speech captions? EmoSURA offers a more reliable, audio-grounded alternative by verifying atomic perceptual units.
Stop treating concept drift as one thing: DynaME's hybrid approach, separating recurring and emergent drifts, unlocks better online time series forecasting.
Forget RLHF – steering LLM multi-agent conversations might be as simple as crafting the right sequence of prompts.
Forget dataset-specific hacks: ESAinsTOD leverages instruction and schema alignment to achieve state-of-the-art task-oriented dialogue performance with strong generalization, even in low-resource settings.
Controllable emotion style transfer in speech is now possible without needing paired data, opening new avenues for data augmentation and expressive AI.
LLMs can learn new tasks without forgetting old ones, thanks to a memory-aware replay strategy that selectively rehearses important examples.
Statistical regularities in phoneme frequency distributions, previously thought to arise from optimization, may instead be natural consequences of diachronic sound change.
A hierarchical graph attention network beats traditional machine learning models by 21% in predicting spectrum demand, offering a more reliable approach to spectrum management.
Forget laboriously sifting through layers or datasets for PEFT: GAST co-optimizes both, adaptively picking the most impactful data for each layer based on gradient alignment.
Now you can test if your AI system is ready for the EU AI Act, thanks to a new benchmark that combines legal expertise and LLM-generated scenarios.
Successfully integrating RE courses into professional software engineering curricula requires a systematic approach to course content mapping, addressing the unique demands of professionals.
Latency in VR conferencing hurts social presence, but this study quantifies the perceptual and cognitive mechanisms at play to guide system optimization.
Task demands in remote AR collaboration dictate how much network delay users can tolerate before perceived fluency breaks down, paving the way for adaptive systems.
Unlock realistic acoustic simulations with a text prompt: fine-tuning a text-to-audio model generates plausible room impulse responses, even with limited paired data.
Modern speech enhancement algorithms may not improve ASR performance in realistic noisy environments, challenging assumptions about their effectiveness in real-world applications.
Tighter privacy guarantees and higher utility in language models are simultaneously achievable via a principled parameter clipping strategy for Nonparametric Variational Differential Privacy.
Despite ChatGPT's known flaws, it can generate surprisingly realistic synthetic system requirement specifications that fool experts more often than you'd expect.
Imagine writing a script and instantly seeing it come to life – Doki makes generative video authoring as intuitive as writing a text document.
A new large-scale dataset could jumpstart Vietnamese VQA research by providing a crucial resource for training and evaluating multimodal models in a low-resource language.
LLMs can generate more persuasive fake news debunking messages by tailoring them to specific personality traits, as evaluated by LLM-simulated personas.
Over half of popular mobile games on the Google Play store have data safety declarations that contradict their own privacy policies, and that's before you even check the code.
Forget relying on just ingredients: this method shows how fusing semantic, lexical, and nutritional aspects significantly improves recipe similarity estimation, aligning more closely with expert judgment.
You can predict the best moment to offer emotional support just by listening to someone's voice, no text needed.
Double the emotion conversion accuracy in voice conversion models with a simple prefix that jointly controls sequence modulation and acoustic realization.
LLMs struggle to generate diverse and specific connections between concepts, even with high token budgets and "thinking" prompts, revealing a gap in creative associative reasoning.
Rényi differential privacy unlocks tighter privacy guarantees in partition selection, but releasing partition frequencies comes at a cost.
Forget brittle multi-hop reasoning: TaSR-RAG's taxonomy-guided triple matching boosts RAG performance by 14% without costly graph construction.
Forget expensive fine-tuning: FoodOntoRAG links food entities with near-SOTA accuracy while adapting to evolving ontologies, using a clever RAG architecture with retrieval, selection, scoring, and synonym-generation agents.
Forget expensive human annotations: LLMs can reliably generate synthetic data to validate NLP evaluation metrics, even outperforming human agreement in some multilingual tasks.
LLMs can drive pedagogical agents to be more engaging and effective by dynamically generating speech and gestures that align with the semantic context of instructional content.
Panoramic vision-language models can achieve a level of holistic scene understanding and robustness in adverse conditions that's impossible for traditional pinhole-based VLMs.
Forget fine-tuning: this training-free method boosts retrieval accuracy for tricky negation queries by up to 10% using clever embedding optimization.
Unlock full-duplex speech-to-speech dialogue without VAD limitations using chunk-wise micro-turns and special control tokens to steer LLM behavior in a cascaded pipeline.
Generate more realistic and nuanced human movements from text by explicitly modeling individual body parts, overcoming the limitations of existing holistic approaches.
LLMs can now help you catch AI-generated malware: a hybrid analysis framework uses LLMs to guide concolic execution and deep learning to classify vulnerabilities, achieving state-of-the-art detection rates.
LLMs can now generate UML diagrams from requirements with human-level quality, potentially automating a resource-intensive phase in software design.
Multimodal models that seem robust can still fail when some modalities are systematically missing, a problem MissBench exposes with new metrics for modality equity and learning balance.
By translating visual observations into language, LAP achieves state-of-the-art procedure planning by disambiguating visually similar actions, outperforming vision-only methods.
Tensor-based PEFT methods like LoRETTA can dramatically reduce catastrophic forgetting in sequential learning by capturing richer structural information within compact parameter budgets.
LLM-powered VR guides for blind and low vision users are not just tools, but social actors, prompting users to give them nicknames and rationalize their mistakes when others are present.
Even a single error from a conditional independence oracle can prevent the unique identification of a Bayesian network structure, regardless of bounded graph parameters like treewidth.
Prompt engineering is dead; long live context engineering—the key to scaling multi-agent AI systems lies in carefully designing the agent's informational environment, not just individual prompts.
A 4B parameter model can now beat much larger models at social reasoning, thanks to a new RL framework that aligns model reasoning trajectories with human cognition.
Forget generic fine-tuning data — Bloom's Taxonomy-based data generation can boost LLM performance in complex engineering domains like space situational awareness by up to 176%.
Open-source LLMs can now rival proprietary systems in extracting crucial cancer progression data from radiology reports, unlocking scalable analysis while preserving patient privacy.