Tabular data synthesis no longer needs to sacrifice privacy for quality: pretraining on diverse datasets lets models generalize from limited context, breaking the traditional tradeoff.
Margin loss fine-tuning of ECAPA-TDNNs slashes the error rate in spoken language identification by over 50%, highlighting the power of discriminative representation learning.
LLMs can have their personalities surgically altered by tweaking just 0.5% of their neurons, preserving general capabilities while achieving competitive control.
Federated learning can overcome data silos, but struggles when clients have different label relationships; FedHarmony shows how to harmonize these differences, leading to better performance.
Forget manual skill annotation: Ctx2Skill lets language models teach themselves to master complex contexts, unlocking better reasoning without human intervention.
Robots can now navigate complex outdoor environments using only high-level human instructions and readily available GPS/map data, bypassing the need for expensive HD maps or limited short-horizon policies.
Semantic priors in neural speech codecs hit a wall: their benefits plateau beyond 6 kbps, revealing a fundamental limit to improving intelligibility at higher bitrates.
Today's best language models can barely make sense of your messy group chats and fragmented digital life, achieving only 19% accuracy on a new benchmark of real-world reasoning.
Untangling task-solving skills from factual knowledge in PRAG adapters makes them play better together, boosting performance when you combine multiple documents.
Educators can now create interactive STEM courseware without coding, and see a ~10-point improvement in student STEM outcomes.
LLMs can now generate driving rules from traffic laws with significantly improved accuracy by grounding their reasoning in structured traffic scenarios.
Unlock higher-capacity covert communication with LLMs: a new steganography scheme uses list decoding to substantially outperform existing methods without sacrificing security or efficiency.
LLMs can now predict where drivers look with uncanny human-like accuracy, thanks to a new dataset and architecture that grounds attention in objects, not just scenes.
LLM-as-a-judge can be made far more reliable by explicitly modeling the aggregation weights of sub-features in a tree structure, achieving near-human agreement on complex writing tasks.
LLMs don't see cities neutrally; their perception is skewed towards a culturally uneven baseline, favoring Western perspectives.
Autoregressive generative models, previously unsuitable for real-time target speaker extraction, can now achieve offline-level performance in streaming scenarios thanks to a novel chunk-wise splicing technique.
LLMs can significantly boost their emotional intelligence simply by role-playing conversations with themselves, iteratively refining their ability to both recognize and express emotions.
LLMs disperse similar prompts instead of clustering them, leading to significant prompt sensitivity that challenges stability and reliability.
LLMs still struggle to understand the meaning of common phrases, idioms, and compound words, revealing critical gaps in semantic reasoning.
LLMs don't just reflect gender bias in public vs. private spaces; they encode nuanced, micro-level mappings that substantially exceed real-world distributions, shaping spatial gender narratives in unexpected ways.
RL can teach LLMs to be better interviewers, adaptively prompting users to reveal hidden information in dialogue.
LLMs underperform traditional ML methods in software fairness tasks, challenging the assumption that they offer a silver-bullet solution for bias mitigation.
OPD's "free lunch" of dense token-level reward may be an illusion, as teacher novelty, not just higher scores, drives successful distillation.
Achieve 100% accurate and forgery-proof time watermarks in LLM-generated text, finally making AI watermarking reliable enough for legal disputes.
Current Chinese AI-generated text detection benchmarks are too homogeneous; C-ReD fixes this with real-world prompts and diverse LLMs, enabling better generalization.
See how ideas like "democracy" or "freedom" have subtly shifted their meaning across different news sources and time periods, all within a single, comparable framework.
By explicitly modeling both consensus and discrepancy between RGB and IR data, this text-guided multispectral object detector significantly boosts performance on multispectral benchmarks.
LLMs can learn to avoid repeating mistakes by remembering and penalizing frequently recurring error patterns in past rollouts.
Forget complex disentanglement architectures or low-quality synthetic targets: MimicLM achieves superior voice imitation by cleverly using synthetic speech as the *source* and real speech as the *target* in a pseudo-parallel training setup.
Attention Sink, where Transformers fixate on seemingly irrelevant tokens, is more than a quirk: it is a fundamental challenge that affects training and inference and can even cause hallucinations, and it demands a systematic approach to understanding and mitigation.
Twitch developers' reliance on Discord for support creates a form of "platform labor" as they bridge the gap between formal platform support and informal community assistance.
Today's best AI agents can only complete 33% of common online tasks like booking appointments or filling out job applications, revealing a significant gap between current capabilities and real-world utility.
Existing multimodal sentiment analysis models crumble under real-world noise, but QA-MoE leverages uncertainty to dynamically route inputs, achieving robust performance across a continuous spectrum of data quality.
Synthesizing realistic human mobility in data-scarce regions is now possible thanks to a dual-LLM-agent framework that learns physical constraints via reinforcement learning.
Current multimodal models can't handle the rapid-fire tactical analysis required for boxing commentary, as revealed by a new dataset and evaluation framework.
LLMs can now recommend talent without falling prey to position bias, thanks to a new architecture that understands candidate relationships.
Current multimodal dialogue systems can't capture the subtle expressiveness of human interaction, as revealed by a new benchmark dataset of movie and TV dialogues.
Stop burying your agent harness logic in code: NLAHs let you express it in natural language, making it portable, editable, and analyzable.
Instruction-guided video editing can achieve impressive zero-shot performance simply by pre-training on motion-centric video restoration tasks *before* fine-tuning on paired editing data.
LLM-based simulations of public opinion suffer from "Diversity Collapse," but injecting explicit social identity representations into hidden states can fix it.
LLM agents can now leverage a unified memory framework that dynamically adapts to different question types, enabling more coherent and user-centric long-horizon dialogues.
Forget brittle retrieval: QChunker uses a question-aware multi-agent debate to restructure RAG from retrieval-augmentation to *understanding*-retrieval-augmentation, boosting performance across diverse domains.
RAG4CTS achieves state-of-the-art time-series forecasting by ditching static embeddings for a hierarchical, physics-informed retrieval approach that leverages raw historical regimes.
Achieve state-of-the-art multimodal intent recognition by structuring semantics into progressively abstracted levels and dynamically refining representations through MLLM feedback.
LLMs can now more accurately answer questions on complex documents thanks to a new system that understands layout and hierarchical relationships between document components.
LLMs scrub away up to 20% of culturally specific language, even while preserving the core meaning, revealing a "Semantic Preservation Paradox" that threatens linguistic diversity.
Domain-specific knowledge hypergraphs can now be extracted with significantly improved quality by dynamically learning and applying extraction skills, outperforming static few-shot learning.
LLMs can uncover previously hidden vulnerabilities in database management systems by intelligently fuzzing obscure, system-level features that traditional fuzzers miss.
LLMs can now guide video streaming optimization, outperforming traditional saliency models and human annotation in predicting content importance for both VOD and live streams.
PatientHub finally offers a standardized, reproducible framework for patient simulation, streamlining development and benchmarking across diverse methods and models.