100 papers published across 7 labs.
Transformers can be explicitly designed to perform nonlinear regression in-context by leveraging attention as a featurizer, offering a theoretical understanding of how these models learn complex relationships from prompts.
Synthetic data augmentation and per-language threshold tuning can significantly boost the performance of LLMs on multilingual tasks, outperforming alternative architectures that showed promise on the development set.
AI co-mentorship lets high schoolers build real-world financial models, skipping the classroom grind and diving straight into problem-solving.
Hallucination detection can be reframed as a dynamical systems problem, enabling a surprisingly effective and efficient black-box approach that avoids expensive sampling or external knowledge retrieval.
Anomaly detection in EHR data can pinpoint potentially erroneous clinical decisions with surprisingly low false alarm rates, suggesting a practical pathway to improve patient safety.
GMD algorithms, previously seen as a novel generative framework, can be understood as directly targeting fixed points of Wasserstein Gradient Flows, offering a new perspective on their optimization process.
Modeling 10,000+ correlated outputs is now tractable: T-LVMOGP offers a scalable alternative to restrictive low-rank MOGPs by learning a flexible deep kernel in a shared embedding space.
LLMs can now impute missing healthcare data well enough to improve causal treatment effect estimation from real-world EHRs, even with 80% missingness.
Forget rigid memory structures: Memini lets your LLM's external knowledge evolve organically, learning and forgetting like a brain.
Discovering spatial regions and their temporal signatures in massive time series data just got much faster and easier, thanks to a new method that scales log-linearly with the number of time series.
Training data order matters more than you think: reordering your data can significantly improve unsupervised domain adaptation by reducing variance in domain discrepancy estimates.
Forget fine-tuning: "skill neologisms"—new soft tokens—let you inject skills into LLMs without weight updates, composing them zero-shot for flexible knowledge expansion.
Steering LLMs with conceptors—soft projection matrices capturing the full semantic subspace—yields more robust control and enables Boolean logic for composing concepts, moving beyond the limitations of single-vector steering.
Conformal prediction for graph time series doesn't have to break down: by conditioning on low-frequency trends, you can restore exchangeability and get valid uncertainty estimates.
Forget retraining: this model learns interpretable logical rules from data in a zero-shot manner by encoding literals with domain-agnostic statistical properties.
Tabular data synthesis no longer needs to sacrifice privacy for quality: pretraining on diverse datasets lets models generalize from limited context, breaking the traditional tradeoff.
Symmetric spectral analysis of attention is fundamentally blind to information flow direction, but a simple asymmetry coefficient can restore the signal.
Standard multimodal fusion can hurt performance in emotion recognition, but this new approach knows when to drop modalities, leading to state-of-the-art results.
GNN uncertainty just got a whole lot easier: QpiGNN delivers better coverage and tighter intervals without the quantile gymnastics.
Overcome limitations in capturing complex user-service dependencies with a novel tensor decomposition method that significantly boosts QoS prediction accuracy.
LLMs can construct interpretable, multi-layered models of individual student cognition from journal entries, opening new possibilities for personalized education.
Forget opaque transformers: Gyan offers SOTA language modeling with full interpretability, lower compute, and human-like compositional understanding.
Incentivizing honest participation in federated learning is now possible without ground truth labels, even when some participants are trying to game the system.
Carbery's conjectured improvement to the triangle inequality in Lp spaces is false for p > 2, but a weaker version holds true with a sharp exponent.
Hallucination detection can be nearly as effective with a single forward pass as with expensive multi-sample methods.
Forget relying on LLMs to judge themselves: this "Concept Field" approach uses vector math on text corpora to detect hallucinations and novelty, offering a fast, interpretable, and black-box alternative.
Think-Aloud data doesn't just improve cognitive model fit; it fundamentally reshapes the discovered model structure, revealing cognitive mechanisms undetectable from behavior alone.
Interventions on LLMs, like knowledge editing or unlearning, can have surprising side effects that this automated pipeline can now surface and validate.
Shuffling activations, a popular defense in secure Transformer inference, crumbles under a new alignment attack that recovers model weights for just $1.
Forget expert intuition – language trends in patent filings can foresee technological breakthroughs years before they happen.
L2 learners' struggles with idioms, captured in a new eye-tracking dataset, offer a cognitively-grounded benchmark for evaluating how well LLMs truly "understand" figurative language.
Current reward models are surprisingly bad at judging story quality, achieving only 66% accuracy in selecting human-preferred narratives – a gap closed by a new, purpose-built reward model.
Teachers can now scalably provide high-quality, personalized feedback to students by leveraging a multi-LLM system that synthesizes rubric data and qualitative observations, while retaining control through a teacher-in-the-loop workflow.
Forget stilted, unconvincing VR characters: EBM-RL's novel reward decomposition finally makes video-grounded role-playing dialogue feel immersive.
Automating rubric-based feedback on presentation slides is now feasible and perceived as useful, thanks to LLMs and learning analytics dashboards.
Identity-preserving video generation just got a whole lot more faithful: FaithfulFaces maintains identity even under extreme pose variations and occlusions, a feat previous methods struggled with.
LLM uncertainty can be efficiently estimated *without* sampling by measuring the stability of output distributions under semantically equivalent input perturbations.
AI-powered learning systems often fail adult learners because they're built for kids: here are 19 guidelines to fix that.
Unlock scalable, high-quality singing voice synthesis by directly generating structured musical scores from audio, outperforming existing systems on multiple datasets.
HeterSEED achieves state-of-the-art performance on heterophilic heterogeneous graphs by decoupling semantic and structural information, offering a more robust approach than relying on feature similarity alone.
Ditching diffusion's noise-denoising, RLFSeg uses Rectified Flow to directly predict segmentation masks from text prompts, unlocking zero-shot performance gains.
LLMs can get up to 6x more logically consistent without human feedback, simply by fusing NLI scores into the DPO training loop.
A judge-orchestrated ensemble of diverse LLMs trounces single models in multi-turn response generation, proving that strategic model selection beats brute force scaling.
LMs encode grammaticality as a distinct feature in their hidden representations, separable from raw string probability and generalizable across languages.
LLMs ace MRI multiple-choice tests, but can't actually recall basic facts about GE scanners, revealing a dangerous gap between perceived and actual competence.
Overconfident predictions plague mental health prediction models, but this new framework leverages evidential learning to provide more trustworthy uncertainty estimates and human-understandable reasoning signals.
LLMs differ most not in personality, but in how they represent themselves as having (or not having) rich internal experience.
Attention heads hold the key to detecting LLM hallucinations, offering a lightweight, white-box alternative to expensive sampling or external models.
TabEmbed leapfrogs existing text embedding models to achieve SOTA performance on tabular data by reformulating tasks as semantic matching problems and using contrastive learning.
Forget full fine-tuning: QLoRA on 7B models can match the perplexity of fully fine-tuned smaller models for low-resource languages, while slashing the parameter count by 40x.
Small LLMs paired with symbolic solvers can outperform larger zero-shot LLMs on formal reasoning tasks, but still struggle with multilingual inputs.
Patents that oversell their innovation actually face a *penalty* in evaluation, with lower chances of being granted, transferred, or successfully appealed.
Sometimes, simpler is better: Logistic Regression beats BiLSTMs at tweet sentiment classification on medium-sized datasets.
LLM benchmarks are missing a critical ingredient: social science data, which could significantly improve model generalization and robustness across a wide range of disciplines.
E-commerce sentiment analysis is surprisingly influenced by socio-political terminology, impacting the accuracy of customer satisfaction prediction models.
CNN-BiLSTM beats AutoML for Indonesian hate speech detection, but the gains are modest, suggesting the dataset's limitations are a bigger bottleneck than model architecture.
Ditch the black box: This unsupervised semantic projection method rivals supervised models in psychological assessment, offering interpretability and generalizability that supervised methods lack.
State-of-the-art temporal knowledge graph reasoning is now possible by jointly modeling historical evidence and evolutionary dynamics, unlocking complementary predictive signals.
LLM surrogates in low-data optimization are far more sensitive to prompt engineering and query protocols than previously appreciated: these choices fundamentally alter the surrogates' beliefs and downstream performance.
LLMs can be surprisingly brittle: simply rephrasing a prompt, even while preserving its meaning, can cause them to completely abandon the requested output format.
Dissimilarity, not just similarity, unlocks better language generalization for low-resource varieties.
Political ideology prediction gets a boost: injecting LLMs with knowledge graphs of MP relationships significantly improves accuracy.
Unlock Tajik NLP: a new open-source toolkit delivers a comprehensive pipeline for processing Cyrillic-script Tajik text, complete with datasets and pre-trained embeddings.
International media attention to Brazilian disasters doesn't always reflect the actual severity or frequency of events, revealing a disconnect between disaster databases and news cycles.
Even state-of-the-art multilingual models struggle to tag parts of speech in Tajik when trained on isolated words, highlighting the critical role of syntactic context.
UniVer achieves state-of-the-art speculative decoding by jointly optimizing multi-step and multi-draft verification, outperforming existing methods by up to 8.5% in acceptance length.
You can distill interpretable Bayesian reasoning about opponent preferences into an 8B language model, outperforming much larger models and enabling detailed auditability of negotiation strategies.
RAG systems can be significantly improved by reranking documents based on how much they increase the LLM's confidence, not just their relevance.
Stop hand-crafting QA datasets for evaluating RAG systems: DoGMaTiQ automates the process with surprisingly high correlation to human judgment, even across languages.
LLMs can retain 10x more of their original capabilities after fine-tuning, simply by using a dynamically adjusted "anchor" to constrain distributional drift during training.
LLMs get schooled in dialogue state tracking by a mixture-of-experts architecture that uses a graph neural network and ReAct agents to achieve state-of-the-art results with a T5-Small backbone.
Forget token deletion – Telegraph English rewrites prompts into a symbol-rich, structured dialect that compresses by 50% while actually *improving* accuracy on smaller models.
Pinpointing minimal "conflict essences" reveals precisely how graph transformation rules interfere, even with complex nested conditions.
GenAI coding assistants boost developer productivity, but the gains shrink outside the lab and don't translate to better learning.
Generative recommendation gets a boost: CapsID's soft-routed semantic IDs outperform hard-quantized baselines and even rival sparse-dense hybrids, all while slashing inference latency by nearly half.
LLMs for recommendation can now surpass the limitations of static training signals, achieving sustained improvements in ranking accuracy, fairness, and diversity through a dynamically updated Bayesian distillation target.
On-device LLMs can now drive real-time recommendation improvements, unlocking faster adaptation to evolving user intent without cloud reliance.
Stop retraining your object detector every time it makes a mistake: EBOD learns from failure examples to prevent recurring errors in open-vocabulary object detection.
LLMs struggle to navigate the complex, multi-turn justification and response dynamics of real-world patent examination, revealing critical gaps in legal reasoning and technical novelty judgment.
LLMs beat doctors at everyday symptom diagnosis, but only when they proactively interview patients instead of passively answering questions.
LLMs struggle with causal reasoning when noise is introduced, but explicitly modeling causal graphs can dramatically improve performance and generalization.
Semantic watermarks, embedded via AMR, survive paraphrasing attacks that obliterate token-level watermarks.
LLMs are surprisingly good at pinpointing what's *wrong* with student writing, even outperforming human graders in identifying relative weaknesses.
Existing hallucination detection methods miss subtle, word-level medical errors, but a new data-centric pipeline and detector close the gap by 15%.
Forget massive models: small, locally-deployable language models can achieve surprisingly strong performance on privacy-sensitive clinical information extraction tasks with self-prompting and preference-based optimization.
Forget boring rotary embeddings: Jordan-RoPE unlocks distance-modulated phase interactions in attention, letting your model learn relationships like "the further apart, the stronger the cosine similarity."
Despite impressive multilingual capabilities, today's LLMs still can't reliably translate between English and Ghanaian languages at scale.
Domain match and language relatedness trump joint vocabularies for effective knowledge transfer in multilingual NMT.
LLMs exhibit a surprising "False Illegitimation bias," systematically misclassifying legitimate battles as violence against civilians, highlighting a critical flaw for conflict monitoring applications.
LLMs may sound convincing when writing academic content, but they can still confidently fabricate facts and references at surprisingly high rates.
Forget the heavy transformers: surprisingly effective LLM-generated code detection can be achieved with lightweight stylometric features and decision trees, offering near-instant inference.
LLMs can exhibit gender bias in emergency triage even when well-calibrated, and interventions effective for one model may backfire on another.
LLMs' own self-judgments, when logically linked to their response features, can significantly improve hallucination detection.
Activation steering can finally match the nuanced control of prompt engineering: token-specific interventions learned from prompts let you steer LLMs more effectively.
Naive application of transformer-based AI-text detectors can be brittle under distribution shift, but attention-based fusion of readability and vocabulary features can significantly improve robustness.
Language models can play the counterexample game, but their philosophical reasoning hits diminishing returns fast, and they're far more lenient judges than humans.
Model collapse isn't just a technical problem; it's a threat to AI democratization that will widen the gap between high- and low-resource communities.
Even top LLM judges struggle to reliably detect violations of specific constraints in complex instructions, especially when violations are partial or absent, revealing critical blind spots in current evaluation methods.
Learn to build and evaluate your own NLP pipeline, from tokenization to RLHF, using open-weight models and reproducible research practices.
Instead of creating new AI companions from scratch, Deco shows how to breathe new life into cherished physical objects by giving them a digital voice and personality powered by LLMs.