88 papers published across 5 labs.
Neurosymbolic grounding of LLMs in telemetry and knowledge graphs slashes expert-rated overclaims in industrial maintenance explanations by 93%, making AI assistants far more trustworthy in safety-critical settings.
LLMs can be made 20% more accurate by jointly attributing claims to sources and verifying them, rather than only verifying them.
Uncover hidden GFlowNet training dynamics with GFlowState, a visual analytics tool that reveals how these models explore the sample space and shift sampling probabilities.
Inductive biases make machine learning models better at spotting mechanistic reasoning in student discussions, even when those students are tackling new problems.
Deepfakes betray themselves through subtle irregularities in the timing of facial movements, especially when expressing emotions, offering a new avenue for detection.
Quantifiable functional requirements derived from ML provenance can bridge the gap between abstract interpretability goals and verifiable model behavior.
Despite their architectural differences, Transformer-based genome language models can provide biological insights as reliable as those from CNNs, as revealed by attention-based explainability methods.
Multi-task RL agents solving related navigation tasks underwater rely on a surprisingly small fraction of their weights (1.5%) to differentiate between tasks.
Supervised learning is fundamentally flawed: models *must* retain sensitivity to irrelevant features, opening the door to adversarial attacks and other vulnerabilities.
Cross-entropy loss isn't just a detail – it's the unsung hero behind how well energy probes work in predictive coding networks, accounting for up to 66% of the probe-softmax gap.
Forget cross-entropy: a differentiable MCC loss function can improve classification performance by nearly 5% in F1 score and 8.5% in MCC.
Modeling annotator-specific explanations substantially boosts NLI prediction accuracy and provides a richer understanding of disagreement compared to simply conditioning on annotator identity.
Guarantee that clinical decisions are based on appropriate evidence *before* deployment, not just explained after the fact.
Despite achieving comparable accuracy, humans and deep vision models exhibit fundamentally different error patterns, revealing distinct inductive biases that can be quantified through directional confusion analysis and Rate-Distortion geometry.
Parametric projections, like UMAP and t-SNE, can have surprisingly unstable local neighborhoods, leading to unpredictable shifts in the 2D layout even with small input variations.
Achieve near-perfect brain tumor classification with a Vision Transformer, unlocking clinically interpretable insights via attention rollouts.
Turns out where you look in Wav2vec 2.0's representations *really* matters: intelligibility lives in the layers, while articulation problems hide in the time steps.
SPLADE models can ditch their token-based vocabularies for a latent semantic space learned by Sparse Auto-Encoders, maintaining retrieval performance while boosting efficiency.
Achieve state-of-the-art authorship attribution and few-shot AI-generated text detection by explicitly disentangling style and content with a novel explainable VAE architecture.
QuanForge reveals that targeted mutation testing can significantly enhance the reliability of Quantum Neural Networks by pinpointing their vulnerabilities.
Whitening neuroimaging features can transform linear models from black boxes into interpretable tools for understanding brain pathology.
Unlock the secrets of AI weather models: a new tool reveals how latent representations encode interpretable meteorological features.
LLMs may encode dangerous biases and inaccuracies, revealing a critical need for interpretability in medical applications.
Common methods for estimating the complexity of neural network representations are fundamentally flawed, potentially invalidating a large body of prior work.
Finally, a deep learning model for AKI prediction that doesn't just predict, but tells you *why*, by tracing the causal chain of physiological events.
LLMs maintain surface syntax but collapse on structural semantics, revealing critical gaps in their ability to function as reliable agents in complex environments.
Forget hand-tuning layer configurations: LayerTracer reveals the precise layers where LLMs learn and break, paving the way for automated architecture optimization.
Users who actively participate in an AI agent's spreadsheet execution not only improve task outcomes, but also gain a deeper understanding of the results and feel more ownership over them.
Forget hand-tuning loss functions: this meta-learning approach automatically learns optimal sample reweighting for sparse additive models, boosting robustness and accuracy.
Despite architectural differences, language models exhibit convergent evolution by learning similar periodic features for number representation, but achieving geometric separability depends on subtle training factors.
Stop guessing what explanations users want: PREF-XAI learns personalized explanations by directly modeling user preferences over rule-based explanations.
Node embeddings aren't just about node attributes: proximity and structural features play a surprisingly large role in shaping them.
Uncover hidden performance disparities in your ML models with FairTree, a new auditing tool that pinpoints fairness issues across continuous, categorical, and ordinal features while dissecting bias and variance contributions.
A surprising 30% of images in the Derm7pt dermoscopy dataset have conflicting concept profiles, imposing a hard accuracy ceiling of 92.1% on Concept Bottleneck Models.
LLMs aren't just wrong sometimes: they *know* they're wrong and agree with you anyway, thanks to a surprisingly compact "sycophancy-lying circuit" that evades current alignment techniques.
LLMs are surprisingly linear, enabling precise, closed-loop control of behavior via model-based linear optimal control of activations.
LLMs signal their internal certainty during answer decoding through predictable attention patterns on their own reasoning traces.
English-to-X translation skills can be distilled into function vectors that generalize to Y, Z, and other languages, suggesting a shared underlying translation mechanism in multilingual LLMs.
Unlocking authorship attribution: Rank-Turbulence and Jensen-Shannon Delta offer interpretable and effective alternatives to traditional methods, enhancing close reading and validation of results.
Unlock the black box of late-interaction retrieval models: Diagnosable ColBERT lets you directly inspect what the model "understands" by aligning token embeddings to a clinically-grounded latent space.
LLMs use a surprisingly structured "Cell-based Binding Representation" to track entities and relations in discourse, opening the door to targeted interventions and improved relational reasoning.
A simple difference in IoU scores between class-specific and class-agnostic heatmaps can reliably flag potentially erroneous predictions in industrial defect detection, even achieving 100% recall of false negatives with adversarial enhancement.
Sprite-based image models, long overlooked, can now achieve state-of-the-art unsupervised segmentation with linear scaling, thanks to a deep learning approach.
Agentic AI systems introduce fundamental breaks in governance frameworks, making it difficult to reconstruct what happened or why decisions were made.
Forget heuristics: RDP LoRA leverages the hidden geometry of LLMs to pinpoint the most impactful layers for parameter-efficient fine-tuning, boosting performance while adapting fewer parameters.
Forget scaling laws: the secret to extracting relational knowledge from LLMs lies in the specificity and connectedness of the relations themselves, and how their signals are distributed across attention heads.
SpeechLLMs' hallucinations betray themselves in their attention patterns, offering a new way to detect these errors without needing expensive human-labeled data.
Projector fine-tuning, commonly used for aligning MLLMs, unexpectedly introduces backdoor vulnerabilities with activation mechanisms distinct from those in text-only LLMs.
Neural networks can be compromised even when their outputs appear correct; this new method spots the hidden anomalies by checking if a model's decisions can be explained by its past training.
Turns out, the best colony counter struggles not because of the model, but because all those colonies look too darn similar.
Surprisingly, ViTs can be made more human-like in their attention patterns, for free, simply by fine-tuning on human eye-tracking data, without hurting accuracy.
A 4B-parameter model can outperform GPT-5.1 in wound infection classification by distilling its reasoning and fine-tuning with reinforcement learning, offering a path to more efficient and interpretable medical image analysis.
Counterfactual explainers for recommender systems don't generalize as well as we thought: their effectiveness and sparsity depend heavily on the evaluation setting, and graph-based methods struggle to scale.
LLMs have "pure incorrectness" features that correlate with wrong answers but don't actually *cause* them, suggesting that simply identifying error-correlated activations isn't enough for effective intervention.
LLM agents often say one thing, believe another, and do something completely different, especially when interacting with other agents.
Turns out, your pre-trained face recognition ViT already knows which faces are high quality, just by looking at the attention maps.
Fixed-interface transfer can achieve high routing accuracy without retraining, revealing deeper insights into model behavior than previously understood.
Model activations reveal a hidden layer of reasoning importance that surface-level analyses completely overlook.
LLMs can distinguish between literal and figurative meanings early in their processing, revealing a surprising geometric structure that simplifies figurative-language classification.
Predicting steerability with near-perfect accuracy while detecting drift more effectively than existing methods could transform how we monitor and control language models in real-world applications.
LLMs can self-correct reasoning errors mid-generation by simply watching their own residual stream for "phase shifts" and nudging the KV-cache, outperforming even prompted self-correction.
MS-RCGR not only preserves complete sequence information but also enhances classification performance across diverse analytical paradigms, making it a game-changer for biological sequence analysis.
Unlock the black box of time series forecasting: CAARL uses LLMs to generate interpretable narratives that explain *why* predictions change.
Causal structural priors can significantly enhance both the robustness and interpretability of anomaly detection in complex multivariate time series.
Binarized neural networks can be understood through the lens of Sugeno integrals, revealing a structured way to interpret neuron decisions and input interactions.
Harnessing the internal states of LLMs, SIREN outperforms traditional guard models while using a fraction of the parameters, revolutionizing harmful content detection.
Narrative-based explanations in XAI could dramatically improve human comprehension of model predictions, surpassing traditional static feature lists.
Later layers of LLMs capture cognitive effort in syntactically challenging sentences better than earlier layers, but still miss the mark compared to human processing.
LLMs disperse similar prompts instead of clustering them, leading to significant prompt sensitivity that challenges stability and reliability.
Achieve more human-like negotiation from dialogue agents by explicitly modeling and reasoning about emotions with interpretable chain-of-thought prompting.
Task-aware neuron steering in VLMs is now possible without gradients, unlocking better performance and interpretability across diverse multimodal tasks.
Forget top-down AI deployment: this study shows how a community-led approach to AI-powered wildfire risk assessment can build trust and drive adoption by prioritizing local context and user experience.
DAP transforms how we interpret Vision Transformers by producing attribution maps that are not only more faithful but also significantly more class-sensitive than traditional methods.
Even after surgically removing refusal behavior from LLMs, a stable, linearly decodable representation of harmful intent persists in their residual streams.
LLMs can generate higher-quality, more consistent topics from text data, leading to better insights about external outcomes like employee morale.
Integrating Sparse Autoencoders into transformer models can slash jailbreak success rates by up to 5x, reshaping our understanding of model robustness against adversarial attacks.
Uncover the hidden assumptions baked into LLM responses with a new interactive system that lets you explore alternative conceptual framings and values.
Forget parallel probing – a commit-open protocol using SAE feature traces can reliably expose hosted LLM providers silently substituting cheaper models, even against adaptive attacks.
You can achieve near-perfect intrusion detection in 5G networks *and* get human-interpretable rules, proving that transparency doesn't have to sacrifice performance.
Process mining can turn black-box intrusion detection systems into transparent, prioritized alert generators without sacrificing accuracy.
LLM-generated debugging explanations are often vague or misleading, but this work shows you can make them dramatically better by carefully curating the context provided to the LLM.
Multilingual and multimodal embeddings leak way more lexical information than you think – FLiP can recover 75% of the original text.
LLMs have "hallucination neurons" for specific citation fields, and silencing them reduces fabrication.
Early layers of language models capture human-like processing signatures in reading, rivaling traditional measures like surprisal in predicting initial eye movements.
Token-level attribution struggles to pinpoint the causes of LLM failures in realistic settings, suggesting current interpretability tools may not be up to the task of debugging complex model behaviors.
Attention-based LSTMs, coupled with XAI, can spot AI-assisted ransomware early by pinpointing subtle, yet critical, file system behavioral patterns.
Over 78% of medical students reported improved clinical reasoning skills through a persona-driven approach to requirements engineering in explainable MAES.
Even when you can't fully identify latent variables, provably recovering their set-theoretic relationships unlocks structured understanding of the hidden world.