Search papers, labs, and topics across Lattice.
83 papers published across 5 labs.
Achieve human-readable interpretability in medical tabular data classification without sacrificing accuracy by learning and comparing against prototypical patient feature subsets.
Robots can boost their perceived competence by 83% simply by tweaking navigation behaviors suggested by a causal Bayesian network.
Achieving fairness doesn't just mean equal outcomes—this work shows how to enforce consistent reasoning across groups by penalizing disparities in counterfactual explanations.
Uncover hidden backdoors in your neural networks by tracing the active paths that malicious triggers exploit.
Forget subjective human evaluations: this paper uses a clever knowledge distillation trick to objectively rank XAI methods for NMT, revealing that attention-based attributions beat gradient-based ones.
Uncover the hidden causal chains inside your LLM with Causal Concept Graphs, which outperform existing methods for reasoning by explicitly modeling concept dependencies.
Speech deepfake detection gets a reasoning upgrade: HIR-SDD uses chain-of-thought prompting with Large Audio Language Models to not only detect fakes but also explain *why* it thinks they're fake.
Clinicians using HeartAgent, a cardiology-specific agent system, improved diagnostic accuracy by 26.9% and explanatory quality by 22.7% compared to unaided experts.
Forget fine-tuning: surprisingly, single neuron activations in VLMs can be directly probed to create classifiers that outperform the full model, with 5x speedups.
Chinese metaphor identification is highly sensitive to the choice of protocol, an effect that dwarfs model-level variations, yet it can be tackled with fully transparent, LLM-assisted rule scripts.
Prompt highlighting in LLMs gets a serious upgrade: PRISM-Δ steers models to focus on relevant text spans with better accuracy and fluency, even in long contexts.
Fair-Gate disentangles speaker identity and sex in voice biometrics, boosting fairness without sacrificing accuracy by explicitly routing features through identity- and sex-specific pathways.
LLMs possess a "word recovery" mechanism that allows them to reconstruct canonical word-level tokens from character-level inputs, explaining their surprising robustness to non-canonical tokenization.
LLM activation spaces aren't linear, and exploiting their true geometry with "Curveball steering" unlocks more effective control than standard linear interventions.
Forget interference as just noise: correlated features in neural networks can constructively superpose to form semantic clusters, especially with weight decay.
Backdoor defenses focused on removing training triggers are fundamentally flawed, as alternative, perceptually distinct triggers can reliably activate the same backdoor via a latent feature-space direction.
BrainSTR disentangles subtle disease signatures in dynamic brain networks by explicitly modeling spatio-temporal dependencies with contrastive learning, revealing interpretable biomarkers for neuropsychiatric disorders.
Forget return curves: a simple measure of neuron activation patterns (OUI) at just 10% of training can predict PPO performance better than existing methods, enabling early pruning of bad runs.
Forget black-box policies: CSRO uses LLMs to generate human-readable code policies in multi-agent RL, achieving performance competitive with traditional methods.
LLMs' attention patterns subtly shift with emotional tone, and explicitly accounting for these shifts during training improves reading comprehension even on neutral datasets.
Language models often disregard provided context, choosing instead to rely on potentially outdated or conflicting information learned during pre-training, revealing a critical flaw in their knowledge integration.
DNN neurons often fire *more* strongly when a concept is missing, revealing a blind spot in standard XAI methods that can now be addressed.
Mixture-of-Experts models might be hiding more of their reasoning than we thought, thanks to a newly quantified "opaque serial depth" metric.
LLM explanations are far more sensitive to the task being performed than the context or learned classes, highlighting a critical instability in current interpretability methods.
Attention heatmaps in MIL models for histopathology are often misleading, and simpler methods like perturbation or LRP provide more faithful explanations.
You can now audit black-box vision models for biases and failure modes using only their output probabilities, thanks to a clever LLM-powered semantic search.
Detect anomalies in complex systems with a novel explainable condition monitoring methodology that learns from healthy data alone, offering competitive performance and enhanced interpretability for safety-critical applications.
Forget noisy, biased LLM evaluators: CDRRM distills preference insights into compact rubrics, letting a frozen judge model leapfrog fully fine-tuned baselines with just 3k training samples.
Get zero-shot, explainable fault diagnoses from your industrial time series data by translating sensor signals into natural language that LLMs can understand.
LLaMA and Gemma may seem to understand complex conditional statements, but they're really just pattern-matching, not grasping the underlying pragmatic nuances of presuppositions.
LLMs can now safely navigate the complexities of acupuncture clinical decision support, thanks to a neuro-symbolic framework that slashes safety violations from 8.5% to zero.
Stop blindly trusting your fault detection models: this hybrid CNN-GRU approach uses explainable AI to reveal the reasoning behind its predictions, enabling adaptation and root cause analysis in automotive software validation.
Time series counterfactual explanations can now be more realistic thanks to a novel soft-DTW-based approach that preserves temporal structure.
By representing prototypes as orthonormal bases on the Stiefel manifold, this work makes prototype collapse infeasible by construction, leading to more interpretable and accurate image recognition.
Causal effects between high-dimensional variables may be simpler than you think: they often depend only on low-dimensional summary statistics, or bottlenecks, of the causes.
Uncover hidden vulnerabilities in Transformer models with SYNAPSE, a training-free framework that reveals how small manipulations can redirect predictions even though task-relevant information is redundantly encoded across broad neuron subsets.
By explicitly modeling joint mechanics with language-aligned tokens, BioGait-VLM prevents gait analysis models from overfitting to visual shortcuts and unlocks improved generalization and interpretability.
LLMs represent meaning more abstractly than previously thought: changing the script of a sentence (Latin vs. Cyrillic) causes less representational divergence than paraphrasing it within the same script.
Achieve more accurate and interpretable mortality risk predictions in ICUs by explicitly modeling irregular temporal dynamics and integrating standardized medical knowledge into time-aware RNNs.
Code obfuscation doesn't always make things harder for humans: certain renaming techniques in Python can actually *improve* program comprehension compared to the original code.
Diffusion language models have surprisingly redundant early layers, enabling nearly 20% FLOPs reduction at inference time via layer skipping without sacrificing performance.
Whitening the embedding space of GPT-2-small exposes cluster commitment as the key geometric property separating different types of language model hallucinations.
Achieve transformer interpretability by disentangling token and context processing streams, with only a 2.5% performance hit using Kronecker mixing.
Forget retraining: Steering a handful of attention heads in audio-language models can boost audio understanding by 8%, revealing a surprisingly simple way to overcome text dominance.
LLM feed-forward networks have hidden spectral signatures that predict generalization and respond predictably to design choices, opening the door to more principled architecture and optimizer selection.
BioLLMAgent bridges the gap between interpretable but unrealistic RL models and realistic but opaque LLM agents, offering a "computational sandbox" for testing psychiatric hypotheses.
A "credibility warning system" for AI-driven business decisions is now possible, thanks to a new metric that reveals how much explanations wobble when the data shifts.
AI models can detect injected thoughts, but they often have no idea *what* those thoughts are, relying on content-agnostic anomaly detection and then guessing common concepts.
LLMs often know the answer long before their "reasoning" suggests, wasting tokens on performative chain-of-thought.
Algorithmic decisions about humans can now be audited for "Representation Fidelity" by checking if they align with self-reported descriptions, revealing potential biases and inaccuracies.
Transformers perform analogical reasoning by aligning feature representations of similar entities, but only if trained with the right curriculum.
The common belief that a two-step decision workflow reduces overreliance on AI advice doesn't hold up, and the effectiveness of explanations hinges on the specific workflow and user expertise.
Forget retraining: you can steer a robot's behavior in real-time by nudging its internal representations with lightweight, targeted interventions.
AI models are more like patients than black boxes: "Model Medicine" offers a clinical framework and open-source tools to diagnose and treat their "ailments."
Forget retraining or complex architectures: a simple linear head can effectively eliminate missingness bias in feature attribution, rivaling heavyweight methods.
By constraining Transformer architectures to have bounded representations and uniform attention, grokking can be bypassed entirely for modular addition, suggesting task-specific geometric alignment is key.
Achieve more robust and informative visual explanations for CNNs by adaptively fusing gradient-based and region-based CAM methods, outperforming existing approaches on standard benchmarks.
Forget probing transformer block outputs: the *real* OOD performance gains in ViTs come from selectively probing feedforward network activations or self-attention outputs depending on the severity of the distribution shift.
Hallucinations in VLMs can be predicted *before* any text is generated, opening the door to early intervention and more efficient, safer models.
Escape the curse of off-manifold Shapley values: this new method leverages optimal generative flows to produce attributions that actually respect the data manifold.
Prototype-based deep learning offers a more trustworthy approach to prostate cancer grading by mirroring a pathologist's workflow of comparing suspicious regions with clinically validated examples.
Achieve expert-level hepatology diagnosis by mimicking multidisciplinary consultation, using an AI system that combines knowledge graph reasoning, clinical guidelines, and a multi-agent system for traceable consensus.
Finally, a PAR framework that doesn't just classify patient activities, but tells you *why* a set of visual cues implies a risk, complete with auditable rule traces and counterfactual interventions.
Asymmetric Shapley values offer a more robust and interpretable approach to feature importance in clinical prediction by accounting for collinearity and known directional dependencies, overcoming limitations of traditional methods.
Finally, a unified framework illuminates the "what if" transitions between time-series clusters, using counterfactual explanations to reveal the minimal perturbations that shift a time-series from one cluster to another.
Pre-normalization in Transformers is the culprit behind the mysterious link between massive activation outliers and attention sinks, but decoupling them reveals their distinct functions: global parameterization vs. local attention modulation.
LLMs struggle more with restructuring solution spaces than refining constraints, revealing a key asymmetry in their reasoning abilities that standard benchmarks miss.
Surprisingly, a compact, training-free set of acoustic parameters rivals DNN embeddings and approaches self-supervised models in voice timbre attribute detection, offering interpretability and efficiency.
Robots can nimbly switch between autonomous and teleoperated modes based on the confidence of their learned perception, leading to more reliable manipulation.
Forget just deleting edges: XPlore uses gradients to intelligently tweak node features *and* add edges, unlocking more valid and faithful counterfactual explanations for GNNs.
Face pareidolia reveals that a vision model's behavior under ambiguity is governed more by representational choices than score thresholds, and that low uncertainty can signal either safe suppression or extreme over-interpretation.
TaxonRL doesn't just beat humans at bird identification; it shows its work, revealing a transparent reasoning process that could revolutionize how we trust AI in complex visual tasks.
LLMs can be harnessed to refine neural topic models, yielding substantial gains in topic quality and interpretability without sacrificing document representation accuracy.
L2 weight regularization unlocks stable and steerable sparse autoencoders, doubling steering success rates and aligning feature explanations with functional controllability.
Max-Plus networks, despite their interpretability, can be efficiently trained by exploiting the algebraic sparsity of their subgradients, leading to faster updates.
Static word embeddings like GloVe and Word2Vec can achieve surprisingly high accuracy (R^2 up to 0.87) in recovering geographic and temporal information, challenging the interpretation of similar findings in LLMs as evidence of complex world models.
A new additive classification model reveals that plaque texture, as assessed by ultrasound radiomics, is strongly associated with stroke risk, offering a non-invasive marker for improved patient stratification.
Demographic biases in brain MRI stem primarily from anatomical variations, not just acquisition-dependent contrast, challenging assumptions about bias mitigation strategies.
Agentic AI can actually *hurt* explanation quality for sophisticated "thinking" models analyzing physiological data, challenging the assumption that more complex reasoning always leads to better clinical insights.
Forget inspecting final outputs: LLMs telegraph their reward-hacking intentions internally, early in the generation process, via distinctive activation patterns.
Softmax attention heads specialize in stages during training, and a novel Bayes-softmax attention can achieve optimal prediction performance by reducing noise from irrelevant heads.
Finally, a forecasting model that's as accurate as the black boxes but actually tells you *why* it made that prediction.
Multimodal models are often blind at birth: a new "Visual Attention Score" reveals they struggle to focus on visual inputs during cold-start, but a simple attention-guided fix can boost performance by 7%.