Search papers, labs, and topics across Lattice.
33 papers published across 2 labs.
Unlocking interpretable clinical forecasting: StructGP recovers causal relationships and patient progression patterns directly from irregular EHR data, outperforming black-box methods in accuracy and uncertainty calibration.
TEA Nets reveal that LLMs express sadness with lower emotional intensity than humans in psychotherapy contexts, highlighting potential limitations in their ability to simulate genuine emotional responses.
Forget complex training schemes: pinpointing and tweaking just 20 neurons can flip an LLM from sycophantic to truthful, thanks to a new "perturbation probing" technique.
Unsupervised knowledge injection via fuzzy logic lets image classifiers reason about concepts they were never explicitly trained on, boosting accuracy and generalization.
LLMs can have their personalities surgically altered by tweaking just 0.5% of their neurons, preserving general capabilities while achieving competitive control.
Forget scaling laws: surgically debiasing reward models by intervening on just 2% of neurons lets smaller models punch *way* above their weight in alignment.
CNN classifiers don't just select from cleaned features; they actively cancel out shared background information via destructive interference, rewriting our understanding of how these networks actually "see".
Time-series foundation models (TSFMs) can achieve competitive forecasting performance in critical infrastructure applications while also providing interpretable explanations that align with established domain knowledge.
Sparse autoencoders, despite their popularity for extracting interpretable features, often fail to capture the underlying manifold structure of concepts, instead fragmenting them across multiple, diluted features.
Pinpointing the root cause of transformer failures just got a whole lot easier: DEFault++ can detect, categorize, and diagnose faults with high accuracy, even down to specific mechanisms.
Uncover hidden drivers of disparity: pinpoint the specific combinations of characteristics that explain outcome gaps between populations.
LLMs betray prompt injection attacks with a tell-tale "restlessness" in their activation trajectories, detectable even when individual turns appear harmless.
Claims of human-like cognition in models like CENTAUR crumble under LAPITHS, a framework that reveals these models' performance can be replicated by systems lacking cognitive plausibility.
LLMs stubbornly stick to task-appropriate reasoning even when explicitly instructed to use conflicting logic, but targeted interventions can nudge them towards better instruction following.
Texture, not color, is the secret sauce behind fashion-house identity, revealed by probing a multimodal CNN trained on decades of Vogue runway images.
Uncover the hidden drivers behind your KPIs: a new attribution framework finally explains *why* your metrics move, not just *what* changed.
LLMs aren't just memorizing words; they're organizing them in a feature space that mirrors the nuanced semantic relationships humans perceive.
LLMs' factual recall falters when fine-tuned on new information, and this can be traced to specific latent directions in the residual stream.
Quantum computing can surface critical network attack patterns that classical methods miss, achieving up to 99.6% test precision on unique subgroups.
Quantum annealing offers a surprisingly effective route to interpretable AI, outperforming standard gradient-based methods in disentangling CNN decision boundaries.
GNNs tagging jets at the LHC aren't black boxes: explainability methods reveal they learn physically meaningful features of QCD, with performance varying predictably across energy regimes.
Rule extraction from tree ensembles just got 22x faster, without sacrificing accuracy or interpretability.
Forget what you thought you knew about how models learn: analyzing loss gradients, not just parameter updates, reveals a previously hidden, order-of-magnitude increase in the coupling between learned features and parameter space.
LLMs process emotions in three distinct phases, but some emotions like Disgust are represented far more weakly and diffusely than others.
Feature decorrelation during training not only sharpens saliency maps, but also *improves* model accuracy, challenging the conventional wisdom that interpretability comes at the cost of performance.
Concept extraction's identifiability problem just got a lot easier, thanks to a new framework that turns guarantee proofs into set intersection problems.
RL's superior generalization isn't about brute force, but about carefully sculpting a few key features while preserving the base model's knowledge, unlike SFT's rapid specialization.
A single, tuning-free "health signal" derived from layer activations can catch backdoors, jailbreaks, and prompt injections in LLMs, even without a clean reference model.
Forget fixed steering strengths: CLAS dynamically adapts steering based on context, unlocking more consistent and powerful control over LLM behavior.
GraphRAG's black-box reasoning gets a spotlight: XGRAG reveals how specific knowledge graph components influence LLM outputs, boosting explanation quality by 14.81% over standard RAG explainability methods.
Biophysically-constrained models of gene regulation, learned via probability flow matching, are the only ones that accurately predict cell fate decisions and responses to perturbations, even when other models interpolate the training data just as well.
Concept bottleneck models can now distinguish between reducible model uncertainty and irreducible input ambiguity, enabling targeted interventions like data collection and human review.
Neurosymbolic grounding of LLMs in telemetry and knowledge graphs slashes expert-rated overclaims in industrial maintenance explanations by 93%, making AI assistants far more trustworthy in safety-critical settings.