66 papers published across 1 lab.
LLM safety doesn't translate: evaluations across 12 Indic languages reveal alarming safety drift and inconsistent responses to sensitive topics.
LLM-powered security tools are surprisingly susceptible to confirmation bias, overlooking reintroduced vulnerabilities when pull requests are framed as security improvements.
Label inference attacks in vertical federated learning succeed not because bottom models are good at representing labels, but because of feature-label distribution alignment, opening the door to simple, effective defenses.
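A minimal sketch of why that alignment matters, using synthetic data and a plain k-means attacker (neither is from the paper): when a passive party's features already cluster by label, unsupervised clustering alone recovers the private labels, with no model inversion needed.

```python
# Sketch, not the paper's attack: feature-label alignment lets clustering
# of a passive party's local features recover the active party's labels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n = 1000
labels = rng.integers(0, 2, n)                       # private labels (active party)
# Hypothetical aligned features: class-conditional means differ.
features = rng.normal(loc=labels[:, None] * 2.0, scale=1.0, size=(n, 8))

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
# Resolve the cluster/label permutation and score the inference attack.
acc = max((clusters == labels).mean(), (clusters != labels).mean())
print(f"label inference accuracy from features alone: {acc:.2f}")
```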
LLMs can maintain reasoning boundaries with >99% reliability under adversarial attacks when equipped with explicit process-control layers, a massive improvement over standard RLHF.
Active probing reveals backdoors that passive defenses miss in decentralized federated learning.
LLMs can reliably detect danger in secure environments, but they can't reliably verify safety, which breaks privacy-preserving agentic protocols.
Independently trained language models can be linearly aligned to enable cross-silo inference, opening doors for secure and private collaboration without direct data or model sharing.
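A rough illustration of the linear-alignment idea, with random matrices standing in for real hidden states (the paper's protocol is not shown): fit a least-squares map between the two models' representation spaces from a small set of shared anchor inputs, then translate new representations across silos.

```python
# Sketch under assumed data: least-squares alignment of two hidden spaces.
import numpy as np

rng = np.random.default_rng(1)
d_a, d_b, n_anchor = 64, 48, 256

# Hypothetical stand-ins for hidden states of model A and model B on the
# same anchor sentences (in practice: each silo encodes the anchors locally).
H_a = rng.normal(size=(n_anchor, d_a))
W_true = rng.normal(size=(d_a, d_b))
H_b = H_a @ W_true + 0.01 * rng.normal(size=(n_anchor, d_b))

# Least-squares alignment: W = argmin ||H_a W - H_b||_F
W, *_ = np.linalg.lstsq(H_a, H_b, rcond=None)

h_new = rng.normal(size=(1, d_a))            # a fresh model-A representation
err = np.linalg.norm(h_new @ W - h_new @ W_true) / np.linalg.norm(h_new @ W_true)
print(f"relative alignment error on an unseen input: {err:.4f}")
```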
LLMs are far more susceptible to authority and framing biases than the field's obsession with demographic bias suggests.
The UK's mandatory cybersecurity reporting regime misses over 65% of significant cyber incidents affecting critical infrastructure, suggesting current regulations are insufficient for comprehensive threat visibility.
Current Python vulnerability scanners miss millions of vulnerable downloads by failing to account for vendored dependencies and OS-level security patches.
Weaker autonomous web agents readily trust tampered website content, producing unsafe outputs, while stronger models exhibit better anomaly detection and safer fallback strategies under MITM attacks.
DRAM's vulnerability to bit flips isn't uniform; it's a complex, context-dependent landscape that attackers can exploit to predict memory contents and break security systems.
Phishing detectors, despite near-perfect accuracy, crumble under budget-constrained attacks that exploit a handful of low-cost features, revealing a critical vulnerability in real-world deployment.
Diffusion models, despite their generative prowess, may not offer the silver-bullet privacy guarantees often assumed when synthesizing tabular data, as demonstrated by novel membership inference attacks.
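For intuition, here is the classic loss-threshold membership-inference baseline on synthetic per-record losses; the paper's attacks are more sophisticated, but the leakage mechanism is the same: training members tend to score lower loss than fresh records.

```python
# Baseline loss-threshold MIA on made-up loss distributions (illustrative).
import numpy as np

rng = np.random.default_rng(2)
member_loss = rng.gamma(shape=2.0, scale=0.5, size=5000)     # records the model fit
nonmember_loss = rng.gamma(shape=2.0, scale=0.8, size=5000)  # unseen records

threshold = np.median(np.concatenate([member_loss, nonmember_loss]))
tpr = (member_loss < threshold).mean()     # members correctly flagged
fpr = (nonmember_loss < threshold).mean()  # non-members wrongly flagged
print(f"TPR={tpr:.2f}, FPR={fpr:.2f} -> attacker advantage={tpr - fpr:.2f}")
```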
Even with malicious clients flipping labels, FedTrident recovers federated learning performance to near attack-free levels, outperforming existing defenses by up to 9.49% in critical metrics.
Digital twins can now discriminate between different types of cyberattacks on critical infrastructure, enabling targeted responses instead of costly full shutdowns.
Legally mandated data deletion requests can be weaponized to stealthily cripple GNN performance, even if the model appears robust during initial training.
Chain-of-Thought prompting can reduce LLM bias against African-American English, but only if you pick the right model.
The complex JS-Wasm boundary is fertile ground for new vulnerabilities, and Weaver is the first fuzzer to effectively till it.
Stealing just the right neurons from another LLM lets you patch safety holes or remove biases in your own, with almost no performance hit.
Stop prompt injections cold: PCFI's priority-aware runtime defense intercepts all attacks in testing with zero false positives and negligible overhead.
SLMs are shockingly vulnerable: combining adversarial audio and text unlocks 1.5x to 10x higher jailbreak rates than attacking either modality alone.
The EU's AI regulations struggle to keep pace with agentic AI, blurring the lines between security and privacy.
Keyword-based concept unlearning is brittle: representing visual concepts with diverse prompts yields stronger erasure, better retention, and improved robustness against adversarial attacks.
Medical vision-language models are surprisingly brittle: clinically plausible image manipulations, like those introduced during routine acquisition and delivery, can drastically degrade their performance.
AI agents are surprisingly susceptible to concentrated propaganda efforts, with just 4% of agents responsible for over half of all propaganda posts on Moltbook.
Denoised eye-tracking heatmaps dramatically boost the generalization of iris presentation attack detection, outperforming hand annotations and even self-supervised DINOv2 features.
Deobfuscation just got a whole lot easier: PUSHAN cracks virtualization-obfuscated binaries without relying on brittle trace analysis or expensive symbolic execution.
Alignment evaluations that only check for dangerous concepts or outright refusals are missing the real action: models are getting sneakier at censorship by steering narratives instead of simply saying "no."
Image editing can change pixels, but the relationships between image patches stay surprisingly stable, enabling robust zero-watermarking.
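A toy version of the patch-relationship idea (the patch size, binarization, and edit are illustrative, not the paper's scheme): binarized pairwise patch correlations form a signature that survives pixel-level edits such as a global brightness shift.

```python
# Sketch: patch-relationship signature that is stable under a pixel edit.
import numpy as np

rng = np.random.default_rng(3)
img = rng.random((64, 64))

def signature(image, patch=16):
    patches = [image[i:i + patch, j:j + patch].ravel()
               for i in range(0, image.shape[0], patch)
               for j in range(0, image.shape[1], patch)]
    p = np.stack(patches)
    p = p - p.mean(axis=1, keepdims=True)
    p /= np.linalg.norm(p, axis=1, keepdims=True)
    sim = p @ p.T                            # pairwise patch correlations
    return (sim > np.median(sim)).astype(int)

edited = np.clip(img + 0.1, 0, 1)            # a pixel-changing edit
match = (signature(img) == signature(edited)).mean()
print(f"signature bits preserved after edit: {match:.2%}")
```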
Legged robots can now perform robust parkour with a 1-meter visual blind zone, thanks to a novel architecture that tightly couples vision, proprioception, and physics-based state estimation.
Chain-of-thought prompting makes large language models smarter, but it also makes them less safe, a problem this paper tackles by forcing models to think about safety *before* reasoning.
Agentic LLMs are surprisingly vulnerable: a new framework finds successful attacks in 84% of attempts by escalating prompt injection techniques across multiple stages.
Adversarial training can effectively disentangle session-specific noise from task-relevant speech features in brain-computer interfaces, leading to more robust decoding across recording sessions.
By optimizing for both lower- and upper-tail behaviors of loss distributions, this new stochastic set-valued optimization framework delivers more robust machine learning models under distributional shift than standard empirical risk minimization.
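One hedged way to picture a two-tail objective, with made-up weights and lognormal losses standing in for a real training run (this is not the paper's formulation): combine the mean loss with explicit upper- and lower-tail terms instead of optimizing the average alone.

```python
# Illustrative two-tail objective on sampled per-example losses.
import numpy as np

rng = np.random.default_rng(6)
losses = rng.lognormal(mean=0.0, sigma=1.0, size=1000)   # per-sample losses

alpha = 0.9
upper_tail = losses[losses >= np.quantile(losses, alpha)].mean()      # worst cases
lower_tail = losses[losses <= np.quantile(losses, 1 - alpha)].mean()  # best cases

# Hypothetical weighting; a real method would tune or derive these.
objective = 0.5 * losses.mean() + 0.4 * upper_tail + 0.1 * lower_tail
print(f"mean={losses.mean():.3f}, upper-tail={upper_tail:.3f}, objective={objective:.3f}")
```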
By aligning hidden representations, CRAFT achieves a remarkable 79% improvement in reasoning safety, suggesting that latent-space interventions are a potent defense against jailbreaks.
LLMs can be systematically shifted from stochastic pattern-matchers to verified truth-seekers using a carefully orchestrated, multi-stage retrieval and verification pipeline.
Forget fine-tuning: this method uses smart patch selection to adapt frozen LVLMs for deepfake detection, outperforming baselines without any training.
Anomaly detection gets a dose of interpretability: SYRAN learns human-readable equations that flag anomalies by violating learned invariants.
RAG systems can now achieve 8x better PII leakage protection without sacrificing utility or speed, thanks to a novel "Verify-then-Route" paradigm.
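A bare-bones verify-then-route sketch; the regex verifier and redaction fallback below are placeholders, not the paper's components: verify each retrieved chunk first, route clean chunks through unchanged, and redact flagged spans before they ever reach the LLM prompt.

```python
# Sketch of verify-then-route with illustrative PII patterns.
import re

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US-SSN-like numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
]

def verify_then_route(chunks):
    routed = []
    for chunk in chunks:
        if any(p.search(chunk) for p in PII_PATTERNS):   # verify step
            for p in PII_PATTERNS:
                chunk = p.sub("[REDACTED]", chunk)       # redact before routing
        routed.append(chunk)                             # route step
    return routed

print(verify_then_route(["contact: alice@example.com", "the sky is blue"]))
```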
LLMs in policing: a seemingly efficient tool that could introduce 17 distinct risks, potentially derailing case progression in over 40 ways.
Current LLM agent safety benchmarks miss over 20% of unsafe behaviors: agents that pass the benchmark still act unsafely.
Near-perfect detection of fault injection attacks on DNN activation functions is possible with minimal overhead by exploiting simple mathematical identities.
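For example, a correct sigmoid satisfies s(x) + s(-x) = 1, so re-evaluating that identity flags a tampered output. The sketch below simulates one faulted value; the paper's exact identities and overhead model may differ.

```python
# Identity-based fault check on a sigmoid activation (illustrative).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 8)
y = sigmoid(x)
y[2] += 0.25                                  # simulated fault injection

# For an unfaulted sigmoid, y + sigmoid(-x) == 1 holds elementwise.
residual = np.abs(y + sigmoid(-x) - 1.0)
print("faulted positions:", np.flatnonzero(residual > 1e-6))
```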
Tool-using agents are failing in predictable ways, but a model-agnostic policy layer can measurably improve their safety and reliability, albeit with a clear utility tradeoff.
LLM-powered recommendation agents, despite their reasoning prowess, are easily manipulated by contextual biases in high-stakes scenarios like paper review and job recruitment.
Ditch the separate anomaly detection model: your existing ML model already holds the keys to faster, better anomaly detection.
Forget separate defenses: rSDNet unifies robustness against both label noise and adversarial attacks within a single, statistically grounded training objective.
VLMs don't fail to *recognize* harmful intent when jailbroken; instead, visual inputs *shift* their internal representations into a distinct "jailbreak state," opening a new avenue for defense.
Stop trusting those benchmarks: GRAFITE offers a framework to continuously QA LLMs against real-world issues reported by users, revealing performance regressions masked by static benchmarks.
A 4B parameter model can nearly match the privilege escalation performance of a state-of-the-art closed LLM like Claude Opus, while being fully local and 100x cheaper to run.
AI tutors can quietly erode learning through answer over-disclosure and misconception reinforcement, with pedagogical failures rising to a staggering 77.8% in multi-turn dialogues.
AI-generated text detectors that seem perfect in the lab fall apart in the real world, with no single method generalizing across domains or even different LLMs.
Autonomous AI agents in healthcare are riddled with security holes, but this zero-trust architecture and open-source tooling can actually fix them.
Multimodal AI models are surprisingly unsafe, especially when generating images or handling multiple images at once, according to a new benchmark exposing critical vulnerabilities.
Bitcoin users beware: this new deanonymization technique links transactions to IP addresses with significantly higher accuracy, even without complete supervision.
Even with environmental noise, a VAE-based anomaly detector can spot adversarial attacks on collaborative DNNs with high accuracy.
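The core mechanism, sketched here with PCA standing in for the VAE (a deliberate simplification): inputs that reconstruct poorly under a model fit on clean data get flagged, even when the clean data itself is noisy.

```python
# Reconstruction-error anomaly detection; PCA is a linear stand-in for a VAE.
import numpy as np

rng = np.random.default_rng(4)
basis = rng.normal(size=(5, 20))              # clean data lives near a 5-dim subspace
clean = rng.normal(size=(500, 5)) @ basis + 0.05 * rng.normal(size=(500, 20))
attack = rng.normal(size=(50, 20))            # perturbed inputs leave the subspace

# Fit a linear "encoder" on clean data (top principal components).
_, _, vt = np.linalg.svd(clean - clean.mean(0), full_matrices=False)
proj = vt[:5]

def recon_error(x):
    centered = x - clean.mean(0)
    return np.linalg.norm(centered - centered @ proj.T @ proj, axis=1)

thresh = np.percentile(recon_error(clean), 99)
print("flagged attacks:", (recon_error(attack) > thresh).mean())
```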
General-purpose LLM safety benchmarks fail to capture the novel vulnerabilities introduced when LLMs are deployed as "AI scientists," necessitating domain-specific evaluations and defenses.
Shield your classical data from prying eyes during quantum computation with a new obfuscation technique that hides sensitive values within structured quantum states.
Even without architectural modifications, a new gradient inversion attack, ARES, can reconstruct high-fidelity training samples in federated learning, exposing a significant privacy risk.
Audio backdoor attacks leave a tell: triggers are surprisingly stable to destructive noise but fragile to meaning-preserving changes.
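That tell suggests a simple probe, sketched below with a hypothetical backdoored classifier and toy transforms: compare predictions under destructive additive noise versus a meaning-preserving time shift, and flag inputs whose label survives the former but not the latter.

```python
# Stability probe for a suspected audio trigger (model is a made-up stand-in).
import numpy as np

rng = np.random.default_rng(5)

def model_predict(audio):
    # Hypothetical backdoored classifier: high energy in the final samples
    # acts as the trigger for target label 7; otherwise a trivial rule.
    return 7 if audio[-100:].mean() > 0.8 else int(audio.mean() > 0)

trigger_clip = np.zeros(16000)
trigger_clip[-100:] = 1.0                                  # triggered input

noisy = trigger_clip + 0.2 * rng.normal(size=trigger_clip.shape)  # destructive noise
shifted = np.roll(trigger_clip, 500)                       # meaning-preserving shift

stable_to_noise = model_predict(noisy) == model_predict(trigger_clip)
stable_to_shift = model_predict(shifted) == model_predict(trigger_clip)
print(f"suspicious trigger pattern: {stable_to_noise and not stable_to_shift}")
```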
Grey-box fuzzing of LLM agents, guided by tool invocation sequences, reveals significantly more prompt injection vulnerabilities and malicious behaviors than black-box testing alone.
Forget static honeypots: LLMs and RL could make cyber deception dynamic and adaptive, turning the tables on attackers in contested environments.
A new unified multi-modal NIDS dataset, combined with machine learning, adversarial learning, and rigorous statistical evaluation, delivers stable, reliable network intrusion detection and high-fidelity synthetic data generation.
Existing threat models fail to capture the unique vulnerabilities of Model Context Protocol systems, but MCP-38 fills this gap with a comprehensive taxonomy of 38 distinct threat categories.
Forget watermarks: cryptographically binding your identity to the generation seed in latent diffusion models gives you provable authorship, not just ownership.
Concept erasure in text-to-image models is mostly smoke and mirrors: a text-free attack can still regenerate "forgotten" concepts by exploiting the model's latent visual knowledge.