Adversarial testing of AI systems, jailbreaking research, prompt injection defense, and robustness evaluation.
Hallucination detection can be reframed as a dynamical systems problem, enabling a surprisingly effective and efficient black-box approach that avoids expensive sampling or external knowledge retrieval.
LLMs harbor easily discoverable "natural backdoors"—token sequences that trigger harmful outputs without any semantic instruction, revealing a concerning vulnerability beyond traditional prompt-based jailbreaks.
An agentic pipeline can autonomously discover and verify real-world privilege escalation vulnerabilities in Windows COM binaries, outperforming both static analysis and existing coding agents.
Regularizing model sensitivity along the expected covariate drift directions, rather than isotropically, significantly improves the robustness of frozen models deployed in non-stationary environments.
Interventions on LLMs, like knowledge editing or unlearning, can have surprising side effects that this automated pipeline can now surface and validate.
Current LLM jailbreak evaluations are inadequate because they often rely on narrow metrics; a multi-dimensional framework like Security Cube is needed for comprehensive security assessment.
AI agents are shockingly easy to manipulate into leaking API keys, deleting user data, and initiating unauthorized transactions across a wide range of real-world applications.
Stop waiting for AI agents to mess up: AgentTrust intercepts tool calls *before* execution, offering a chance to block, warn, or fix risky actions in real-time.
Your innocent Spotify playlists are leaking surprisingly accurate predictions about your age, habits, and even personality traits, thanks to a new AI attack.
Turns out you only need to tweak a few key audio tokens to jailbreak audio language models, opening the door to faster, more targeted attacks.
Seemingly harmless fine-tuning data can stealthily nudge LLMs toward unsafe behavior by subtly shifting model parameters in "danger-aligned" directions.
VLMs can be easily tricked into "hallucinating" object relationships with simple image rotations or noise, revealing a surprising fragility in their multimodal reasoning.
LLMs can be surprisingly brittle: simply rephrasing a prompt, even while preserving its meaning, can cause them to completely abandon the requested output format.
Your smart fridge might stop cooling because of a software update on a server you don't even know exists.
Current DeFi risk assessments miss critical systemic risks, as evidenced by this new framework's ability to explain the root causes of major incidents that existing methods overlook.
Roblox's chat moderation misses a disturbing amount of grooming, bullying, and other harmful content, despite its reliance on automated systems.
Forget retraining: NeWTral instantly restores safety to your LLM after adding a risky LoRA, slashing attack success rates from 70% to 13% without sacrificing expertise.
Standard data anonymization techniques crumble when outliers are present; ICSA offers a robust alternative that maintains utility while providing stronger privacy guarantees.
LLMs can now autonomously fuzz industrial control protocols, uncovering previously undetectable semantic vulnerabilities that could silently disrupt critical infrastructure.
Even subtle, functionality-preserving manipulations of malware binaries can cripple detection pipelines, demanding a rethink of pre-ingestion validation.
Say goodbye to TLS stripping attacks: HSTS-Enforced flips the web's security model, making HTTPS the default and eliminating the need for complex opt-in configurations.
Wi-Fi PIN inference attacks, previously thought to be a major threat, crumble when faced with realistic typing variations, revealing that current performance metrics are misleading.
LLMs can now formulate significantly better penetration testing strategies, outperforming even GPT-5, thanks to a novel reasoning framework and targeted fine-tuning.
Remotely hosted Mixture-of-Experts LLMs are vulnerable to input-only attacks that hijack their routing mechanisms, forcing them to generate harmful content.
Despite achieving high accuracy on individual datasets, machine learning models for intrusion detection exhibit a significant generalization gap, with performance dropping drastically when tested on unseen network environments.
Escaping the endless cat-and-mouse game of deepfake detection may be possible by shifting from static pattern recognition to physics-inspired dynamical stability analysis, where real images are stable and deepfakes are not.
LEGO's modular design lets you detect deepfakes with 10x less training data and far fewer epochs, all by focusing on the unique fingerprints of each image generator.
Adversarial clothing with non-overlapping visible-thermal patterns can reliably evade RGB-T detectors, even transferring across different fusion architectures.
Semantic watermarks, embedded via AMR, survive paraphrasing attacks that obliterate token-level watermarks.
LLMs may sound convincing when writing academic content, but they can still confidently fabricate facts and references at surprisingly high rates.
Scaling clinical LLMs doesn't guarantee safety: high-risk errors persist even with advanced RAG and max-context prompting, highlighting the critical role of evidence quality and deployment strategy.
Naive application of transformer-based AI-text detectors can be brittle under distribution shift, but attention-based fusion of readability and vocabulary features can significantly improve robustness.
LLMs in Korean judicial workflows are surprisingly prone to hallucination, bias, and inconsistency, especially when retrieving precedents and summarizing jurisprudence.
AI safety is missing a big piece of the puzzle: the deskilling and addiction risks that could erode our cognitive abilities and mental well-being.
Threat intelligence sharing can completely neutralize an attacker's advantage gained from increasing the number of attack surfaces.
Production VLMs like GPT-4, Claude Opus, Gemini, and Grok can be easily manipulated into confidently providing false information via subtle adversarial perturbations to images, even without compromising model alignment.
Forget the heavyweight deep learning approaches – surprisingly effective vulnerability detection can be achieved with simple TF-IDF token features and basic code metrics, offering a fast and transparent baseline for human triage.
Forget heavyweight deep learning: a simple binary image-based approach, leveraging just 8 bytes of application-layer data, rivals ResNet50 in detecting OT network intrusions, but with 430x fewer parameters.
Securely onboarding third-party apps in Open RAN just got easier: a new zero-trust rubric offers explicit Accept/Escalate/Block decisions.
Backdoors that remain provably undetectable, even under white-box access, can be injected into pre-trained image classifiers by exploiting sparse perturbations and Gaussian dithering.
Forget weeks of manual scripting: this AI red teaming agent lets you launch sophisticated attacks with natural language, slashing vulnerability discovery time.
LLMs can now automatically generate effective proof-of-vulnerability tests for complex software, uncovering real-world attack vectors with minimal human intervention.
Innocuous-looking coding tasks, when chained together, trick even the best coding agents into creating exploitable code with alarming frequency.
Cosine distance unexpectedly cracks PolyProtect, but a smart key selection algorithm can harden it again, offering better control over the accuracy-irreversibility tradeoff.
Rowhammer attacks aren't just for CPUs anymore: a malicious CUDA kernel can now leverage targeted bit flips to achieve root access on a system, even bypassing IOMMU protections.
The sheer breadth of IoT attack vectors, from node replication to skimming, highlights the urgent need for comprehensive security strategies that address device limitations and lack of standardization.
Publicly available firmware for ASIC cryptocurrency miners is riddled with vulnerabilities, making the distribution mechanism itself a primary attack surface.
LLMs can cheaply generate malware variants that are structurally diverse yet functionally identical, posing a significant challenge to signature-based detection methods.
Retrieval-augmented in-context learning, despite its benefits, leaks surprising amounts of private data, even when attackers only have access to paraphrased queries.
Zorya can now automatically find previously undetected vulnerabilities in compiled Go binaries, even silent integer overflows that other tools miss.