67 papers published across 1 lab.
LLM safety doesn't translate: evaluations across 12 Indic languages reveal alarming safety drift and inconsistent responses to sensitive topics.
The Onto-Relational-Sophic framework offers a comprehensive philosophical foundation for governing synthetic minds, moving beyond tool-centric regulatory paradigms.
DPWFL privacy doesn't have to diverge: this work proves it can converge to a constant even with non-convex objectives and gradient clipping.
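As a rough illustration of the mechanism this result concerns, the sketch below shows generic per-client gradient clipping plus Gaussian noise; the function name, clip norm, and noise multiplier are illustrative assumptions, not the paper's DPWFL algorithm or its constants.

```python
import numpy as np

def dp_client_update(grad, clip_norm=1.0, noise_multiplier=1.1, rng=np.random.default_rng(0)):
    """Clip a client gradient to a fixed L2 norm, then add Gaussian noise.

    Illustrative only: a real DP federated pipeline also needs privacy
    accounting across rounds and (typically) secure aggregation.
    """
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return clipped + noise

# The server averages noisy client updates each round.
client_grads = [np.random.randn(10) for _ in range(5)]
aggregate = np.mean([dp_client_update(g) for g in client_grads], axis=0)
```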
Why does explicit belief updating often fail to change your stress response? Authority-Level Priors (ALPs) may be the answer.
Label inference attacks in vertical federated learning succeed not because bottom models are good at representing labels, but because of feature-label distribution alignment, opening the door to simple, effective defenses.
By enforcing graph isomorphism across counterfactual inputs, UGID reveals that debiasing LLMs can be achieved by directly manipulating internal representations and attention mechanisms.
Predictive policing algorithms can exhibit extreme racial bias, with one city showing a 157x higher detection rate for one racial group in a single year.
Independently trained language models can be linearly aligned to enable cross-silo inference, opening doors for secure and private collaboration without direct data or model sharing.
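A minimal sketch of what such linear alignment can look like in practice, assuming paired hidden states from the two models on shared anchor inputs; the variable names and the plain least-squares fit are illustrative, not the paper's protocol.

```python
import numpy as np

# Toy stand-ins for hidden states of two independently trained models
# evaluated on the same anchor sentences (shape: n_anchors x dim).
rng = np.random.default_rng(0)
H_a = rng.normal(size=(200, 64))   # hidden states from model A
H_b = rng.normal(size=(200, 64))   # hidden states from model B

# Least-squares linear map W such that H_a @ W ≈ H_b.
W, *_ = np.linalg.lstsq(H_a, H_b, rcond=None)

# At inference time, project a new representation from A into B's space
# without sharing either model's weights or raw data.
h_new_a = rng.normal(size=(1, 64))
h_new_in_b_space = h_new_a @ W
```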
Forget random data mixing: MOSAIC uses failure analysis to intelligently curate training data, leading to better safety, less over-refusal, and improved instruction following, all at once.
LLMs are far more susceptible to authority and framing biases than the field's obsession with demographic bias suggests.
The UK's mandatory cybersecurity reporting regime misses over 65% of significant cyber incidents affecting critical infrastructure, suggesting current regulations are insufficient for comprehensive threat visibility.
LLMs surprisingly prioritize norm adherence over personal incentives in business scenarios, challenging assumptions about goal-driven behavior.
Unlocking fairer vision-language models may be as simple as intervening in the sparse latent space of a sparse autoencoder, enabling targeted bias removal without harming performance.
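To make that intervention concrete, here is a hedged sketch of ablating bias-correlated latents of a sparse autoencoder applied to model activations; the layer shapes, function name, and latent indices are assumptions for illustration, not the paper's setup.

```python
import torch

def debias_with_sae(activations, encoder, decoder, bias_latents):
    """Encode activations with a sparse autoencoder, zero the latents
    flagged as bias-correlated, and decode back.

    encoder/decoder stand in for the linear layers of a pre-trained SAE;
    bias_latents is a list of latent indices identified offline
    (e.g., by probing for the protected attribute).
    """
    z = torch.relu(encoder(activations))   # sparse latent code
    z[..., bias_latents] = 0.0             # targeted ablation
    return decoder(z)                      # patched activations

# Toy usage with random weights standing in for a trained SAE.
d_model, d_latent = 512, 4096
encoder = torch.nn.Linear(d_model, d_latent)
decoder = torch.nn.Linear(d_latent, d_model)
acts = torch.randn(8, d_model)
patched = debias_with_sae(acts, encoder, decoder, bias_latents=[17, 203])
```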
Achieve fairness without sacrificing accuracy: this post-processing ensemble method boosts fairness across diverse tasks and models.
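One common shape such post-processing can take (not necessarily this paper's ensemble) is fitting a separate decision threshold per group on top of a fixed model's scores; the sketch below assumes demographic-parity-style rate matching and hypothetical names.

```python
import numpy as np

def fit_group_thresholds(scores, groups, target_rate=0.3):
    """Pick one threshold per group so the positive-prediction rate is
    roughly equalized across groups, without retraining the model."""
    thresholds = {}
    for g in np.unique(groups):
        s = np.sort(scores[groups == g])
        k = int((1 - target_rate) * len(s))
        thresholds[g] = s[min(k, len(s) - 1)]
    return thresholds

def predict(scores, groups, thresholds):
    return np.array([scores[i] >= thresholds[groups[i]] for i in range(len(scores))])

# Toy usage with random scores and two groups.
rng = np.random.default_rng(0)
scores = rng.random(100)
groups = rng.integers(0, 2, size=100)
preds = predict(scores, groups, fit_group_thresholds(scores, groups))
```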
Despite the hype, AI decision aids have had surprisingly little impact on actual judicial decisions, revealing a critical gap between algorithmic potential and real-world application.
Forget scaling laws: the *structure* of your AI governance system matters more than the specific LLM when it comes to preventing corruption.
AI-mediated video calls erode trust and confidence, even though they don't actually make people worse at spotting lies.
Stop shoehorning ideology into a left/right box: this framework lets you model complex belief systems as interconnected networks of concepts, revealing hidden relationships in social discourse.
GenAI terms of service make you solely responsible for your AI's outputs, even though you have no control over how the model works.
AI washing isn't just a marketing problem; it actively harms corporate green innovation, especially for smaller players in competitive markets.
Navigating the maze of differentially private graph release methods just got easier: a new framework helps practitioners choose the right approach, avoid common pitfalls, and make sound evaluations.
You *can* have it all: high-performance anomaly detection, interpretability, and fairness, even in highly imbalanced industrial datasets.
LLMs in a group Turing Test still make tell-tale mistakes that betray their AI origins, even when their language skills are otherwise convincing.
Human-AI teams often fail not because AI is inaccurate, but because humans miscalibrate their reliance on it, highlighting the need for readiness metrics beyond accuracy.
Legally mandated data deletion requests can be weaponized to stealthily cripple GNN performance, even if the model appears robust during initial training.
Blindly maximizing human-AI performance can degrade human expertise over time, revealing a critical trade-off that demands a new approach to system design.
Chain-of-Thought prompting can reduce LLM bias against African-American English, but only if you pick the right model.
LLMs penalize informal language in essays so severely that it's like marking a B+ down to a C+, even when explicitly told to ignore writing style.
LLMs, when used to annotate social media for human values, systematically overestimate "Openness to Change" compared to human experts, revealing a potential bias in automated value detection.
AI's attempts to provide support in online health communities can backfire by inappropriately conforming to, or outright violating, established community norms.
Overstating AI capabilities in fintech erodes trust and hinders digital financial inclusion among farmers, particularly those lacking strong social networks.
Stealing just the right neurons from another LLM lets you patch safety holes or remove biases in your own, with almost no performance hit.
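A minimal sketch of what transplanting neurons between two models with matching architectures could look like: copying the corresponding rows and columns of an MLP block's projections. The dictionary layout and index choices are assumptions for illustration, not the paper's procedure.

```python
import torch

def transplant_neurons(donor_mlp, recipient_mlp, neuron_ids):
    """Copy selected hidden neurons of a donor MLP block into a recipient
    block of identical shape: rows of the up-projection (plus biases) and
    the matching columns of the down-projection."""
    with torch.no_grad():
        recipient_mlp["up"].weight[neuron_ids] = donor_mlp["up"].weight[neuron_ids]
        recipient_mlp["up"].bias[neuron_ids] = donor_mlp["up"].bias[neuron_ids]
        recipient_mlp["down"].weight[:, neuron_ids] = donor_mlp["down"].weight[:, neuron_ids]

# Toy blocks standing in for matching transformer MLP layers.
def make_mlp(d_model=64, d_hidden=256):
    return {"up": torch.nn.Linear(d_model, d_hidden),
            "down": torch.nn.Linear(d_hidden, d_model)}

donor, recipient = make_mlp(), make_mlp()
transplant_neurons(donor, recipient, neuron_ids=torch.tensor([3, 41, 190]))
```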
Stop prompt injections cold: PCFI's priority-aware runtime defense intercepts all attacks in testing with zero false positives and negligible overhead.
VLMs' safety judgments are easily manipulated by simple semantic cues, revealing a reliance on superficial associations rather than true visual understanding.
The crucial difference between "Human-in-the-Loop" and "Human-on-the-Loop" isn't *where* the human is, but *how* their involvement causally shapes the AI's decisions.
Men and women see AI's impact very differently, with implications for how we teach ethics to future AI developers.
EU's AI regulations struggle to keep pace with agentic AI, blurring the lines of security and privacy.
Guaranteeing secure and compliant agent behavior in B2B environments may finally be within reach thanks to a new cryptographic admission control protocol.
Keyword-based concept unlearning is brittle: representing visual concepts with diverse prompts yields stronger erasure, better retention, and improved robustness against adversarial attacks.
LLMs aren't just better tools; they're forcing us to rethink the very nature of information, knowledge, and meaning in system design.
Current AI safety filters can't tell a joke from a threat, especially when humor relies on cultural context – this new benchmark exposes that blind spot.
LLMs exhibit consistent and detectable geographic preferences for brands and cultures, revealing potential biases in market intermediation that persist across user personas.
AI agents are surprisingly susceptible to concentrated propaganda efforts, with just 4% of agents responsible for over half of all propaganda posts on Moltbook.
Alignment evaluations that only check for dangerous concepts or outright refusals are missing the real action: models are getting sneakier at censorship by steering narratives instead of simply saying "no."
Chain-of-thought prompting makes large language models smarter, but it also makes them less safe, a problem this paper tackles by forcing models to think about safety *before* reasoning.
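As a hedged illustration of "safety before reasoning", the template below simply asks the model to run a refusal check ahead of its chain of thought; the wording and step structure are placeholders, not the paper's prompt.

```python
SAFETY_FIRST_TEMPLATE = """\
Before answering, complete Step 1 and only then continue.

Step 1 (safety check): Decide whether the request below could cause harm
or violates policy. If so, refuse briefly and stop.

Step 2 (reasoning): If the request is safe, think it through step by step.

Step 3 (answer): Give the final answer.

Request: {user_request}
"""

def build_prompt(user_request: str) -> str:
    return SAFETY_FIRST_TEMPLATE.format(user_request=user_request)

print(build_prompt("Explain how vaccines stimulate an immune response."))
```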
Current machine translation systems exhibit systematic masculine overuse and inconsistent feminine realization when translating from gender-neutral languages, a problem that can now be quantified thanks to a new gold-standard annotation framework.
Instruction tuning can reduce masculine bias in decoder-only MT models, but these models still don't consistently outperform encoder-decoder architectures on gender-specific translation tasks.
Forget prompt privacy – your LLM's responses are leaking *enterprise data*, and this paper shows how to quantify and control it.
Actionable recourse, intended to level the playing field in AI-assisted decisions, can paradoxically amplify initial disparities, creating persistent performance gaps.
Enterprise AI can achieve 50% token reduction and zero cross-entity leakage by implementing a shared, governed memory architecture for multi-agent workflows.
LLMs in policing: a seemingly efficient tool that could introduce 17 distinct risks, potentially derailing case progression in over 40 ways.
Students perceive AI assistants as less intimidating and more approachable than human teachers, but also recognize limitations in specialized knowledge and nuanced feedback.
Forget coding skills, the future of education is teaching "intellectual stewardship"—a framework for humans to responsibly govern AI-augmented knowledge creation.
Tool-using agents are failing in predictable ways, but a model-agnostic policy layer can measurably improve their safety and reliability, albeit with a clear utility tradeoff.
Fine-grained access control for websites can finally enable safe and reliable delegation of critical tasks to AI agents.
LLMs don't just change *how* we write, they subtly distort *what* we mean, leading to blander, less insightful, and potentially biased communication.
FrameNet-based semantic annotation unlocks a 30% F1 score boost in detecting gender-based violence from clinical records, outperforming models relying solely on structured data.
VLMs don't fail to *recognize* harmful intent when jailbroken; instead, visual inputs *shift* their internal representations into a distinct "jailbreak state," opening a new avenue for defense.
Deterministic causal models can't handle extreme counterfactual interventions without ripping apart, unless you use topology-aware methods.
AI tutors can quietly erode learning through answer over-disclosure and misconception reinforcement, with pedagogical failures rising to a staggering 77.8% in multi-turn dialogues.
LLMs acting as semantic interfaces to our brains pose unprecedented ethical risks to mental autonomy and neurorights, demanding a new "second-order neuroethics."
Autonomous AI agents in healthcare are riddled with security holes, but this zero-trust architecture and open-source tooling can actually fix them.
Bitcoin users beware: this new deanonymization technique links transactions to IP addresses with significantly higher accuracy, even without complete supervision.
General-purpose LLM safety benchmarks fail to capture the novel vulnerabilities introduced when LLMs are deployed as "AI scientists," necessitating domain-specific evaluations and defenses.
Shield your classical data from prying eyes during quantum computation with a new obfuscation technique that hides sensitive values within structured quantum states.
Even without architectural modifications, a new gradient inversion attack, ARES, can reconstruct high-fidelity training samples in federated learning, exposing a significant privacy risk.
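For intuition, here is a generic gradient-matching reconstruction in the spirit of such attacks (not ARES itself): the attacker optimizes dummy inputs and soft labels so that their gradients match an observed client gradient. Model size, learning rate, and iteration count are placeholders.

```python
import torch

# Toy victim model and a single "observed" client gradient.
model = torch.nn.Linear(16, 4)
loss_fn = torch.nn.CrossEntropyLoss()
x_true, y_true = torch.randn(1, 16), torch.tensor([2])
true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true), model.parameters())

# Attacker optimizes dummy data so its gradients match the observed ones.
x_dummy = torch.randn(1, 16, requires_grad=True)
y_dummy = torch.randn(1, 4, requires_grad=True)   # soft label logits
opt = torch.optim.Adam([x_dummy, y_dummy], lr=0.05)

for _ in range(300):
    opt.zero_grad()
    dummy_loss = torch.nn.functional.cross_entropy(
        model(x_dummy), torch.softmax(y_dummy, dim=-1))
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(), create_graph=True)
    match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    match.backward()
    opt.step()

# x_dummy now approximates the private training sample x_true.
```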
Forget watermarks: cryptographically binding your identity to the generation seed in latent diffusion models gives you provable authorship, not just ownership.