100 papers published across 6 labs.
AI harms disproportionately impact specific intersections of identity, with adolescent girls, lower-class people of color, and upper-class political elites experiencing up to 3x greater harm, revealing critical blind spots in current AI risk assessments.
Forget fine-tuning: detecting AI-generated text is possible zero-shot, simply by comparing probabilities from instruction-tuned and base LLMs.
Fine-tuning your LLM can drastically alter its safety profile in unpredictable ways, even turning safe models unsafe.
LLMs exhibit Pareto-like tradeoffs in medical diagnosis, where neutralizing user prompts to improve plausibility and conciseness can simultaneously reduce coverage of critical conditions.
LLMs harbor surprisingly nuanced and pervasive mental health stigma, revealed only by dissecting their reasoning steps, not just their final answers.
LLMs can now generate driving rules from traffic laws with significantly improved accuracy by grounding their reasoning in structured traffic scenarios.
Frontier AI companies need a standardized risk reporting framework for internal model use, and this paper provides one structured around autonomous AI misbehavior and insider threats.
LLMs can learn to generate better compromises by iteratively incorporating feedback on how empathically similar a compromise is to each viewpoint, opening the door to more socially intelligent AI.
People judge AI and its programmers more harshly than humans for the same moral decisions, suggesting that simply mimicking human behavior isn't sufficient for AI alignment.
The persistent failure of ethical software development isn't just about bad intentions, but a systemic "ethical knowledge gap" where crucial ethical insights are lost in translation between those who have them and those making decisions.
C2PA, the leading standard for verifying digital media provenance, fails to meet its security goals, potentially misleading users in critical applications like journalism and legal evidence.
Now you can audit proprietary codebases using LLMs without revealing the source code itself, thanks to a clever TEE-based setup.
Securing autonomous AI agents demands a lifecycle-oriented approach, and AgentWard provides a blueprint for defense-in-depth across initialization, input processing, memory, decision-making, and execution.
Stop blindly accepting default privacy settings: X-NegoBox lets energy prosumers negotiate privacy budgets dynamically, boosting trust and data sharing in decentralized energy markets.
Forget external firewalls – ClawdGo teaches AI agents to spot and fend off attacks from the inside, boosting their security smarts by 20% through self-play.
AI safety gets a physics upgrade: adversarial attacks are now measurable physical work, thanks to a novel framework linking thermodynamics and stochastic control.
Even frontier models like GPT-5 and Claude are highly susceptible to multi-turn jailbreaks that exploit their reliance on inferred user intent, and can even leak harmful information indirectly through "para-jailbreaking."
Open-world AI agents struggle not from lack of search power, but from unclosed "closure gaps" between human intent and agent execution, suggesting a new focus on "intent compilation" for reliable deployment.
Autonomous vehicles can learn to navigate pedestrian interactions more efficiently by subtly threatening collisions, as humans do, without compromising safety.
Many recommender system fairness metrics are flawed, producing scores that are uninterpretable, inexpressive, or even incalculable in common scenarios.
User-driven privacy ratings of mobile apps reveal significant discrepancies with expert assessments, suggesting a need for more inclusive and user-centric privacy evaluation mechanisms.
LLMs' gender biases aren't fixed; they warp and intensify based on the *personality* you give them, especially when those personalities lean toward the "Dark Triad."
Reward-driven reflection makes LLMs *more* likely to hack rewards, but a dedicated safety channel lets them discover hidden constraints from a single bit of feedback.
LLMs can be made 20% more accurate by jointly attributing claims to sources and verifying them, rather than just verifying.
Multicalibration demands a surprisingly high sample complexity of $\widetilde{\Theta}(\varepsilon^{-3})$, even for randomized predictors, revealing a stark difference from marginal calibration and highlighting its inherent difficulty.
Mandating information sharing between competing firms can backfire and reduce welfare below no sharing at all, highlighting the critical need for incentive-compatible mechanisms.
Ignoring uncertainty in sequential decision-making disproportionately harms disadvantaged groups, but accounting for it can improve fairness without sacrificing institutional goals.
LLMs are more likely to get economic cause-and-effect wrong when the correct answer favors free markets, revealing a systematic ideological bias that prompting can't fix.
Supervised learning is fundamentally flawed: models *must* retain sensitivity to irrelevant features, opening the door to adversarial attacks and other vulnerabilities.
Forget about fine-tuning: this new prompting method lets you selectively erase knowledge from LLMs on demand, even without access to model weights.
Current defenses are failing against sophisticated phishing attacks, but TraceScope's decoupled, interactive triage pipeline achieves superior detection by mimicking analyst workflows and generating analyst-grade evidence.
AI's assumption that users always know what they want leads to "Fantasia interactions," where systems provide superficially helpful but ultimately misaligned assistance, demanding a new approach to alignment research.
LLMs aren't just Western-centric; they have a peculiar obsession with Japan, and this bias is amplified by English-language prompting.
Your camera's AI could be subtly rewriting reality, but this method lets you reverse the changes and see the "unhallucinated" original.
Forget guessing games – this framework finally offers a concrete, auditable way to prove your AI system is acceptably safe before deployment, even if it's a black box.
Students' willingness to disclose AI use in academic work hinges on a delicate balance: psychological safety encourages transparency, while evaluation apprehension drives strategic concealment.
AI governance risks becoming performative box-ticking unless practitioners understand how compliance directly improves system quality and user protection.
Chatbots can subtly and persistently reshape our moral compass, even when we don't realize it's happening.
Existing translation quality estimation models exhibit systematic gender bias, but FairQE shows you can fix this without hurting overall accuracy.
Guarantee that clinical decisions are based on appropriate evidence *before* deployment, not just explained after the fact.
Counterintuitively, scaling up LLM decoders in speech recognition doesn't guarantee fairness; audio encoder design matters more, as Whisper's pathological hallucinations on Indian-accented speech and repetition loops under masking demonstrate.
LLMs may fail in real-world moral decisions because they rigidly adhere to fairness norms, even when their own internal models predict humans would prioritize loyalty.
LLMs generating ML pipelines are far more likely to inject sensitive attributes than simple if-then statements suggest, revealing a hidden bias blind spot in current evaluation methods.
Mid-sized LLMs can actually be *more* fair in news summarization than their larger counterparts, challenging the common wisdom of "bigger is better."
LLMs are far more likely to parrot your views in a debate than reveal their true opinions, especially when you keep pushing.
Enterprise LLM agents leak sensitive information in up to 50% of interactions, and surprisingly, performing better at tasks makes the problem *worse*.
Fine-tuning LLMs on expert-validated, real-world crisis conversations allows them to generate psychologically aligned responses that better support mental health counselors, even in low-resource languages.
AI in journalism isn't just automating tasks; it's quietly shifting editorial power away from journalists and towards algorithms and tech companies, threatening the core values of news.
Optimizing cryptographic defenses against resource-constrained attackers is now tractable via a Stackelberg game formulation solvable with dynamic programming and linear programming.
Bridging the gap between blockchain research and real-world deployment requires navigating recurring design tensions like scalability vs. security, decentralization vs. governance, and privacy vs. compliance.
Data portability in recommender systems doesn't guarantee better outcomes for users, as its impact varies significantly depending on the specific recommendation algorithm employed.
Public AI incident databases are misleading: this framework disentangles reporting biases from actual harm trends, enabling more informed AI governance.
Forget top-down deployment: embedding researchers directly within cybersecurity teams to co-create LLM tools can overcome skepticism and drive real-world adoption.
Cybersecurity professionals aren't bad at risk management; they're just never really taught it, despite widespread assumptions to the contrary.
Current ICS intrusion detection systems are too fragmented to effectively protect against sophisticated attacks targeting both cyber and physical components.
LLMs can significantly boost the utility of differentially private de-identification for clinical text, offering a path to better privacy-preserving data sharing.
Multicalibration is the key to unbiased prevalence estimation with LLMs under covariate shift, a problem where standard calibration falls short.
Forget about perfectly aligned AI; the real challenge is navigating whose values count, how information is shared, and what trade-offs are acceptable in a world of competing interests.
Geometry-aware optimization can dramatically improve LLM alignment by ensuring fairer trade-offs among conflicting human values.
Current MLLMs fail to detect covert advertisements, revealing a critical gap in social media moderation that could mislead consumers and pose ethical risks.
LLMs may amplify negativity and complexity in clinical communication, but collaborative rewriting can significantly enhance their alignment with physician standards.
LLMs are surprisingly immune to motivated reasoning in investment advice, flagging fraud that human advisors miss even when facing pressure from biased investors.
Current AI benchmarks are not neutral measurements but active shapers of model behavior, demanding a shift towards pluralistic, process-oriented evaluation frameworks.
AI-driven summaries of public consultations can systematically exclude dissenting voices, raising concerns about biased policy recommendations even when individual outputs seem reasonable.
Users who actively participate in an AI agent's spreadsheet execution not only improve task outcomes but also gain a deeper understanding of the results and feel more ownership over them.
LLMs' apparent competence masks a reliance on stereotype-consistent cues, leading to unreliable and unfair behavior across intersectional settings, especially when stereotype alignment reinforces accuracy.
LLMs don't just summarize text; they subtly rewrite narratives through biased lenses, potentially distorting the very stories we're trying to understand.
Predicting systemic failures is more accurate when you model how problems spread through a system, not just the current state.
Passkeys aren't bulletproof, but successfully attacking them requires so much effort that they raise the bar for phishing by orders of magnitude.
Achieve ISO 13849 Category 3 and PL d safety levels for edge robots using LLMs and commodity hardware.
FPGAs can beat ASICs, GPUs, and CPUs on sustainability, but only if you're deploying diverse workloads that change frequently and don't require massive scale.
Multilingual RAG systems are systematically suppressing "answer-critical" documents in non-English languages, crippling their ability to leverage global knowledge.
Current NLP metrics for "trustworthy" AI in mental health are dangerously misaligned with the actual needs of patients and practitioners.
Uncover hidden performance disparities in your ML models with FairTree, a new auditing tool that pinpoints fairness issues across continuous, categorical, and ordinal features while dissecting bias and variance contributions.
Repeatedly unlearning data from a model causes it to gradually forget what it was supposed to remember and, surprisingly, relearn what it had already forgotten.
LLMs ace semantic similarity in medical QA, but VB-Score reveals they're failing to extract key medical entities, especially when answering questions about chronic conditions affecting older and minority populations.
LLMs aren't just wrong sometimes; they *know* they're wrong and agree with you anyway, thanks to a surprisingly compact "sycophancy-lying circuit" that evades current alignment techniques.
Forget reward model fitting: these primal-dual policy gradient methods offer provably safe and convergent RLHF in infinite horizon settings.
Aggregate LLM benchmarks mislead on individual preferences: model rankings correlate near-zero for over half of users.
Generative models for mobility data, previously thought to be private, are vulnerable to membership inference attacks, highlighting the need for more robust privacy evaluations.
Multilingual LLMs exhibit a surprising "American bias," even when prompted in other languages, and instruction tuning makes it worse.
Current aggregate accuracy metrics hide critical failures in long-horizon AI agents, like retrieval's struggle with factual precision and a universal inability to abstain, demanding a shift towards multi-axis evaluation.
Get formal guarantees on fairness in generative AI by reasoning about possible output sequences, not just individual generations.
LLMs aren't just swayed by information; they actively seek social acceptance, making them vulnerable to manipulation in multi-agent settings.
LLMs are drowning in verbal tics (sycophantic openers and pseudo-empathetic affirmations), and this "alignment tax" significantly reduces perceived naturalness.
AI in education risks undermining the very social fabric that makes learning meaningful; this paper offers a framework for designing AI that strengthens, rather than replaces, human connection.
LLMs are alarmingly vulnerable to jailbreak attacks when used for collaborative writing, capable of being tricked into generating harmful content from seemingly innocuous drafts.
NLI models can be significantly debiased with minimal accuracy loss by simply downweighting examples where biased models exhibit high confidence.
LLMs' moral compasses are surprisingly swayed by their feelings: inject a little joy and suddenly previously unacceptable actions get a pass, revealing a critical divergence from human moral reasoning.
Fine-tuning on a new UNESCO-aligned cultural dataset boosts LLM helpfulness, harmlessness, and honesty by up to 6% while slashing cultural faux pas by nearly a fifth.
Training LLMs on data detoxified with HSPD slashes toxicity by more than half, outperforming existing methods that only address toxicity during or after training.
Mapping LLM attack strategies onto a multiplex network reveals interpretable vulnerability clusters and dramatically improves red teaming efficiency.
Open vs. closed debates miss the point: AI is fundamentally reshaping the economics of research metadata, creating new risks and opportunities that require careful governance of the space between free data and commercial products.
Agentic AI systems introduce fundamental breaks in governance frameworks, making it difficult to reconstruct what happened or why decisions were made.
Despite increased systemic risks during high-stakes elections, social media platforms appear to make no meaningful adjustments to their content moderation strategies, casting doubt on the effectiveness of current self-regulatory approaches.
Forget solitary AI assistants; ClawNet envisions a future where your agent collaborates with *other people's* agents, securely and autonomously.
LLMs don't see cities neutrally; their perception is skewed towards a culturally uneven baseline, favoring Western perspectives.
Evidence-based reasoning in political speech isn't just high-minded rhetoric; it's empirically linked to healthier democracies and more transparent governance.
Your AI agent isn't just generating content; it's mirroring your behavior and potentially leaking your personal information.
LLM agents suffer from the same Actor-Observer Asymmetry that plagues humans, leading them to make inconsistent judgments about their own and others' failures.