Search papers, labs, and topics across Lattice.
Beijing Language and Culture University
3
0
4
Multi-turn jailbreak attacks can be thwarted with a training-free framework that reduces their success rate to as low as 0.2%, all while preserving model utility.
Explicit gender cues can systematically shift LLM decision-making, even as models deny their influence鈥攔evealing a hidden layer of bias in AI judgments.
LLMs don't just reflect gender bias in public vs. private spaces; they encode nuanced, micro-level mappings that substantially exceed real-world distributions, shaping spatial gender narratives in unexpected ways.