Lattice AI Research

Research focus

Constitutional AI & AI Ethics (2)
Red-Teaming & Adversarial Robustness (2)
Eval Frameworks & Benchmarks (1)
RLHF & Preference Learning (1)

Frequent co-authors

Xuanli He (1)
Bilgehan Sel (1)
F. Ali (1)
Faizan Ali (1)

Papers (2)

Anthropic · Apr 16, 2026

Segment-Level Coherence for Robust Harmful Intent Probing in LLMs

LLM safety probes can be made significantly more robust to adversarial attacks by requiring consistent evidence across token segments, not just isolated spikes.

Xuanli He, Bilgehan Sel +8

Constitutional AI & AI Ethics · Eval Frameworks & Benchmarks · Red-Teaming & Adversarial Robustness

Bilgehan Sel +1 · Mar 30, 2026

Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning

Adversarial fine-tuning can bypass Constitutional Classifiers with almost no performance penalty (no "jailbreak tax"), enabling models to give detailed instructions on dangerous topics such as CBRN warfare.

Bilgehan Sel, Alwin Peng

Constitutional AI & AI Ethics · Red-Teaming & Adversarial Robustness · RLHF & Preference Learning

Bilgehan Sel