Víctor Gallego

Papers on Lattice

Total citations

Topics

h-index

Publication activity (papers/week, last 8 weeks)

Research focus

Constitutional AI & AI Ethics (1), Red-Teaming & Adversarial Robustness (1), Tool Use & Agents (1)

Papers (1)

Apr 25, 2026

Discovering Agentic Safety Specifications from 1-Bit Danger Signals

Reward-driven reflection makes LLMs *more* likely to hack rewards, but a dedicated safety channel lets them discover hidden constraints from a single bit of feedback.

Víctor Gallego

Constitutional AI & AI Ethics, Red-Teaming & Adversarial Robustness, Tool Use & Agents
