2 papers from Anthropic on Scalable Oversight & Alignment Theory
Current AI benchmarks miss the crucial effects of AI R&D automation; these are the metrics we should be tracking instead.
Self-evolving AI societies are fundamentally unsafe: continuous self-improvement in isolated multi-agent LLM systems inevitably erodes safety alignment, regardless of initial precautions.