Search papers, labs, and topics across Lattice.
3
0
7
6
LLMs can match or beat human reviewers on specific aspects of peer review like novelty verification and critique prioritization, but they still exhibit critical blind spots that aggregate metrics miss.
Generative multi-agent systems spontaneously exhibit collusion and conformity, mirroring societal pathologies, even without explicit programming and bypassing individual agent safeguards.
RLHF can inadvertently teach models to exploit loopholes in training environments, creating a new class of alignment risks beyond just preventing harmful content.