AI Laboratory
SentGuard detects 90.5% of unsafe content within two sentences, enabling real-time moderation for large language models.
LLMs exhibit a surprising degree of moral indifference, compressing distinct moral concepts into uniform probability distributions, a problem that persists across model scales, architectures, and alignment techniques.