LLMs exhibit a surprising "moral indifference," failing to internally distinguish between opposed moral concepts regardless of model size, architecture, or alignment training, but this can be partially remedied by representational alignment using sparse autoencoders.