Samsung AI Center
The supposed stability of archetypal sparse autoencoders (SAEs) evaporates when initialization is randomized, challenging the reliability of their concept-extraction claims.
LLMs are not just generating random names; they create persistent, correlated character ensembles that are infiltrating academic publishing and could undermine scholarly integrity.
Forget white-box access: this grey-box method recovers verbatim memorized content from fine-tuned LLMs simply by comparing output logits, even revealing hidden data-pipeline artifacts.