Standard LLM benchmarks miss the mark: personalized "vibe-testing" reveals that user-specific prompts and subjective criteria can flip model rankings.
LLMs possess domain-specific "gut feelings" about the accuracy of their own factual knowledge that external observers can't see, but this advantage vanishes for math reasoning.
Protein language models detect approximate sequence repeats using a two-stage mechanism where induction heads attend to aligned tokens, functionally subsuming the detection of exact repeats.