LLM benchmark scores can swing wildly under simple synonym swaps, revealing that models rely to a surprising degree on superficial lexical cues rather than deeper understanding.