LLMs can convincingly *say* they're conscientious, but ActTraitBench reveals they often *act* otherwise, exposing a critical gap between knowledge and behavior that scales *worse* with model size.
LLMs struggle even more when facing the double whammy of non-native English and typos, revealing that real-world performance is likely overestimated by standard English benchmarks.
Forget trying to "trick" LLMs with simple emotional prompts – a smarter, adaptive approach is needed to reliably nudge their performance.