LLMs can now be rigorously benchmarked on realistic sales skills, revealing a wide performance gap where some models rival humans while others fall short.
Forget brittle, off-distribution steering: ROAST leverages on-distribution rollouts and normalization to achieve significant gains (+9.7% on GSM8K, +12.1% on TruthfulQA) by carefully balancing activation contributions.