Many apparent failures of LLMs on QA benchmarks may actually stem from flaws in the *questions* themselves, with up to 50% of questions being "underspecified."