Sapienza University of Rome
NarrativeQA's reign as the go-to benchmark for long-document QA is over: LiteraryQA, a meticulously curated subset of NarrativeQA, reveals that LLM-as-a-Judge metrics align with human judgment far better than traditional n-gram metrics.