Translation metrics can exhibit significant cross-lingual scoring bias: they unfairly penalize or reward translations depending on the language, even when translation quality is the same.
Human-like evaluation of long-form generative AI output is now possible, thanks to a new framework that decomposes reference answers into weighted, context-aware scoring points.