University of Minnesota
LLM judges in healthcare show promise for scalable evaluation, but their reliability varies widely across tasks, so they need careful design and validation before their verdicts can be trusted.