University of Texas at San Antonio
LLMs ace semantic similarity in medical QA, but VB-Score reveals they're failing to extract key medical entities, especially when answering questions about chronic conditions affecting older and minority populations.
Current XAI evaluations can be fooled: a new metric reveals that even small input variations can cause explanations to change drastically, undermining trust in pattern recognition systems.