This paper introduces Semantic Recall, a new metric for evaluating approximate nearest neighbor (ANN) search that focuses on retrieving semantically relevant objects theoretically retrievable via exact nearest neighbor search. Semantic Recall avoids penalizing algorithms for failing to retrieve semantically irrelevant nearest neighbors, addressing a limitation of traditional recall metrics. The authors also propose Tolerant Recall, a proxy for Semantic Recall applicable when identifying semantically relevant objects is infeasible, and demonstrate empirically that optimizing for these new metrics improves cost-quality tradeoffs in ANN search.
Stop penalizing your ANN search algorithms for failing to retrieve irrelevant neighbors – Semantic Recall offers a more nuanced and effective way to measure retrieval quality.
We introduce Semantic Recall, a novel metric to assess the quality of approximate nearest neighbor search algorithms by considering only semantically relevant objects that are theoretically retrievable via exact nearest neighbor search. Unlike traditional recall, Semantic Recall does not penalize algorithms for failing to retrieve objects that are semantically irrelevant to the query, even if those objects are among its nearest neighbors. We demonstrate that Semantic Recall is particularly useful for assessing retrieval quality on queries that have few relevant results among their nearest neighbors, a scenario we find to be common within embedding datasets. Additionally, we introduce Tolerant Recall, a proxy metric that approximates Semantic Recall when semantically relevant objects cannot be identified. We empirically show that our metrics are more effective indicators of retrieval quality, and that optimizing search algorithms for these metrics can lead to improved cost-quality tradeoffs.
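To make the distinction concrete, here is a minimal sketch of the two metrics as the abstract describes them. This is an illustration, not the paper's exact formulation: the function names, the relevance-label set, and the handling of queries with no relevant exact neighbors are all assumptions.

```python
def recall_at_k(retrieved, exact_nn, k):
    """Traditional recall@k: fraction of the k exact nearest
    neighbors that the ANN algorithm actually retrieved."""
    return len(set(retrieved[:k]) & set(exact_nn[:k])) / k

def semantic_recall_at_k(retrieved, exact_nn, relevant, k):
    """Sketch of Semantic Recall@k: recall computed only over the
    exact nearest neighbors that are also semantically relevant,
    so the algorithm is not penalized for missing irrelevant
    neighbors. `relevant` is a hypothetical set of ground-truth
    relevance labels."""
    target = set(exact_nn[:k]) & relevant
    if not target:
        return 1.0  # assumption: no relevant neighbors to miss
    return len(set(retrieved[:k]) & target) / len(target)

# Example: the ANN index misses exact neighbors 3, 4, 5, but all
# of those are irrelevant, so only traditional recall is hurt.
exact_nn = [1, 2, 3, 4, 5]
retrieved = [1, 2, 6, 7, 8]
relevant = {1, 2}
print(recall_at_k(retrieved, exact_nn, 5))                    # 0.4
print(semantic_recall_at_k(retrieved, exact_nn, relevant, 5)) # 1.0
```

Under this sketch, a query whose nearest-neighbor list is mostly irrelevant can score near-perfect Semantic Recall while traditional recall looks poor, which is exactly the mismatch the paper targets.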