The paper introduces ListK, a framework that reduces the latency of semantic ORDER BY ... LIMIT K queries, in which an LLM ranks rows of semi-structured, unstructured, or multimodal data. ListK builds on fine-tuned listwise rankers and studies sorting algorithms that combine their partial rankings: a listwise tournament (LTTopK), listwise multi-pivot quickselect/sort (LMPQSelect, LMPQSort), and a listwise tournament filter (LTFilter). In the reported experiments, ListK halves latency with virtually no impact on recall and NDCG compared to prior methods, and it dominates the Pareto frontier.
Semantic sorting with LLMs can run about twice as fast with virtually no loss in recall or NDCG by combining partial listwise rankings through suitable sorting algorithms.
Semantic operators abstract large language model (LLM) calls in SQL clauses. They are gaining traction as an easy method to analyze semi-structured, unstructured, and multimodal datasets. While a plethora of recent works optimize various semantic operators, existing methods for semantic ORDER BY (full sort) and LIMIT K (top-K) remain lackluster. Our ListK framework improves the latency of semantic ORDER BY ... LIMIT K at virtually no cost to accuracy. Motivated by recent advances in fine-tuned listwise rankers, we study several sorting algorithms that best combine partial listwise rankings. These include: 1) a deterministic listwise tournament (LTTopK), 2) Las Vegas and embarrassingly parallel listwise multi-pivot quickselect/sort (LMPQSelect, LMPQSort), and 3) a basic Monte Carlo listwise tournament filter (LTFilter). Of these, listwise multi-pivot quickselect/sort are studied here for the first time. The full framework provides a query optimizer that combines the above physical operators to minimize latency for a given target recall. We provide theoretical analysis that makes parameters easy to tune and supplies cost estimates for query optimizers. ListK empirically dominates the Pareto frontier, halving latency at virtually no cost to recall and NDCG compared to prior art.
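As a rough illustration of how partial listwise rankings can be combined into a top-K result, the following Python sketch implements a simple tournament-style selection. It is not the paper's LTTopK implementation. The names `tournament_top_k` and `toy_rank`, the `window` parameter, and the rule of keeping the top K of each window are assumptions made for this example, and the toy ranker merely stands in for an LLM listwise ranker.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
# A listwise ranker takes a small group of items and returns them best-first.
Ranker = Callable[[Sequence[T]], List[T]]


def toy_rank(group: Sequence[int]) -> List[int]:
    """Hypothetical stand-in for an LLM listwise ranker: larger is better."""
    return sorted(group, reverse=True)


def tournament_top_k(items: Sequence[T], k: int, window: int, rank: Ranker) -> List[T]:
    """Return the top-k items using only listwise calls on groups of <= window items."""
    if k <= 0:
        return []
    if window <= k:
        # Each full window must discard at least one item, or rounds never shrink.
        raise ValueError("window must be larger than k")

    candidates = list(items)
    while len(candidates) > window:
        survivors: List[T] = []
        # Rank each window independently; these calls could run in parallel.
        for start in range(0, len(candidates), window):
            group = candidates[start:start + window]
            survivors.extend(rank(group)[:k])
        candidates = survivors

    # One final listwise call orders the remaining candidates.
    return rank(candidates)[:k]


print(tournament_top_k(list(range(100)), k=3, window=10, rank=toy_rank))
# → [99, 98, 97]
```

With an exact ranker, every true top-K item survives each round, because at most K-1 items in its window can outrank it. A real LLM ranker is noisy, so this guarantee would hold only approximately.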