DAMO | Nankai University | Northeastern | May 25, 2026 | arXiv:2605.25514

From Item-Only to Query-Item: Query-Conditioned Generative Search with QGS in Quark

Shuo Meng, Jin Zhang, Bin Wang, Guanjun Jiang

AI Summary

This paper introduces Query-Conditioned Generative Search (QGS) to address a core difficulty in applying generative sequence models to search ranking: each query switch introduces a semantic discontinuity in the interaction history. QGS encodes interactions as (query, item) pairs and trains with a query-conditioned next-item objective, which removes the noisy supervision caused by mixing different query intents. To handle long interaction histories within online latency budgets, the authors propose a Linear HSTU encoder that reduces per-layer complexity from O(L^2) to O(L) with no reported loss in ranking quality. They also introduce HFG-Attention to bring hand-crafted ranking features into the generative framework.
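To make the query-conditioned objective concrete, the following is a minimal sketch of how training examples could be built from a history of (query, item) pairs. The function name `build_examples`, the string IDs, and the triple layout are illustrative assumptions, not the authors' data pipeline.

```python
from typing import List, Tuple

# Hypothetical representation: each interaction is a (query, item) pair of string IDs.
Interaction = Tuple[str, str]
Example = Tuple[List[Interaction], str, str]


def build_examples(history: List[Interaction]) -> List[Example]:
    """Build (context, next_query, target_item) triples.

    For each step t, the context is the first t (query, item) pairs, and the
    target item is predicted given that context AND the query issued at step
    t+1, mirroring P(item_{t+1} | context_{<=t}, query_{t+1}).
    """
    examples = []
    for t in range(1, len(history)):
        context = history[:t]
        next_query, target_item = history[t]
        examples.append((context, next_query, target_item))
    return examples


# Toy history with a query switch between the second and third interaction.
history = [
    ("running shoes", "item_a"),
    ("running shoes", "item_b"),
    ("pasta recipe", "item_c"),
]
examples = build_examples(history)
```

In the last example the target `item_c` is paired with its own query `pasta recipe`, so the model is not asked to guess a pasta result from a context that only contains shoe searches.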

Key Contribution

By explicitly conditioning on the query, QGS achieves a 0.62% CTR increase over the production deep learning baseline in a major commercial search engine, indicating that generative models can outperform traditional deep learning baselines in search ranking when query context is handled properly.

Abstract

Generative sequence models have shown strong results in recommendation. Applying them to search ranking is more challenging. Search behavior is inherently query-driven. Each query switch introduces a sharp topic shift in the user's interaction history. Existing generative methods flatten queries and items into a single token sequence. They do not distinguish query boundaries. This causes the model to mix different query intents into one prediction target, resulting in noisy supervision. We present Query-Conditioned Generative Search (QGS). QGS encodes each interaction as a (query, item) pair token. It trains with a query-conditioned next-item objective. The prediction target changes from a noisy marginal P(item_{t+1}|context_{<=t}) to a clean conditional P(item_{t+1}|context_{<=t}, query_{t+1}). This directly removes the semantic discontinuity caused by query switches. Encoding long interaction histories with standard attention has quadratic cost. This is impractical under strict online latency budgets. We introduce a Linear HSTU encoder. It replaces full attention with causal linear recurrence. Per-layer complexity drops from O(L^2) to O(L) with no loss in ranking quality. Traditional search ranking depends on hand-crafted features like text-matching scores, statistical signals, and behavioral features. We propose HFG-Attention to preserve them in the generative framework. It organizes heterogeneous features into semantic groups and fuses them through a dedicated attention block. This bridges sparse engineered signals with dense sequential representations. QGS is deployed in the ranking module of Quark Search, a major commercial search engine in China. Online A/B tests show statistically significant gains: +0.62% CTR, +0.38% Click-Search Ratio, and +3.55% PV Duration over the production deep learning baseline.
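The drop from O(L^2) to O(L) can be illustrated with a generic causal linear-attention recurrence. This is a sketch of the general technique only: the abstract does not specify the Linear HSTU encoder's gating, normalization, or feature maps, so the unnormalized form below and the names `linear_causal` and `quadratic_causal` are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def linear_causal(qs: Matrix, ks: Matrix, vs: Matrix) -> Matrix:
    """Causal linear attention as a recurrence.

    A running state S (d_k x d_v) accumulates the outer products k_s v_s^T.
    Each step costs O(d_k * d_v) regardless of position, so the total cost
    grows linearly with sequence length L.
    """
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs


def quadratic_causal(qs: Matrix, ks: Matrix, vs: Matrix) -> Matrix:
    """Same computation written as explicit causal attention without softmax.

    Every position t scores all positions s <= t, which costs O(L^2) overall.
    """
    d_v = len(vs[0])
    outputs = []
    for t, q in enumerate(qs):
        out = [0.0] * d_v
        for s in range(t + 1):
            score = sum(a * b for a, b in zip(q, ks[s]))
            for j in range(d_v):
                out[j] += score * vs[s][j]
        outputs.append(out)
    return outputs


# Toy inputs: L = 3 positions, d_k = 2, d_v = 1.
qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ks = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
vs = [[1.0], [2.0], [3.0]]

linear_out = linear_causal(qs, ks, vs)
quadratic_out = quadratic_causal(qs, ks, vs)
```

Both functions produce the same outputs on the toy inputs. The recurrence only needs the fixed-size state from the previous step, which is what makes long histories affordable under a latency budget.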

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing | Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
