This paper introduces QDET, a production system for query-driven event timeline summarization deployed on Baidu Search. It extracts and organizes query-relevant sub-events from large, noisy document sets. QDET combines multi-task supervised fine-tuning (temporal ordering, causal judgment, and timeline completion) with reinforcement learning for concise summarization under strict length constraints. The resulting 7B-parameter model matches or slightly exceeds the zero-shot performance of a 671B-parameter model (76.2% vs. 76.1% F1) with roughly 1% of the parameters, and it improves user engagement metrics in online A/B tests.
A 7B-parameter model, optimized with multi-task learning and RL, rivals the timeline summarization performance of a zero-shot 671B-parameter model, suggesting that task-specific fine-tuning can dramatically shrink model size without sacrificing quality on this task.
Understanding how events evolve over time is essential for search engines handling queries about trending news. We present QDET (Query-Driven Event Timeline Summarization), a production system deployed on Baidu Search that constructs focused event timelines to explain specific query events. Unlike traditional topic-centric approaches that aim for comprehensive coverage, QDET identifies and organizes sub-events closely relevant to the query from noisy candidate sets formed by millions of documents retrieved daily. QDET incorporates two key innovations: (1) multi-task supervised fine-tuning with three auxiliary tasks (temporal ordering, causal judgment, and timeline completion) that enable compact models to match the performance of much larger general-purpose models in specialized domains; (2) reinforcement learning-based concise event summarization that enforces strict length constraints while maintaining semantic quality, achieving 88.2% length compliance and outperforming 671B-scale models by 7.7 points in constraint satisfaction. Our fine-tuned 7B-parameter model achieves a 76.2% F1 score on timeline summarization, slightly surpassing the zero-shot performance of DeepSeek-R1-671B (76.1% F1) while using only about 1% of its parameters, demonstrating that domain-specific optimization enables production-ready models with comparable quality at drastically reduced computational cost. Online A/B tests on Baidu Search validate real-world effectiveness, showing a 5.5% CTR improvement, 4.6% longer dwell time, and 4.4% deeper exploration compared to single-task baselines. We further demonstrate that timeline understanding transfers to heat prediction, confirming effective knowledge transfer to downstream tasks.
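To make the length-constraint idea concrete, here is a minimal sketch of how a length-compliance rate and a length-penalized reward could be computed. The abstract does not specify QDET's reward function, length unit, or limit, so every name and parameter here (`max_chars`, `penalty_per_char`, the character-based limit, the linear penalty) is a hypothetical assumption for illustration only.

```python
from typing import Sequence


def length_compliant(summary: str, max_chars: int) -> bool:
    """True if the summary fits within the (assumed) character limit."""
    return len(summary) <= max_chars


def compliance_rate(summaries: Sequence[str], max_chars: int) -> float:
    """Fraction of summaries that satisfy the length limit."""
    if not summaries:
        return 0.0
    return sum(length_compliant(s, max_chars) for s in summaries) / len(summaries)


def length_aware_reward(
    quality: float,
    summary: str,
    max_chars: int,
    penalty_per_char: float = 0.05,  # hypothetical penalty weight
) -> float:
    """Quality score minus a linear penalty for each character over the limit.

    `quality` stands in for any semantic-quality score; how QDET actually
    scores quality is not stated in the abstract.
    """
    overflow = max(0, len(summary) - max_chars)
    return quality - penalty_per_char * overflow


# Parameter ratio quoted in the abstract: 7B vs. 671B is about 1%.
PARAM_RATIO = 7 / 671
```

Under this sketch, a summary within the limit keeps its full quality score, while one that overruns loses reward in proportion to the overflow. That is one simple way an RL objective could trade off brevity against semantic quality.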