Baidu Inc.
A 7B-parameter model, optimized with multi-task learning and RL, rivals the timeline summarization performance of a 671B-parameter model, suggesting that task-specific fine-tuning can dramatically shrink model size without sacrificing quality.
The landscape of deep learning optimizers is vast, but this paper cuts through the noise to reveal the fundamental trade-offs and promising future directions for efficient, robust, and trustworthy training.