University of Aberdeen, UK
Forget reward hacking and entropy collapse: multi-reward RLIF, combining answer-level and completion-level signals, unlocks stable and robust LLM reasoning without human supervision.
By weighting client updates based on validation gradient norms, FedVG offers a simple yet effective way to mitigate client drift in federated learning, outperforming volume-based aggregation strategies.
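As a rough illustration of the aggregation idea in the FedVG summary, the sketch below averages client updates with weights derived from each client's gradient norm on a validation set. The summary does not say how norms map to weights, so the inverse-norm rule used here (smaller norm, larger weight) is an assumption, and the function name `aggregate_by_validation_grad_norm` and its parameters are hypothetical, not taken from the paper.

```python
from typing import List


def aggregate_by_validation_grad_norm(
    client_updates: List[List[float]],
    val_grad_norms: List[float],
    eps: float = 1e-12,
) -> List[float]:
    """Weighted average of client updates.

    ASSUMPTION: weights are proportional to 1 / (validation gradient norm).
    The source only says weights are "based on" these norms; the exact
    mapping used by FedVG may differ.
    """
    if not client_updates or len(client_updates) != len(val_grad_norms):
        raise ValueError("need one validation gradient norm per client update")

    # Turn each norm into an unnormalized score (eps guards against division by zero).
    scores = [1.0 / (norm + eps) for norm in val_grad_norms]
    total = sum(scores)
    weights = [s / total for s in scores]  # weights sum to 1

    # Coordinate-wise weighted sum of the flattened client updates.
    dim = len(client_updates[0])
    return [
        sum(w * update[i] for w, update in zip(weights, client_updates))
        for i in range(dim)
    ]


# Two hypothetical clients with 2-dimensional updates.
updates = [[1.0, 0.0], [0.0, 1.0]]
norms = [1.0, 3.0]
aggregated = aggregate_by_validation_grad_norm(updates, norms)
```

For contrast, volume-based aggregation (as in FedAvg) would weight each client by its number of local training samples instead of by a validation signal.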