University of California, Davis
Reasoning SFT doesn't just memorize; it generalizes, but only with enough training, good data, and a capable model, and even then, reasoning gains come at the cost of safety.
Reward models, despite excelling at general response quality, stumble at capturing individual user preferences, reaching only 76% accuracy on a new personalized benchmark.
Mean-initializing new tokens in LMs creates a degenerate embedding space that cripples fine-tuning, but a simple "grounding" step can unlock significant performance gains in generative recommendation.
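The mean-initialization described above can be sketched in a few lines. This is a toy illustration with made-up sizes and a plain NumPy embedding table, not the paper's actual setup or its "grounding" fix; it only shows why the mean-initialized rows start out degenerate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table: 1000 existing tokens, dimension 64 (illustrative sizes).
vocab_size, dim = 1000, 64
embeddings = rng.normal(scale=0.02, size=(vocab_size, dim))

# Mean-initialize 5 new tokens: every new row is the mean of the existing rows.
mean_vec = embeddings.mean(axis=0)
new_tokens = np.tile(mean_vec, (5, 1))
extended = np.vstack([embeddings, new_tokens])

# All new rows are identical, so their pairwise distances are exactly zero:
# the degenerate embedding geometry the summary refers to.
max_pairwise_dist = np.max(
    np.linalg.norm(new_tokens[:, None, :] - new_tokens[None, :, :], axis=-1)
)
print(extended.shape, max_pairwise_dist)
```

Because every new token starts at the same point, gradients during fine-tuning initially cannot distinguish them, which is one intuition for why an extra step to spread them apart helps.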