The Hong Kong Polytechnic University
LLMs can become better recommendation engines by explicitly rewarding correct reasoning steps during reinforcement fine-tuning.