Purdue University
Human-aligned ride-sharing repositioning is now possible without sacrificing platform profit, thanks to a novel two-stage Bilevel RLHF framework.
Despite the dominance of RLHF for LLM alignment, outcome-based RL methods are proving surprisingly effective at improving stepwise reasoning.