Search papers, labs, and topics across Lattice.
Robots can now learn dexterous manipulation skills across different hand designs, thanks to a new Transformer architecture that treats actions as a flexible arrangement of joint movements rather than a fixed sequence.
RLHF can be significantly improved for complex tasks by explicitly modeling preference relationships both within and between training examples, enabling better instruction following without relying on expensive human annotation or biased LLM-generated data.
RLHF reward models can be made significantly less susceptible to length bias by explicitly modeling and disentangling semantic preferences from length requirements.