Skywork-V2-8B, a leading open-weight reward model, is shown to mistakenly favor responses with redundant spacing and hallucinated content, revealing critical vulnerabilities in current reward model (RM) training.