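On-policy reinforcement learning with GRPO (Group Relative Policy Optimization) makes LLMs significantly better at vulnerability detection than supervised fine-tuning (SFT) or preference optimization, and the resulting models outperform even strong zero-shot baselines.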