Search papers, labs, and topics across Lattice.
2
0
4
ZPPO reveals that embedding teacher responses in prompts rather than gradients can dramatically boost the performance of small student models on challenging tasks.
Fine-tuning on the new ProCUA-SFT dataset boosts UI-TARS 7B's performance from a dismal 8-10% to an impressive 45.0% on OSWorld tasks, highlighting the critical role of high-quality training data.