Fine-tuning on the new ProCUA-SFT dataset boosts UI-TARS 7B's performance from a dismal 8-10% to an impressive 45.0% on OSWorld tasks, highlighting the critical role of high-quality training data.
Multimodal models can now achieve state-of-the-art performance on real-world tasks such as document understanding and audio-video comprehension at significantly lower inference latency, thanks to novel token-reduction techniques.