Peking University
Turns out, what makes for good code pre-training data depends heavily on the downstream task you're targeting.
Noisy multi-turn dialogue data hurts instruction tuning, but selecting entire conversations based on topic grounding and information flow yields surprisingly robust models.
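The teaser doesn't spell out how those selection signals are computed, but a minimal sketch might score each conversation on two embedding-based proxies and keep or drop it wholesale. Everything here is a hypothetical illustration, not the paper's method: the `conversation_score` and `select_conversations` names, the cosine-similarity stand-ins for topic grounding and information flow, the equal weighting, and the 0.6 threshold are all assumptions.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Row-wise cosine similarity between two arrays of vectors.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

def conversation_score(turn_embeddings: np.ndarray) -> float:
    """Score one conversation; turn_embeddings is (n_turns, dim)."""
    if len(turn_embeddings) < 2:
        return 0.0  # a single turn carries no flow signal
    # Topic grounding proxy: how tightly the turns cluster around the
    # conversation's mean embedding.
    centroid = turn_embeddings.mean(axis=0, keepdims=True)
    grounding = cosine(turn_embeddings, centroid).mean()
    # Information flow proxy: how strongly each turn relates to the next.
    flow = cosine(turn_embeddings[:-1], turn_embeddings[1:]).mean()
    return 0.5 * grounding + 0.5 * flow  # equal weights are an assumption

def select_conversations(conversations, embed, threshold=0.6):
    # Keep or drop each conversation as a whole, never individual turns;
    # `embed` maps a conversation to per-turn embeddings, `threshold` is arbitrary.
    return [c for c in conversations if conversation_score(embed(c)) >= threshold]
```

The key design point the summary implies is the selection granularity: filtering happens at the conversation level, so a noisy turn can't survive on its own inside an otherwise-kept dialogue.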
LLMs can now leverage visual structure, not just text, to pinpoint bugs in multimodal programs, thanks to a novel graph alignment approach that bridges the gap between GUI screenshots and code.
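As a rough illustration of what aligning a GUI-derived graph with a code-derived graph could look like (the paper's actual graph construction and alignment objective aren't described in this blurb), one could extract attribute-labeled nodes from the screenshot and from the code, greedily match them by attribute overlap, and flag unmatched elements as bug candidates. The node format, the Jaccard similarity, and the greedy matcher below are all stand-in assumptions.

```python
def jaccard(a: set, b: set) -> float:
    # Set-overlap similarity between two attribute sets.
    return len(a & b) / len(a | b) if (a | b) else 0.0

def align(gui_nodes, code_nodes, min_sim=0.5):
    """Greedily match GUI widgets to code components, one-to-one.

    Each node is a dict like {"id": ..., "attrs": {...}}, where attrs is a
    set of features (text label, widget type, position bucket, etc.).
    """
    matches, unmatched_code, used = {}, [], set()
    for c in code_nodes:
        best, best_sim = None, min_sim
        for g in gui_nodes:
            if g["id"] in used:
                continue
            sim = jaccard(c["attrs"], g["attrs"])
            if sim > best_sim:
                best, best_sim = g, sim
        if best is None:
            # Code element with no visual counterpart: a bug candidate.
            unmatched_code.append(c["id"])
        else:
            used.add(best["id"])
            matches[c["id"]] = best["id"]
    # GUI widgets nothing in the code maps to are also suspicious.
    unmatched_gui = [g["id"] for g in gui_nodes if g["id"] not in used]
    return matches, unmatched_code, unmatched_gui

# Example (hypothetical IDs and attributes):
gui = [{"id": "g1", "attrs": {"button", "Submit"}}]
code = [{"id": "SubmitBtn", "attrs": {"button", "Submit", "onClick"}},
        {"id": "GhostLabel", "attrs": {"label", "Debug"}}]
print(align(gui, code))  # ({'SubmitBtn': 'g1'}, ['GhostLabel'], [])
```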