Self-play can be dramatically improved by exploiting the "question construction path" it generates as privileged information for self-distillation, leading to 2-3x faster learning.
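A minimal sketch of the idea, assuming a Hugging Face-style model and tokenizer: the construction path is visible to a teacher forward pass but hidden from the student pass, and the student is trained to match the teacher's softened output distribution. All field names here (`construction_path`, `question`) and the single-token distillation target are illustrative assumptions, not the paper's API.

```python
# Hypothetical privileged-information self-distillation step.
import torch
import torch.nn.functional as F

def distill_step(model, tokenizer, item, optimizer, temperature=2.0):
    # Teacher pass: sees the question *plus* the privileged construction path.
    teacher_input = tokenizer(
        item["construction_path"] + "\n" + item["question"],
        return_tensors="pt",
    )
    with torch.no_grad():
        teacher_logits = model(**teacher_input).logits[:, -1, :]

    # Student pass: sees only the question, as it would at test time.
    student_input = tokenizer(item["question"], return_tensors="pt")
    student_logits = model(**student_input).logits[:, -1, :]

    # KL divergence between softened teacher and student distributions
    # (next-token only, for brevity; a full pass would distill per position).
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```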
Current VLMs, despite excelling at general reasoning, still fail to accurately identify food and estimate nutrition, even when given multiple views and chain-of-thought prompting.
Forget noisy, biased LLM evaluators: CDRRM distills preference insights into compact rubrics, letting a frozen judge model leapfrog fully fine-tuned baselines with just 3k training samples.
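One plausible reading of the judging side, sketched below: the distilled rubrics are prepended to a frozen judge's prompt so it scores against explicit criteria rather than raw preference intuition. `call_judge` stands in for any frozen LLM endpoint, and the prompt template and verdict parsing are assumptions, not CDRRM's exact format.

```python
# Rubric-conditioned pairwise judging with a frozen LLM (illustrative).
from typing import Callable, List

def rubric_judge(
    call_judge: Callable[[str], str],
    rubrics: List[str],
    prompt: str,
    response_a: str,
    response_b: str,
) -> str:
    rubric_block = "\n".join(f"- {r}" for r in rubrics)
    judge_prompt = (
        "Score the two responses against each criterion, then answer with "
        "a single letter, A or B, for the better response.\n\n"
        f"Criteria:\n{rubric_block}\n\n"
        f"Question: {prompt}\n\n"
        f"Response A: {response_a}\n\n"
        f"Response B: {response_b}\n\nVerdict:"
    )
    verdict = call_judge(judge_prompt).strip().upper()
    return "A" if verdict.startswith("A") else "B"
```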
Predict how well your LLM will transfer to a new domain *before* fine-tuning, by using sparse autoencoders to spot tell-tale signs of domain shift in the model's representations.
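A minimal sketch of that probe, assuming you already have a sparse autoencoder trained on the model's activations: encode source- and target-domain activations, then compare mean feature usage. The two scores here (cosine distance and the rate of features active on target but dead on source) are plausible proxies; the paper's exact metric may differ.

```python
# SAE-based domain-shift probe over model activations (illustrative).
import torch

def domain_shift_score(sae_encode, acts_source, acts_target):
    """sae_encode maps [n, d_model] activations to [n, d_sae] sparse features."""
    feats_src = sae_encode(acts_source).mean(dim=0)  # mean feature usage, source
    feats_tgt = sae_encode(acts_target).mean(dim=0)  # mean feature usage, target
    cos = torch.nn.functional.cosine_similarity(feats_src, feats_tgt, dim=0)
    # Features firing on the target domain but dead on the source domain are
    # a tell-tale sign that fine-tuning must learn genuinely new directions.
    novel = ((feats_tgt > 0) & (feats_src == 0)).float().mean()
    return {
        "cosine_distance": (1 - cos).item(),
        "novel_feature_rate": novel.item(),
    }
```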
Forget static agent communication graphs: AgentConductor uses RL to dynamically rewire agent interactions based on task difficulty, slashing token costs by up to 68% while boosting code generation accuracy.
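A hedged sketch of what difficulty-conditioned rewiring could look like: a small policy picks one communication topology per task and is updated with REINFORCE on a reward that trades accuracy against token spend. The candidate topologies, difficulty features, and cost weight are illustrative assumptions, not AgentConductor's actual design.

```python
# RL-selected agent communication topology (illustrative REINFORCE loop).
import torch

TOPOLOGIES = ["solo", "chain", "star", "full_mesh"]  # candidate agent graphs

policy = torch.nn.Linear(4, len(TOPOLOGIES))  # 4 task-difficulty features in
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def select_topology(difficulty_feats: torch.Tensor):
    """Sample a topology for this task and keep its log-prob for the update."""
    dist = torch.distributions.Categorical(logits=policy(difficulty_feats))
    idx = dist.sample()
    return TOPOLOGIES[idx], dist.log_prob(idx)

def reinforce_update(log_prob, accuracy, tokens_used, cost_weight=1e-4):
    """Reward accuracy, charge for every token the agents exchange."""
    reward = accuracy - cost_weight * tokens_used
    loss = -log_prob * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```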