Despite achieving comparable overall scores, top-performing medical LLMs exhibit surprising differences in reasoning, evidence use, and longitudinal follow-up when evaluated on a new Chinese medical benchmark, revealing critical gaps in clinically actionable treatment planning.
Failure-driven post-training, combined with a meticulously curated 10M-token STEM dataset, yields a 4.68% performance gain in LLM reasoning, suggesting that strategic data synthesis targeting model weaknesses is a powerful path to improvement.
An open-source ecosystem for agentic learning, shipping with a trained agent and a novel policy-optimization method, promises to accelerate research by providing a standardized, scalable platform.