RFT's impressive in-domain performance masks surprisingly weak generalization to new environments, highlighting a critical challenge for deploying LLM agents in the real world.
Current LLMs fall short in understanding implicit intentions and modeling long-term user preferences, as revealed by a new benchmark, LifeSim-Eval, designed to simulate real-world user-assistant interactions.
GPT-5's scientific reasoning performance drops by nearly 50% on multi-step workflows, revealing a critical gap in current LLM agents' ability to orchestrate complex tool use.
Finally, a fully open-source, reproducible system for long-form song generation is here, complete with licensed data, code, and a Qwen-based model that rivals closed-source systems.