Forget disjointed pipelines and structured inputs: PlanAudio uses an LLM and semantic latent chain-of-thought to directly synthesize unified audio from free-form text prompts.
By jointly training a keyframe sampler with an MLLM, MSJoE achieves state-of-the-art accuracy in long-form video understanding while significantly reducing computational cost.