Shanghai Jiao Tong University
Current audio editing models are failing spectacularly: their Exact Match Rate falls below 5% on complex tasks, exposing a critical need for improvement.
Real-time audio interaction is now possible with a unified model that not only performs traditional tasks but also proactively responds to audio stimuli.
Text-centric agentic search is out: Deep-Reporter shows how to build multimodal agents that leverage both text and visuals for grounded long-form generation.
Real-world proactive agents can now infer latent user needs and act on them in real time, rivaling state-of-the-art models in intent detection while maintaining low latency.