Nanjing University
UniDDT achieves a groundbreaking balance between multimodal understanding and generation, outperforming existing models in both tasks with enhanced semantic coherence.
Incorporating direction-of-arrival information, GC-Dec-IVA significantly enhances source separation in distributed microphone arrays, overcoming critical limitations of previous methods.
LLMs are evolving from reactive chatbots to proactive digital colleagues, fundamentally changing how AI can assist in complex tasks.
Aggressive pursuit strategies can yield nearly 50% more thrust for quadcopters by relaxing traditional visibility constraints during interception.
Rapidly prototype and benchmark robotic navigation scenarios using simple YAML configurations, eliminating the coding barrier in simulation.
Foley-Omni achieves expert-level performance in audio synthesis while generating cohesive soundtracks for video, enhancing both intelligibility and quality.
Ditch the VAE bottleneck: Representation Forcing lets you train unified multimodal models to generate high-quality images directly from pixels, rivaling VAE-based approaches without the architectural constraint.
Over-reliance on agentic decomposition can actually *hurt* audio understanding when a strong audio frontend already provides sufficient information, highlighting the importance of conditional evidence acquisition.