Current MLLMs are still surprisingly reliant on textual reasoning, even when visual information is crucial for solving STEM problems.
Existing video datasets fail to capture the complexity of human interactions in diverse scenes, but OmniHuman offers a new benchmark to train and evaluate models on more realistic human-centric video generation.
Over 20 teams vied to decode human attention in video, revealing new insights into saliency prediction techniques.
Achieve an 8x speedup in chest X-ray report generation without sacrificing clinical accuracy by distilling multi-step diffusion into a single, efficient step.
Telecom World Models fuse the flexibility of LLMs with the fidelity of Digital Twins, enabling uncertainty-aware predictive planning that existing approaches can't match.
Chorus unlocks 45% speedups in video diffusion inference by cleverly reusing computations across user requests, even in highly optimized 4-step models where traditional caching fails.
Finally, a unified framework lets you control both facial appearance and voice timbre for personalized audio-video generation across multiple identities.
Software traceability research is severely imbalanced, with code-related links dominating and 95% of tools stuck in academia.
Forget treating document graphics as mere pixels: this new OCR system parses them into reusable code, unlocking multimodal supervision and outperforming existing systems.
Forget MACs and parameters: accurately predict DL model energy and latency on MCUs with 3x and 6.5x lower error using just clock cycles.
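The idea of predicting latency from clock cycles rather than MAC counts can be illustrated with a minimal sketch: fit a least-squares line from measured cycle counts to measured latencies, then use it for new models. The data values and function names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical measurements: clock cycles and latency (ms) for a few
# DL models on an MCU. Values are illustrative only.
clock_cycles = np.array([1.2e6, 3.5e6, 8.0e6, 1.5e7])
latency_ms = np.array([15.0, 43.8, 100.0, 187.5])

# Fit a simple least-squares line: latency ~ a * cycles + b.
a, b = np.polyfit(clock_cycles, latency_ms, deg=1)

def predict_latency_ms(cycles: float) -> float:
    """Predict latency on this MCU from a profiled cycle count."""
    return a * cycles + b

print(round(predict_latency_ms(5e6), 1))  # ~62.5 ms for this toy data
```

Because cycle counts already reflect the target's memory stalls and instruction mix, even a linear model like this can track latency far more closely than MAC- or parameter-based proxies.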
By dynamically orchestrating tools and recalling past reasoning, an LLM agent can boost phishing detection recall by 20% on real-world social media URLs.
Reconstructing surgical scenes from monocular endoscope videos with large camera motion just got a whole lot better, thanks to a new window-based approach that doesn't need stereo depth or perfect camera tracking.
LLMs can now reason effectively about complex agricultural scenarios by iteratively writing and executing code within a specialized environment, outperforming traditional text-based approaches.
Forget fine-tuning: DM0 shows that pretraining a VLA model from scratch on diverse embodied and non-embodied data leads to SOTA performance in physical AI tasks.
LLMs can be pruned 4x faster without sacrificing performance thanks to a new gradient-based metric and projection compensation technique.
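A gradient-based pruning criterion of the general kind referenced here can be sketched as a first-order saliency score, |w · dL/dw|, which approximates the loss change from zeroing each weight. This is a generic illustration with synthetic data; the paper's exact metric and its projection compensation step are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weight matrix and its loss gradient (synthetic values).
W = rng.normal(size=(4, 8))
G = rng.normal(size=(4, 8))

# First-order saliency: |w * dL/dw| estimates the loss increase
# caused by pruning each individual weight.
saliency = np.abs(W * G)

# Prune the 50% of weights with the lowest saliency.
threshold = np.quantile(saliency, 0.5)
mask = saliency >= threshold
W_pruned = W * mask

print(f"fraction of weights kept: {mask.mean():.2f}")
```

Scoring weights with a single gradient pass is what makes such metrics cheap relative to iterative retrain-and-prune loops, which is consistent with the claimed speedup.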