Noise-robust visual prompts can improve model performance by over 11% without increasing inference costs.
Forget noisy pseudo-labels: SpatialEvo unlocks self-supervised 3D spatial reasoning by generating perfectly accurate training data directly from scene geometry.
Fusing video with audio tokenizers doesn't have to trash reconstruction quality: timing-aware fusion *before* quantization unlocks better audio understanding without sacrificing fidelity.
Current LLM agent safety benchmarks miss over 20% of unsafe behaviors: agents that pass the benchmark still act unsafely in ways it never detects.
Coding agents struggle to stay faithful to specifications that emerge gradually over long interactions, losing significant implementation fidelity compared to receiving the full specification in a single shot.
Despite advances elsewhere, multimodal LLMs still struggle to faithfully recreate webpages from videos, particularly when capturing fine-grained style and motion.
Invariant models can match the accuracy of equivariant machine learning interatomic potentials at a fraction of the computational cost, thanks to a novel attention mechanism.
By dynamically orchestrating tools and recalling past reasoning, an LLM agent can boost phishing detection recall by 20% on real-world social media URLs.
Forget fine-tuning: DM0 shows that pretraining a VLA model from scratch on diverse embodied and non-embodied data leads to SOTA performance in physical AI tasks.