Nanjing University of Science and Technology
GP-DUN not only removes haze but also reconstructs fine details that traditional methods fail to recover, setting a new benchmark for UAV image clarity.
Diffusion models can finally produce temporally stable video fusion by reframing the task as history-conditioned motion generation, sidestepping the limitations of optical flow and frame-by-frame processing.
PINNs can now reconstruct 3D magnetic fields with an order of magnitude greater accuracy than previously thought possible, even in complex experimental settings.
Even with robust training techniques like EOT, a carefully crafted adversarial patch can reliably fool VIS-IR VLMs and transfer across tasks like classification, captioning, and VQA.
Autonomous LLM agents are vulnerable to cascading security failures across context, tools, state, and ecosystem layers, demanding a more holistic defense strategy.
LLMs can have their personalities surgically altered by tweaking just 0.5% of their neurons, preserving general capabilities while achieving competitive control.
MLLMs can ace circuit-to-code generation by cheating with identifier semantics, even when the circuit diagram is blank.
Frequency-domain analysis unlocks 1.59x speedups in Vision-Language Navigation by enabling optimal token caching, a gain that visual-domain approaches had not reached.
Unified benchmarks reveal the state-of-the-art in simultaneously addressing multiple real-world image degradations like blur, low-light, and rain.
Hierarchical agents that mimic a human production pipeline can achieve professional-grade video mashups, handling global structure, editing intent, and fine-grained shot selection.
Reconstructing 3D scenes from images obscured by smoke and extreme darkness is now significantly more achievable, thanks to insights gleaned from the NTIRE 2026 challenge.
VLMs can be devastatingly fooled by modifying less than 2% of image pixels in a fixed, X-shaped pattern, causing them to fail spectacularly across diverse tasks like classification, captioning, and question answering.
Medical vision-language models are surprisingly brittle: clinically plausible image manipulations, like those introduced during routine acquisition and delivery, can drastically degrade their performance.