Shenzhen University
Transformer-based architectures can now outperform CNNs in multi-view crowd tracking, especially in large, complex real-world scenes, thanks to a novel view-ground interaction mechanism.
Android API research is built on shaky ground: official API lists are surprisingly inconsistent, leading to potentially flawed conclusions.
Forget pixel-level noise: FogFool shows that physically-plausible, atmospherically-modeled fog can achieve 84% transfer attack success rate against remote sensing image classifiers, even surviving JPEG compression.
Edit 3D assets with text prompts while actually preserving the original object's unchanged parts, thanks to a new masking strategy and training dataset.
Forget hand-crafted templates: DUET learns to generate user and item profiles jointly, boosting recommendation accuracy by better aligning textual representations.
Multi-turn reinforcement learning gets a boost: weighting trajectories by semantic similarity dramatically improves baseline estimation and agent performance in long-document visual QA.
Fine-tuning visual foundation models with LoRA-based pairwise training dramatically improves the robustness of AI-generated image (AIGI) detection against real-world distortions.
Even state-of-the-art AI-generated image detectors struggle when images are cropped, resized, or compressed, revealing a critical gap in real-world robustness.
LLM agents can achieve 3x faster web search and higher accuracy by dynamically routing between multiple context management strategies.