×10⁻⁴, with 30k and 20k training iterations, respectively. The batch size is set to 8, and AdamW [32] is adopted for optimization. Image resolutions follow prior works [51, 13].

Table 2: Comparison of zero-shot performance with other state-of-the-art methods on the Pascal-Part-116 and ADE20K-Part-234 datasets.
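The training recipe described above (AdamW, batch size 8, 30k and 20k iterations on the two benchmarks, learning rate on the order of 10⁻⁴) can be sketched as a plain configuration. This is an illustrative reconstruction, not the authors' code: the exact learning-rate coefficient is truncated in the source, so `1e-4` below is a placeholder, and the dictionary keys and `iterations_remaining` helper are hypothetical names.

```python
# Hypothetical training configuration mirroring the recipe in the text.
# The learning-rate coefficient is a placeholder (the source truncates it);
# batch size, iteration counts, and optimizer come from the paper excerpt.
TRAIN_CONFIGS = {
    "Pascal-Part-116": {
        "optimizer": "AdamW",   # AdamW [32], as stated in the text
        "lr": 1e-4,             # placeholder coefficient, order 1e-4
        "batch_size": 8,
        "iterations": 30_000,   # 30k iterations for Pascal-Part-116
    },
    "ADE20K-Part-234": {
        "optimizer": "AdamW",
        "lr": 1e-4,
        "batch_size": 8,
        "iterations": 20_000,   # 20k iterations for ADE20K-Part-234
    },
}

def iterations_remaining(dataset: str, step: int) -> int:
    """Return how many optimization steps are left for a given benchmark."""
    total = TRAIN_CONFIGS[dataset]["iterations"]
    return max(total - step, 0)
```

With a fixed batch size of 8, the two schedules correspond to roughly 240k and 160k training samples seen, respectively.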
By disentangling semantic and contextual cues in vision-language models, PCA-Seg achieves state-of-the-art open-vocabulary segmentation with only 0.35M additional parameters per block.