Zhiqi Li

Untangling multilayer networks just got easier: T-GINEE uses tensors to explicitly model cross-layer dependencies, outperforming methods that treat layers independently.

Maolin Wang, Ziting Mai, Xuhui Chen +7

Architecture Design (Transformers, SSMs, MoE) · Natural Language Processing

NVIDIA · May 26, 2026

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Ditch slow, token-by-token box generation: LocateAnything's Parallel Box Decoding (PBD) boosts VLM grounding speed and accuracy by decoding entire bounding boxes at once.

Shihao Wang, Shilong Liu, Yu Kuang +11

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

NVIDIA · Mar 5, 2026 · also AgiBot, Shanghai AI Lab

Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline

Current multimodal LLMs choke on long-form video understanding, either forgetting details or getting lost in the timeline, but a new agentic architecture with dynamic memory management offers a promising fix.

Guo Chen, Lidong Lu, Yicheng Liu +19

Eval Frameworks & Benchmarks · Multimodal Models · Tool Use & Agents

Papers (4)