Xinyu Geng

Research focus

Multimodal Models (4), Tool Use & Agents (4), Eval Frameworks & Benchmarks (2), Computer Vision (1)

Frequent co-authors

Yi R. Fung (2), Fan Zhang (1), Vireo Zhang (1), Shengju Qian (1)

Papers (4)

Jun 5, 2026

Fan Zhang +9 · 1w ago

Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking

Struct-Searcher achieves a 17.2% accuracy boost in multimodal information seeking by managing conflicting evidence through a dynamic structural graph.

Fan Zhang, Vireo Zhang, Shengju Qian +7

Multimodal Models · Tool Use & Agents

May 20, 2026 · also NUS, Meitian

GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

Forget scalar rewards: GenEvolve distills structured visual experiences from successful and failed generation trajectories, enabling token-level supervision for self-improving image generation agents.

Sixiang Chen, Zhaohu Xing, Xinyu Geng +5

Computer Vision · Multimodal Models · Tool Use & Agents

Xinyu Geng +5 · Apr 5, 2026 · also UPenn

GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces

Current multimodal agents still struggle to combine ambiguous visual cues with open-web verification, highlighting a critical gap in their ability to perform complex geolocation tasks.

Xinyu Geng, Yanjing Xiao, Yuyang Zhang +3

Eval Frameworks & Benchmarks · Multimodal Models · Tool Use & Agents

Feb 26, 2026 · also HKUST, Qian Xuesen Laboratory of Space Technology, Soochow, UESTC +1

AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

Even the best multimodal agents struggle with realistic visual scenarios, achieving only 27% accuracy on the new AgentVista benchmark that demands long-horizon tool use across web search, image search, and code.

Zhaochen Su, Jincheng Gao, Hangyu Guo +12

Eval Frameworks & Benchmarks · Multimodal Models · Tool Use & Agents