Beijing University of Posts and Telecommunications
Achieve expert-level visual understanding by grounding web-scale knowledge directly into spatially localized image regions, enabling factual and interpretable reasoning in open-set scenarios.
Forget brittle text-based reasoning: GVCoT unlocks more precise image editing by generating and optimizing visual reasoning cues directly within the image domain.
You can now get state-of-the-art hepatocellular carcinoma diagnosis and captioning from whole slide images using a new MLLM with a topology-aware attention mechanism.