Search papers, labs, and topics across Lattice.
Zhejiang University
1
0
3
Achieve state-of-the-art fine-grained visual recognition without training by adaptively invoking reasoning in a Large Vision-Language Model only when needed, significantly reducing computational overhead.