Harbin Institute of Technology, Pengcheng Laboratory
Treating geometry as a fundamental representational prerequisite, rather than a late-fusion auxiliary signal, significantly boosts spatio-temporal reasoning in vision-language models.
By explicitly verifying that spoken references actually exist in the video before segmenting, APRVOS substantially improves robustness in noisy audio-conditioned referring video object segmentation (Ref-VOS) and outperforms standard pipelines.