Department of Information Engineering and Computer Science, University of Trento, Trento, Italy; Fondazione Bruno Kessler, Trento, Italy
Predefined interaction vocabularies are holding back HOI detection, but MLLMs can unlock truly unconstrained understanding of how humans and objects interact.
Unlock human-interpretable video understanding without task-specific training: TF-SMOT leverages off-the-shelf vision-language models to achieve state-of-the-art semantic multi-object tracking.