Fondazione Bruno Kessler | Trento | Apr 15, 2026 | arXiv:2604.14069

Towards Unconstrained Human-Object Interaction

Francesco Tonini, Alessandro Conti, Lorenzo Vaquero, Cigdem Beyan, Elisa Ricci

AI Summary

This paper introduces Unconstrained Human-Object Interaction (U-HOI), a novel HOI detection task that removes the constraint of predefined interaction vocabularies by leveraging Multimodal Large Language Models (MLLMs). The authors propose a pipeline that combines test-time inference with language-to-graph conversion to extract structured interactions from the free-form text that MLLMs generate. Experiments evaluating a range of MLLMs on U-HOI reveal the limitations of existing HOI detectors and demonstrate the potential of MLLMs for more flexible interaction recognition.

Key Contribution

Predefined interaction vocabularies are holding back HOI detection, but MLLMs can unlock truly unconstrained understanding of how humans and objects interact.

Abstract

Human-Object Interaction (HOI) detection is a longstanding computer vision problem concerned with predicting the interaction between humans and objects. Current HOI models rely on a vocabulary of interactions at training and inference time, limiting their applicability to static environments. With the advent of Multimodal Large Language Models (MLLMs), it has become feasible to explore more flexible paradigms for interaction recognition. In this work, we revisit HOI detection through the lens of MLLMs and apply them to in-the-wild HOI detection. We define the Unconstrained HOI (U-HOI) task, a novel HOI domain that removes the requirement for a predefined list of interactions at both training and inference. We evaluate a range of MLLMs on this setting and introduce a pipeline that includes test-time inference and language-to-graph conversion to extract structured interactions from free-form text. Our findings highlight the limitations of current HOI detectors and the value of MLLMs for U-HOI. Code will be available at https://github.com/francescotonini/anyhoi
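
As a rough illustration of what "language-to-graph conversion" could mean in practice, the sketch below turns a free-form description into a small graph of human-interaction-object edges. It is a toy under stated assumptions, not the authors' pipeline: the regular expression, the `HUMAN_WORDS` list, and the `text_to_graph` function are all hypothetical, and the description string merely stands in for MLLM output.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately small list of words that denote a human.
HUMAN_WORDS = r"(?:person|man|woman|child|boy|girl)"

# Assumed sentence shape: "<human> is <verb-ing> [preposition] <article> <object>".
# Real MLLM output is far less regular; this pattern is for illustration only.
PATTERN = re.compile(
    rf"\b(?P<human>{HUMAN_WORDS})\s+is\s+"
    r"(?P<verb>\w+ing(?:\s+(?:on|at|with|to|in))?)\s+"
    r"(?:a|an|the)\s+(?P<object>\w+)",
    flags=re.IGNORECASE,
)


def text_to_graph(text):
    """Map each human mention to a list of (interaction, object) edges."""
    graph = defaultdict(list)
    for match in PATTERN.finditer(text):
        human = match.group("human").lower()
        edge = (match.group("verb").lower(), match.group("object").lower())
        if edge not in graph[human]:  # skip repeated mentions of the same edge
            graph[human].append(edge)
    return dict(graph)


# Stand-in for free-form text produced by an MLLM.
description = (
    "A man is riding a bicycle. The man is holding a phone. "
    "A woman is sitting on a bench."
)
graph = text_to_graph(description)
for human, edges in graph.items():
    for interaction, obj in edges:
        print(f"{human} --{interaction}--> {obj}")
```

Because the interaction label is read directly from the text, no predefined verb list is needed, which is the property the U-HOI setting asks for. A pattern this rigid would miss most real descriptions, so it should be read only as a picture of the input and output shapes.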

Computer Vision, Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
