Apr 1, 2026 · arXiv:2604.00592

HarassGuard: Detecting Harassment Behaviors in Social Virtual Reality with Vision-Language Models

Junhee Lee, Minseok Kim, Hwanjo Heo, S. Woo, Seungwon Woo

AI Summary

This paper introduces HarassGuard, a vision-language model (VLM) based system that detects physical harassment in social VR using only visual input, addressing the limitations of existing safety measures that are either reactive or dependent on sensitive biometric data. The authors construct an IRB-approved harassment vision dataset, apply prompt engineering, and fine-tune VLMs so that detection takes contextual information into account. HarassGuard reaches up to 88.09% accuracy in binary classification and 68.85% in multi-class classification, performing competitively with state-of-the-art baselines (LSTM/CNN, Transformer) while using far fewer fine-tuning samples (200 vs. 1,115).

Key Contribution

VLMs can detect physical harassment in social VR with accuracy comparable to prior methods, while using far fewer fine-tuning samples and without relying on sensitive biometric data.

Abstract

Social Virtual Reality (VR) platforms provide immersive social experiences but also expose users to serious risks of online harassment. Existing safety measures are largely reactive, while proactive solutions that detect harassment behavior during an incident often depend on sensitive biometric data, raising privacy concerns. In this paper, we present HarassGuard, a vision-language model (VLM) based system that detects physical harassment in social VR using only visual input. We construct an IRB-approved harassment vision dataset, apply prompt engineering, and fine-tune VLMs to detect harassment behavior by considering contextual information in social VR. Experimental results demonstrate that HarassGuard achieves competitive performance compared to state-of-the-art baselines (i.e., LSTM/CNN, Transformer), reaching an accuracy of up to 88.09% in binary classification and 68.85% in multi-class classification. Notably, HarassGuard matches these baselines while using significantly fewer fine-tuning samples (200 vs. 1,115), offering unique advantages in contextual reasoning and privacy-preserving detection.

Computer Vision · Constitutional AI & AI Ethics · Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
