This paper investigates the vulnerability of risk-controlling recommender systems to collective manipulation, where coordinated users exploit feedback mechanisms to degrade recommendation quality. Using data from a large-scale video platform, the authors demonstrate that a small group (1% of users) can significantly reduce nDCG for non-adversarial users (up to 20%) through coordinated "Not Interested" reports. They then propose and evaluate a mitigation strategy that provides per-user safety guarantees, reducing the impact of coordinated attacks.
A coordinated attack by just 1% of users can degrade recommendation quality (nDCG) for non-adversarial users by up to 20% in risk-controlling recommender systems, even with simple strategies that require little to no knowledge of the underlying algorithm.
Recommendation systems have become central gatekeepers of online information, shaping user behaviour across a wide range of activities. In response, users increasingly organize and coordinate to steer algorithmic outcomes toward diverse goals, such as promoting relevant content or limiting harmful material, relying on platform affordances such as likes, reviews, or ratings. While these mechanisms can serve beneficial purposes, they can also be leveraged for adversarial manipulation, particularly in systems where such feedback directly informs safety guarantees. In this paper, we study this vulnerability in recently proposed risk-controlling recommender systems, which use binary user feedback (e.g., "Not Interested") to provably limit exposure to unwanted content via conformal risk control. We empirically demonstrate that their reliance on aggregate feedback signals makes them inherently susceptible to coordinated adversarial user behaviour. Using data from a large-scale online video-sharing platform, we show that a small coordinated group (comprising only 1% of the user population) can induce up to a 20% degradation in nDCG for non-adversarial users by exploiting the affordances provided by risk-controlling recommender systems. We evaluate simple, realistic attack strategies that require little to no knowledge of the underlying recommendation algorithm and find that, while coordinated users can significantly harm overall recommendation quality, they cannot selectively suppress specific content groups through reporting alone. Finally, we propose a mitigation strategy that shifts guarantees from the group level to the user level, showing empirically how it can reduce the impact of adversarial coordinated behaviour while ensuring personalized safety for individuals.
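To make the aggregate-feedback vulnerability concrete, the following is a minimal toy sketch and not the paper's implementation. It assumes a single global filtering threshold calibrated with the standard conformal risk control rule (accept a threshold when (n * empirical_risk + B) / (n + 1) <= alpha, with losses bounded by B = 1). The loss definition, the threshold grid, the value of alpha, and the synthetic users are all illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical data model: each user is a list of (unwanted_score, reported) pairs,
# where unwanted_score is a model's predicted probability that the item is unwanted
# and reported marks a "Not Interested" click.
User = List[Tuple[float, bool]]


def user_loss(user: User, threshold: float) -> float:
    """Fraction of a user's candidates that are shown (score <= threshold) AND reported.

    This loss is non-decreasing in the threshold, as conformal risk control requires
    (up to the direction of the parameter).
    """
    hits = sum(1 for score, reported in user if score <= threshold and reported)
    return hits / len(user)


def calibrate_threshold(users: List[User], alpha: float, grid: List[float]) -> float:
    """Return the most permissive threshold whose CRC-adjusted risk is <= alpha.

    Falls back to the strictest grid value if no threshold qualifies (in which
    case the guarantee does not hold; a real system would need to handle this).
    """
    n = len(users)
    best = min(grid)
    for t in sorted(grid):
        risk = sum(user_loss(u, t) for u in users) / n
        if (n * risk + 1.0) / (n + 1) <= alpha:  # B = 1 because losses lie in [0, 1]
            best = t
        else:
            break  # risk only grows with t, so larger thresholds also fail
    return best


GRID = [0.1, 0.3, 0.5, 0.7, 0.9]
ALPHA = 0.015  # illustrative risk level

# Honest users report only the item the model already scores as most likely unwanted.
honest_user: User = [(s, s >= 0.9) for s in GRID]
# An adversarial user reports every item, regardless of score.
adversary: User = [(s, True) for s in GRID]

honest = [honest_user] * 100
attacked = [honest_user] * 99 + [adversary]  # 1% of the calibration population

t_honest = calibrate_threshold(honest, ALPHA, GRID)
t_attacked = calibrate_threshold(attacked, ALPHA, GRID)
print(t_honest, t_attacked)
```

In this toy setup, one adversarial user out of 100 pushes the calibrated threshold from 0.7 down to 0.3, so every user sees fewer candidate items. This mirrors, in a highly simplified way, how a group-level guarantee lets a small coalition reduce recommendation quality for everyone else; the per-user guarantees proposed in the paper would instead calibrate against each individual's own feedback.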