UF · Apr 9, 2026 · arXiv:2604.07717

Detecting HIV-Related Stigma in Clinical Narratives Using Large Language Models

Ziyi Chen, Y. Khan, Yasir Khan, Mengyuan Zhang, Chengdong Peng, Cheng Peng, Mengxian Lyu, Yiyang Liu, Krishna Vaddiparti, Robert L. Cook, Mattia Prosperi, Yonghui Wu

AI Summary

This study developed an LLM-based tool to identify HIV-related stigma in clinical notes, addressing the lack of off-the-shelf tools for this critical psychosocial determinant of health. The authors manually annotated 1,332 sentences across four stigma subscales and compared encoder-based models (GatorTron, BERT) with generative LLMs (GPT-OSS, LLaMA, MedGemma), the latter under zero-shot and few-shot prompting. GatorTron-large achieved the best performance (Micro-F1 = 0.62), and few-shot prompting substantially improved the generative models, demonstrating the potential for automated stigma detection.
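Micro-F1 pools true positives, false positives, and false negatives across all four subscales before computing precision and recall, so frequent subscales weigh more than rare ones. The sketch below assumes a multi-label setup in which each sentence carries zero or more subscale labels; the summary does not say whether the task is multi-label or multi-class, and the function name `micro_f1` is hypothetical.

```python
from typing import Sequence, Set

# The four subscales named in the study (order chosen here for illustration only).
SUBSCALES = (
    "Concern with Public Attitudes",
    "Disclosure Concerns",
    "Negative Self-Image",
    "Personalized Stigma",
)


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over every (sentence, subscale) decision.

    Assumes multi-label annotation: each sentence maps to a set of subscales.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    known = set(SUBSCALES)
    for labels in (*gold, *pred):
        unknown = labels - known
        if unknown:
            raise ValueError(f"unknown subscale label(s): {unknown}")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # annotated and predicted
        fp += len(p - g)  # predicted but not annotated
        fn += len(g - p)  # annotated but missed
    if tp == 0:
        return 0.0  # convention: no correct positives gives F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [{"Disclosure Concerns"}, {"Negative Self-Image", "Personalized Stigma"}, set()]
pred = [{"Disclosure Concerns"}, {"Negative Self-Image"}, {"Personalized Stigma"}]
print(round(micro_f1(gold, pred), 3))  # TP=2, FP=1, FN=1 -> 0.667
```

In practice, scikit-learn's `f1_score(..., average="micro")` on binarized label matrices computes the same quantity.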

Key Contribution

LLMs can detect HIV-related stigma documented in clinical notes (best Micro-F1 = 0.62), which could help improve mental health care and treatment outcomes for people living with HIV.

Abstract

Human immunodeficiency virus (HIV)-related stigma is a critical psychosocial determinant of health for people living with HIV (PLWH), influencing mental health, engagement in care, and treatment outcomes. Although stigma-related experiences are documented in clinical narratives, there is a lack of off-the-shelf tools to extract and categorize them. This study aims to develop a large language model (LLM)-based tool for identifying HIV stigma from clinical notes. We identified clinical notes from PLWH receiving care at the University of Florida (UF) Health between 2012 and 2022. Candidate sentences were identified using expert-curated stigma-related keywords and iteratively expanded via clinical word embeddings. A total of 1,332 sentences were manually annotated across four stigma subscales: Concern with Public Attitudes, Disclosure Concerns, Negative Self-Image, and Personalized Stigma. We compared GatorTron-large and BERT as encoder-based baselines, and GPT-OSS-20B, LLaMA-8B, and MedGemma-27B as generative LLMs, under zero-shot and few-shot prompting. GatorTron-large achieved the best overall performance (Micro-F1 = 0.62). Few-shot prompting substantially improved generative model performance, with 5-shot GPT-OSS-20B and LLaMA-8B achieving Micro-F1 scores of 0.57 and 0.59, respectively. Performance varied by stigma subscale, with Negative Self-Image showing the highest predictability and Personalized Stigma remaining the most challenging. Zero-shot generative inference exhibited non-trivial failure rates (up to 32%). This study develops the first practical NLP tool for identifying HIV stigma in clinical notes.
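The candidate-selection step (expert seed keywords, iteratively expanded via clinical word embeddings) can be sketched as a nearest-neighbour loop. This is an illustration under assumptions, not the authors' pipeline: the abstract does not state the similarity measure, neighbour count, threshold, or number of rounds, so cosine similarity, `top_k`, `min_sim`, and `rounds` are hypothetical choices, the `approve` callback is a hypothetical stand-in for expert review, and the toy 2-D vectors replace real clinical embeddings.

```python
import math
import re
from typing import Callable, Dict, Iterable, List, Sequence, Set

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(
    seeds: Set[str],
    embeddings: Dict[str, Vector],
    top_k: int = 3,
    min_sim: float = 0.7,
    rounds: int = 2,
    approve: Callable[[str], bool] = lambda term: True,  # hypothetical expert-review hook
) -> Set[str]:
    """Grow a keyword set by adding the nearest embedding neighbours of newly added terms."""
    keywords = set(seeds)
    frontier = set(seeds)
    for _ in range(rounds):
        new_terms: Set[str] = set()
        for term in frontier:
            if term not in embeddings:
                continue  # out-of-vocabulary seed: nothing to expand from
            scored = sorted(
                ((cosine(embeddings[term], vec), word)
                 for word, vec in embeddings.items() if word not in keywords),
                reverse=True,
            )
            for sim, word in scored[:top_k]:
                if sim >= min_sim and approve(word):
                    new_terms.add(word)
        if not new_terms:
            break  # converged: no accepted neighbours this round
        keywords |= new_terms
        frontier = new_terms  # only expand from terms added in this round
    return keywords


def candidate_sentences(sentences: Iterable[str], keywords: Set[str]) -> List[str]:
    """Keep sentences that contain any keyword as a whole word (case-insensitive)."""
    if not keywords:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, sorted(keywords))) + r")\b", re.IGNORECASE
    )
    return [s for s in sentences if pattern.search(s)]


toy = {"stigma": [1.0, 0.0], "shame": [0.9, 0.1], "embarrassed": [0.8, 0.2], "fever": [0.0, 1.0]}
kw = expand_keywords({"stigma"}, toy, top_k=2)
print(sorted(kw))  # ['embarrassed', 'shame', 'stigma']
print(candidate_sentences(["Patient reports shame about status.", "Fever resolved."], kw))
```

Selected sentences would then go to manual annotation; the keyword filter only raises recall of stigma-related text and does not assign subscales.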

Constitutional AI & AI Ethics · Natural Language Processing · Scientific Discovery & Drug Design

Citation Metrics

Citations: 0
Influential citations: 0
References: 23
Year: 2026
Venue: N/A
