Emory | Apr 9, 2026 | arXiv:2604.07834

Why Are We Lonely? Leveraging LLMs to Measure and Understand Loneliness in Caregivers and Non-caregivers

Michelle Damin Kim, Ellie S. Paek, Yufen Lin, Emily Mroz, Jane Chung, Jinho D. Choi

AI Summary

This paper introduces an LLM-driven pipeline to construct and analyze social media datasets for measuring and comparing loneliness in caregivers and non-caregivers. The authors develop a loneliness evaluation framework and a typology for categorizing causes of loneliness. The evaluation framework reaches average accuracies of 76.09% (caregivers) and 79.78% (non-caregivers), and the cause typology reaches micro-aggregate F1 scores of 0.825 and 0.80 for the same two groups. Applying the pipeline to Reddit data with GPT-4o, GPT-5-nano, and GPT-5, the study finds substantial differences in the causes of loneliness between the two groups, with caregivers' loneliness predominantly linked to their caregiving roles.

Key Contribution

LLMs can now reliably measure and categorize the causes of loneliness from social media text, revealing that caregivers experience loneliness in fundamentally different ways than non-caregivers.

Abstract

This paper presents an LLM-driven approach for constructing diverse social media datasets to measure and compare loneliness in the caregiver and non-caregiver populations. We introduce an expert-developed loneliness evaluation framework and an expert-informed typology for categorizing causes of loneliness in social media text. Using a human-validated data processing pipeline, we apply GPT-4o, GPT-5-nano, and GPT-5 to build a high-quality Reddit corpus and analyze loneliness across both populations. The loneliness evaluation framework achieved average accuracies of 76.09% and 79.78% for caregivers and non-caregivers, respectively. The cause categorization framework achieved micro-aggregate F1 scores of 0.825 and 0.80 for caregivers and non-caregivers, respectively. Across populations, we observe substantial differences in the distribution of types of causes of loneliness. Caregivers' loneliness was predominantly linked to caregiving roles, identity recognition, and feelings of abandonment, indicating distinct loneliness experiences between the two groups. Demographic extraction further demonstrates the viability of Reddit for building a diverse caregiver loneliness dataset. Overall, this work establishes an LLM-based pipeline for creating high-quality social media datasets for studying loneliness and demonstrates its effectiveness in analyzing population-level differences in the manifestation of loneliness.
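To make the reported metric concrete, here is a minimal sketch of how a micro-averaged F1 score could be computed for multi-label cause categorization. It assumes that "micro-aggregate F1" refers to the standard micro-averaged F1, where true positives, false positives, and false negatives are pooled over all posts and labels before a single F1 is computed. The function name `micro_f1`, the label names, and the toy data are hypothetical and are not taken from the paper's typology or corpus.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over multi-label annotations.

    Counts are pooled across every post and label, then F1 is computed once.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of posts")

    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, pred):
        tp += len(gold_labels & pred_labels)  # labels correctly predicted
        fp += len(pred_labels - gold_labels)  # predicted but not in gold
        fn += len(gold_labels - pred_labels)  # in gold but missed

    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy example with hypothetical cause labels (not the paper's categories).
gold = [{"role_strain"}, {"abandonment", "identity"}, {"isolation"}]
pred = [{"role_strain"}, {"abandonment"}, {"isolation", "identity"}]

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

Pooling counts this way weights frequent cause categories more heavily than rare ones, which is the main practical difference from a macro-averaged score.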

Data Curation & Synthetic Data | Eval Frameworks & Benchmarks | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 23

Year: 2026

Venue: Proceedings of the 1st Workshop on Linguistic Analysis for Health (HeaLing 2026)
