This paper introduces an LLM-driven pipeline for constructing and analyzing social media datasets to measure and compare loneliness in caregivers and non-caregivers. The authors develop a loneliness evaluation framework and a typology for categorizing causes of loneliness, which achieve accuracies of 76-80% and micro-aggregate F1 scores of 0.80-0.83, respectively. Applying the pipeline to Reddit data with GPT-4o, GPT-5-nano, and GPT-5, the study finds substantial differences in the causes of loneliness between the two groups, with caregivers' loneliness predominantly linked to their caregiving roles.
LLMs can now reliably measure and categorize the causes of loneliness from social media text, revealing that caregivers experience loneliness in fundamentally different ways than non-caregivers.
This paper presents an LLM-driven approach for constructing diverse social media datasets to measure and compare loneliness in the caregiver and non-caregiver populations. We introduce an expert-developed loneliness evaluation framework and an expert-informed typology for categorizing causes of loneliness in social media text. Using a human-validated data processing pipeline, we apply GPT-4o, GPT-5-nano, and GPT-5 to build a high-quality Reddit corpus and analyze loneliness across both populations. The loneliness evaluation framework achieved average accuracies of 76.09% and 79.78% for caregivers and non-caregivers, respectively. The cause categorization framework achieved micro-aggregate F1 scores of 0.825 and 0.80 for caregivers and non-caregivers, respectively. Across populations, we observe substantial differences in the distribution of types of causes of loneliness. Caregivers' loneliness was predominantly linked to caregiving roles, identity recognition, and feelings of abandonment, indicating distinct loneliness experiences between the two groups. Demographic extraction further demonstrates the viability of Reddit for building a diverse caregiver loneliness dataset. Overall, this work establishes an LLM-based pipeline for creating high-quality social media datasets for studying loneliness and demonstrates its effectiveness in analyzing population-level differences in the manifestation of loneliness.
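To make the "micro-aggregate F1" figure concrete, the following is a minimal sketch of how a micro-averaged F1 score can be computed for cause categorization. It assumes each post may carry more than one cause label (the abstract does not say whether the task is multi-label). The function name `micro_f1`, the label strings, and the toy annotations are hypothetical illustrations, not the paper's typology or data.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over all posts and labels, then compute a single F1 score."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and present in the gold set
        fp += len(p - g)   # labels predicted but absent from the gold set
        fn += len(g - p)   # gold labels the model missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical annotations for three posts (illustrative labels only).
gold = [
    {"caregiving_role"},
    {"abandonment", "identity_recognition"},
    {"social_isolation"},
]
pred = [
    {"caregiving_role"},
    {"abandonment"},
    {"social_isolation", "abandonment"},
]

score = micro_f1(gold, pred)
print(round(score, 3))  # 0.75
```

Because micro-averaging pools counts before computing the score, frequent cause categories weigh more heavily than rare ones. That is worth keeping in mind when comparing the 0.825 and 0.80 figures across populations whose cause distributions differ.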