The paper addresses the challenge of analyzing a large collection of leisure time activities and organizational memberships extracted from interviews with Finnish WWII Karelian evacuee families. The authors develop a categorization framework that captures key aspects of participation (activity/organization type, socialness, regularity, physical demand) and annotate a gold-standard set for evaluation. They show that an open-weight LLM, combined with voting across multiple model runs, can closely match expert judgments when applying this schema at scale, which enables labeling 350K entities for downstream social integration studies.
LLMs can categorize 350K historical mentions of leisure activities and organizational memberships with near-expert accuracy, unlocking large-scale quantitative analysis of social integration from digitized archives.
Digitized historical archives make it possible to study everyday social life on a large scale, but the information extracted from text often does not directly allow historians or sociologists to answer their research questions quantitatively. We address this problem in a large collection of Finnish World War II Karelian evacuee family interviews. Prior work extracted more than 350K mentions of leisure time activities and organizational memberships from these interviews, yielding 71K unique activity and organization names -- far too many to analyze directly. We develop a categorization framework that captures key aspects of participation (the kind of activity/organization, how social it typically is, how regularly it happens, and how physically demanding it is). We annotate a gold-standard set to allow for a reliable evaluation, and then test whether large language models can apply the same schema at scale. Using a simple voting approach across multiple model runs, we find that an open-weight LLM can closely match expert judgments. Finally, we apply the method to label the 350K entities, producing a structured resource for downstream studies of social integration and related outcomes.
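The abstract does not spell out how the votes are combined, so the following is only a minimal sketch of one plausible reading: each model run returns one label per schema dimension for a given activity or organization name, and the most frequent label wins per dimension. The dimension keys, the label values, the `majority_vote` function, and the tie-breaking rule are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter

# Hypothetical dimension keys, named after the four aspects in the abstract.
DIMENSIONS = ("type", "socialness", "regularity", "physical_demand")


def majority_vote(runs):
    """Combine per-run label dicts into a single label per dimension.

    `runs` is a list of dicts, one per model run, mapping a dimension
    name to the label that run produced. Ties go to the label that
    appeared first among the runs (an assumed rule, not the paper's).
    """
    result = {}
    for dim in DIMENSIONS:
        votes = [run[dim] for run in runs if dim in run]
        if not votes:
            # No run produced a label for this dimension.
            result[dim] = None
            continue
        # most_common orders equal counts by first occurrence.
        result[dim] = Counter(votes).most_common(1)[0][0]
    return result


# Three made-up runs for one entity; the label values are placeholders.
runs = [
    {"type": "sports", "socialness": "high", "regularity": "regular", "physical_demand": "high"},
    {"type": "sports", "socialness": "medium", "regularity": "regular", "physical_demand": "high"},
    {"type": "hobby", "socialness": "high", "regularity": "regular", "physical_demand": "medium"},
]
final_labels = majority_vote(runs)
```

Voting per dimension rather than over whole label tuples keeps one noisy dimension from discarding agreement on the others. Because labels attach to unique names, a scheme like this would only need to run over the 71K unique names, with the result then shared by every mention of that name.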