PersonaTrace leverages LLM agents to synthesize realistic digital footprints from structured user profiles, generating diverse sequences of user events and corresponding digital artifacts. This approach addresses the scarcity of diverse and accessible data for studying behavior, personalized applications, and machine learning model training. Models fine-tuned on the synthetic data generated by PersonaTrace outperform those trained on other synthetic datasets when evaluated on real-world out-of-distribution tasks, demonstrating its superior realism and utility.
Forget painstakingly collecting user data – PersonaTrace lets you bootstrap realistic digital footprints with LLMs, and models trained on this synthetic data actually generalize better to real-world tasks.
Digital footprints (records of individuals' interactions with digital systems) are essential for studying behavior, developing personalized applications, and training machine learning models. However, research in this area is often hindered by the scarcity of diverse and accessible data. To address this limitation, we propose a novel method for synthesizing realistic digital footprints using large language model (LLM) agents. Starting from a structured user profile, our approach generates diverse and plausible sequences of user events, ultimately producing corresponding digital artifacts such as emails, messages, calendar entries, and reminders. Intrinsic evaluation results demonstrate that the generated dataset is more diverse and realistic than existing baselines. Moreover, models fine-tuned on our synthetic data outperform those trained on other synthetic datasets when evaluated on real-world out-of-distribution tasks.
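The profile-to-events-to-artifacts flow described in the abstract can be illustrated with a minimal sketch. This is not the PersonaTrace implementation: the class names (`UserProfile`, `Event`, `Artifact`), the function names, the prompt wording, and the `stub_llm` placeholder are all assumptions made for illustration, and a real system would replace the stub with calls to an actual LLM agent.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# All names below are hypothetical; they are not taken from PersonaTrace.


@dataclass
class UserProfile:
    name: str
    occupation: str
    interests: List[str]


@dataclass
class Event:
    day: int
    description: str


@dataclass
class Artifact:
    kind: str      # e.g. "email" or "calendar_entry"
    content: str


# An "LLM" here is any function mapping a prompt string to generated text.
LLM = Callable[[str], str]


def stub_llm(prompt: str) -> str:
    """Deterministic placeholder standing in for a real LLM call."""
    return f"[generated for: {prompt}]"


def generate_events(profile: UserProfile, llm: LLM, n_events: int) -> List[Event]:
    """Step 1: turn a structured profile into a sequence of user events."""
    # Fall back to a generic topic if the profile lists no interests.
    topics = profile.interests or ["daily routine"]
    events = []
    for day in range(1, n_events + 1):
        topic = topics[(day - 1) % len(topics)]
        prompt = (
            f"Describe an event on day {day} for {profile.name}, "
            f"a {profile.occupation}, involving {topic}."
        )
        events.append(Event(day=day, description=llm(prompt)))
    return events


def generate_artifacts(
    event: Event, llm: LLM, kinds: Sequence[str] = ("email", "calendar_entry")
) -> List[Artifact]:
    """Step 2: produce one digital artifact of each kind for an event."""
    return [
        Artifact(kind=kind, content=llm(f"Write a {kind} for: {event.description}"))
        for kind in kinds
    ]


def synthesize_footprint(
    profile: UserProfile, llm: LLM = stub_llm, n_events: int = 3
) -> List[Artifact]:
    """Run both steps and return the flat list of artifacts."""
    footprint: List[Artifact] = []
    for event in generate_events(profile, llm, n_events):
        footprint.extend(generate_artifacts(event, llm))
    return footprint


if __name__ == "__main__":
    profile = UserProfile(name="Alex", occupation="nurse", interests=["hiking", "cooking"])
    for artifact in synthesize_footprint(profile):
        print(artifact.kind, "->", artifact.content)
```

Passing the LLM in as a plain callable keeps the sketch runnable offline and makes it easy to swap in a real model client later; the two-stage split mirrors only the high-level description in the abstract, not any specific agent design.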