The paper introduces COACH, an LLM-driven pipeline for generating personalized lifestyle counseling for cancer patients and survivors, and QUORUM, a new evaluation framework that unifies developer, expert, and user perspectives. Applying QUORUM to COACH shows that stakeholders converge on the relevance, quality, and reliability of the generated counseling. However, the study also identifies divergent opinions on tone, sensitivity to pattern-extraction errors, and potential hallucinations, emphasizing the need for multi-stakeholder evaluation in healthcare NLP.
LLM-generated health counseling appears promising, but evaluating it reveals stakeholder disagreements on tone, sensitivity to pattern-extraction errors, and potential hallucinations, highlighting the need for evaluation that goes beyond relevance and quality metrics.
Systems that collect data on sleep, mood, and activities can provide valuable lifestyle counselling to populations affected by chronic disease and its consequences. Such systems are, however, challenging to develop: besides reliably extracting patterns from user-specific data, they should also contextualise these patterns with validated medical knowledge to ensure the quality of counselling, and generate counselling that is relevant to real users. We present QUORUM, a new evaluation framework that unifies these developer-, expert-, and user-centric perspectives, and show through a real-world case study that it meaningfully tracks convergence and divergence in stakeholder perspectives. We also present COACH, a Large Language Model-driven pipeline that generates personalised lifestyle counselling for our Healthy Chronos use case, a diary app for cancer patients and survivors. Applying our framework shows that, overall, users, medical experts, and developers converge on the view that the generated counselling is relevant, of good quality, and reliable. However, stakeholders diverge on the tone of the counselling, sensitivity to errors in pattern extraction, and potential hallucinations. These findings highlight the importance of multi-stakeholder evaluation for consumer health language technologies and illustrate how a unified evaluation framework can support trustworthy, patient-centred NLP systems in real-world settings.