UIUC | Jun 4, 2026 | arXiv:2606.06271

FOXGLOVE: Understanding Goal-Oriented and Anchored Writing Feedback from Experts and LLMs on Argumentative Essays

Yijun Liu, Yifan Song, John Gallagher, Sarah Sterman, Tal August

AI Summary

This study introduces FOXGLOVE, a dataset of 696 feedback comments written by trained writing instructors and 1,644 comments generated by four large language models (LLMs) on twelfth-grade argumentative essays. Instructors and LLMs distribute feedback similarly across writing goals and essay positions, but they diverge on which specific sentences receive feedback. LLM comments also tend to be more complex and to contain fewer questions. Instructors rate LLM feedback higher on most quality dimensions, though much of this advantage appears to be attributable to comment length.

Key Contribution

FOXGLOVE pairs instructor and LLM feedback comments under a shared protocol, showing that LLM feedback is rated higher on most quality dimensions yet targets different sentences than instructor feedback does.

Abstract

While large language models (LLMs) are increasingly used to generate writing feedback, there remains no systematic comparison of LLM and expert feedback on the dimensions that writing research identifies as central to revision: goal-orientation, anchoring to specific sentences, and prioritization. We introduce FOXGLOVE, a dataset of 696 feedback comments written by trained writing instructors on 69 twelfth-grade argumentative essays, paired with 1,644 comments generated from four frontier LLMs under a shared protocol, totaling 2,340 comments. We provide expert quality ratings on a subset of both instructor and LLM comments. We find that instructors and LLMs distribute feedback similarly across goals and essay positions, yet instructors and models diverge on the specific sentences on which to provide feedback. Additionally, we find that models tend to write more complex feedback and use fewer questions than instructors. LLM feedback also receives higher ratings on most dimensions of quality, as rated by instructors, but much of this advantage appears to be attributable to lengthier comments. FOXGLOVE enables systematic comparison of where human and LLM feedback align, diverge, and differ.

Eval Frameworks & Benchmarks, Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
