This paper analyzes changes in peer review reports after the rise of LLMs, focusing on linguistic features, evaluation aspects, and recommendation informativeness. The authors find that post-LLM reviews are longer and more fluent, with increased emphasis on summaries and clarity, but decreased attention to originality, replicability, and critical reasoning. The study combines linguistic analysis, automated annotation of evaluation aspects, and a maximum likelihood estimation method for identifying LLM-influenced reviews, applied to review reports from top AI conferences.
LLMs are subtly reshaping peer review, leading to longer, more superficially polished reports that prioritize clarity over critical assessment of originality and replicability.
With the rapid advancement of Large Language Models (LLMs), the academic community has faced unprecedented disruptions, particularly in the realm of academic communication. The primary function of peer review is to improve the quality of academic manuscripts along evaluation aspects such as clarity and originality. Although prior studies suggest that LLMs are beginning to influence peer review, it remains unclear whether they are altering its core evaluative functions. Moreover, the extent to which LLMs affect the linguistic form, evaluative focus, and recommendation-related signals of peer-review reports has yet to be systematically examined. In this study, we examine the changes in peer review reports for academic articles following the emergence of LLMs, emphasizing variations at a fine-grained level. Specifically, we investigate linguistic features such as the length and complexity of words and sentences in review comments, while also automatically annotating the evaluation aspects of individual review sentences. We also use a previously established maximum likelihood estimation method to identify review reports that have potentially been modified or generated by LLMs. Finally, we assess the impact of evaluation aspects mentioned in LLM-assisted review reports on the informativeness of recommendations for paper decision-making. The results indicate that following the emergence of LLMs, peer review texts have become longer and more fluent, with increased emphasis on summaries and surface-level clarity, as well as more standardized linguistic patterns, particularly among reviewers with lower confidence scores. At the same time, attention to deeper evaluative dimensions, such as originality, replicability, and nuanced critical reasoning, has declined.
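The abstract mentions a maximum likelihood estimation method for identifying LLM-influenced reviews but does not spell out the estimator. As an illustration only, the sketch below shows one common way such an estimate can be framed: a two-component mixture in which each review has a likelihood under a "human" reference distribution and under an "LLM" reference distribution, and the fraction `alpha` of LLM-influenced text is chosen to maximize the corpus log-likelihood. The function names, the per-review likelihood inputs, and the search procedure are all assumptions made for this example and are not taken from the paper.

```python
import math


def mixture_log_likelihood(alpha, p_human, p_llm):
    """Log-likelihood of the corpus under the mixture (1 - alpha) * P_human + alpha * P_llm.

    p_human[i] and p_llm[i] are hypothetical likelihoods of review i under each
    reference distribution; how they are obtained is outside this sketch.
    """
    return sum(
        math.log((1.0 - alpha) * ph + alpha * pl)
        for ph, pl in zip(p_human, p_llm)
    )


def estimate_llm_fraction(p_human, p_llm, tol=1e-6):
    """Estimate alpha in [0, 1] by ternary search.

    The log-likelihood is a sum of logs of functions that are affine in alpha,
    so it is concave and a ternary search converges to its maximizer.
    Assumes every review has a positive likelihood under at least one component.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if mixture_log_likelihood(m1, p_human, p_llm) < mixture_log_likelihood(m2, p_human, p_llm):
            lo = m1  # the maximizer lies to the right of m1
        else:
            hi = m2  # the maximizer lies to the left of m2
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Toy likelihoods, invented purely for illustration.
    toy_p_human = [0.8, 0.7, 0.2, 0.9]
    toy_p_llm = [0.1, 0.2, 0.9, 0.3]
    print(estimate_llm_fraction(toy_p_human, toy_p_llm))
```

Estimating a single corpus-level fraction, rather than classifying each review individually, tends to be more robust when individual texts are short or only lightly edited. Whether the study applies its estimator at the corpus level or per review is not stated in the abstract.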