The paper introduces RbtAct, a method for generating actionable peer-review feedback that uses existing peer-review rebuttals as implicit supervision. It proposes a new task, perspective-conditioned segment-level review feedback generation, along with the RMR-75K dataset, which maps review segments to rebuttal segments. Supervised fine-tuning and preference optimization of Llama-3.1-8B-Instruct on this data yield more actionable and specific feedback than strong baselines, as evaluated by human experts and LLM judges.
Rebuttals hold the key to actionable AI-generated peer reviews: RbtAct uses them to train LLMs to give feedback that authors actually use.
Large language models (LLMs) are increasingly used across the scientific workflow, including to draft peer-review reports. However, many AI-generated reviews are superficial and insufficiently actionable, leaving authors without concrete, implementable guidance. This is the gap our work addresses. We propose RbtAct, which targets actionable review feedback generation and places existing peer-review rebuttals at the center of learning. Rebuttals show which reviewer comments led to concrete revisions or specific plans, and which were only defended. Building on this insight, we leverage rebuttals as implicit supervision to directly optimize a feedback generator for actionability. To support this objective, we propose a new task called perspective-conditioned segment-level review feedback generation, in which the model must produce a single focused comment based on the complete paper and a specified perspective, such as experiments or writing. We also build a large dataset named RMR-75K that maps review segments to the rebuttal segments that address them, with perspective labels and impact categories that order author uptake. We then train the Llama-3.1-8B-Instruct model with supervised fine-tuning on review segments, followed by preference optimization using rebuttal-derived pairs. Experiments with human experts and LLM-as-a-judge show consistent gains in actionability and specificity over strong baselines while maintaining grounding and relevance.
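To make the idea of rebuttal-derived preference pairs concrete, the following is a minimal sketch of one way ordered impact categories could be turned into chosen/rejected pairs. It is not the authors' pipeline: the category names (`defended`, `planned`, `revised`), the `ReviewSegment` fields, and the rule of pairing segments only within the same paper and perspective are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations

# Hypothetical impact categories, ordered from least to most author uptake.
# The real RMR-75K label names are not given in the abstract.
IMPACT_ORDER = {"defended": 0, "planned": 1, "revised": 2}


@dataclass(frozen=True)
class ReviewSegment:
    """One review segment with its (assumed) rebuttal-derived labels."""
    paper_id: str
    perspective: str  # e.g. "experiments" or "writing"
    text: str
    impact: str       # a key of IMPACT_ORDER


def build_preference_pairs(segments):
    """Pair segments on the same paper and perspective by impact rank.

    The segment with higher author uptake becomes "chosen" and the other
    "rejected". Segments with equal impact are skipped.
    """
    groups = defaultdict(list)
    for seg in segments:
        groups[(seg.paper_id, seg.perspective)].append(seg)

    pairs = []
    for (paper_id, perspective), group in groups.items():
        for a, b in combinations(group, 2):
            rank_a, rank_b = IMPACT_ORDER[a.impact], IMPACT_ORDER[b.impact]
            if rank_a == rank_b:
                continue  # no preference signal between equally ranked segments
            chosen, rejected = (a, b) if rank_a > rank_b else (b, a)
            pairs.append({
                "paper_id": paper_id,
                "perspective": perspective,
                "chosen": chosen.text,
                "rejected": rejected.text,
            })
    return pairs


# Toy data, invented purely for illustration.
segments = [
    ReviewSegment("p1", "experiments", "Add an ablation without module X.", "revised"),
    ReviewSegment("p1", "experiments", "The evaluation seems weak.", "defended"),
    ReviewSegment("p1", "experiments", "Report variance over seeds.", "planned"),
    ReviewSegment("p1", "writing", "Section 3 is hard to follow.", "defended"),
]
pairs = build_preference_pairs(segments)
```

Each resulting dictionary pairs a comment that drew more author uptake with one that drew less, for the same paper and perspective. In a real training setup the paper itself and the perspective would form the prompt. How the authors actually select and filter pairs is not stated in the abstract.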