This paper introduces two prototype-based methods, Prototype-Based Regularization (PBR) and Prototype-Conditioned Modulation (PCM), to enhance Rhetorical Role Labeling (RRL) by integrating local context with global, corpus-level representations. PBR uses a distance-based auxiliary loss to learn soft prototypes, while PCM constructs and injects corpus-level prototypes during training and inference. The approach is evaluated on legal, medical, and scientific datasets, including a new SCOTUS-Law dataset of U.S. Supreme Court opinions, demonstrating improved Macro-F1 scores, particularly for low-frequency roles.
Augmenting hierarchical models with corpus-level prototypes improves Rhetorical Role Labeling by up to 4 Macro-F1 points, with the largest gains on rare roles.
Rhetorical Role Labeling (RRL) identifies the functional role of each sentence in a document, a key task for discourse understanding in domains such as law and medicine. While hierarchical models capture local dependencies effectively, they are limited in modeling global, corpus-level features. To address this limitation, we propose two prototype-based methods that integrate local context with global representations. Prototype-Based Regularization (PBR) learns soft prototypes through a distance-based auxiliary loss to structure the latent space, while Prototype-Conditioned Modulation (PCM) constructs corpus-level prototypes and injects them during training and inference. Given the scarcity of RRL resources, we introduce SCOTUS-Law, the first dataset of U.S. Supreme Court opinions annotated with rhetorical roles at three levels of granularity: category, rhetorical function, and step. Experiments on legal, medical, and scientific benchmarks show consistent improvements over strong baselines, with gains of up to 4 Macro-F1 points on low-frequency roles. We further analyze the implications in the era of Large Language Models and complement our findings with expert evaluation.
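To make the idea of a distance-based auxiliary loss concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the loss form, so the squared Euclidean distance, the class-mean prototypes (standing in for learned soft prototypes), the `weight` parameter, the role names, and all function names are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def class_mean_prototypes(embeddings, labels):
    """One prototype per role: the mean embedding of sentences with that role.

    Assumption: class means are a simple stand-in; the paper learns soft
    prototypes (PBR) or constructs corpus-level ones (PCM) by means the
    abstract does not specify.
    """
    sums, counts = {}, {}
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], emb)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def prototype_regularization_loss(embeddings, labels, prototypes):
    """Mean squared distance from each embedding to its own role's prototype."""
    total = sum(
        squared_distance(emb, prototypes[label])
        for emb, label in zip(embeddings, labels)
    )
    return total / len(embeddings)


def total_loss(task_loss, embeddings, labels, prototypes, weight=0.1):
    """Task loss plus the weighted auxiliary term (weight is hypothetical)."""
    aux = prototype_regularization_loss(embeddings, labels, prototypes)
    return task_loss + weight * aux


# Toy 2-D sentence embeddings with hypothetical role labels.
embeddings = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
labels = ["Facts", "Facts", "Holding"]

prototypes = class_mean_prototypes(embeddings, labels)
aux_loss = prototype_regularization_loss(embeddings, labels, prototypes)
combined = total_loss(1.0, embeddings, labels, prototypes, weight=0.3)
```

In this toy setup the auxiliary term pulls each sentence embedding toward the prototype of its own role, which is one plausible way to "structure the latent space". A rare role with a single example, like "Holding" here, contributes zero distance to its own mean, which is why a real system would use learned or corpus-level prototypes.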