This paper extends an existing sentence-level dataset of German ESG reports with crowdsourced readability annotations, finding that native speakers generally perceive the sentences as easy to read but that readability judgments are subjective. The authors benchmark various readability scoring methods, including LLM prompting and fine-tuned transformers, on prediction error and correlation with human rankings. Results indicate that a small fine-tuned transformer achieves the lowest prediction error, and that averaging the predictions of multiple models offers a slight improvement at the cost of slower inference.
A small fine-tuned transformer predicts human readability ratings for German ESG reports with lower error than LLM prompting.
With the ever-growing urgency of sustainability in the economy and society, and the massive stream of information that comes with it, consumers need reliable access to that information. To address this need, companies began publishing so-called Environmental, Social, and Governance (ESG) reports, both voluntarily and as required by law. To serve the public, these reports must be addressed not only to financial experts but also to non-expert audiences. But are they written clearly enough? In this work, we extend an existing sentence-level dataset of German ESG reports with crowdsourced readability annotations. We find that, in general, native speakers perceive sentences in ESG reports as easy to read, but also that readability is subjective. We apply various readability scoring methods and evaluate them on their prediction error and their correlation with human rankings. Our analysis shows that, while LLM prompting has potential for distinguishing clear from hard-to-read sentences, a small fine-tuned transformer predicts human readability with the lowest error. Averaging the predictions of multiple models can slightly improve performance at the cost of slower inference.
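The evaluation described in the abstract (prediction error, correlation with human rankings, and averaging the predictions of several models) can be illustrated with a minimal sketch. The abstract does not name the exact metrics, so RMSE and Spearman rank correlation are assumptions here, and the function names and toy scores are hypothetical rather than taken from the paper.

```python
from math import sqrt
from statistics import mean


def rmse(pred, gold):
    """Root mean squared error between predicted and human scores."""
    return sqrt(mean((p - g) ** 2 for p, g in zip(pred, gold)))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ensemble(predictions):
    """Average per-sentence scores across several models."""
    return [mean(scores) for scores in zip(*predictions)]


# Toy data (made up for illustration): one score per sentence.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
model_a = [1.5, 2.5, 2.5, 4.5, 4.5]
model_b = [0.5, 1.5, 3.5, 3.5, 5.5]
combined = ensemble([model_a, model_b])

for name, pred in [("A", model_a), ("B", model_b), ("A+B", combined)]:
    print(name, round(rmse(pred, human), 3), round(spearman(pred, human), 3))
```

In this toy setup the two models err in opposite directions, so averaging cancels their errors; on real data the gain would typically be smaller, and every additional model in the ensemble adds inference time.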