This paper introduces a new large-scale, bilingual (English/German) dataset of catalog records annotated with the Integrated Authority File (GND) to facilitate extreme multi-label text classification (XMTC). The dataset includes a machine-actionable GND taxonomy, enabling ontology-aware classification and mapping of text to authority terms. The authors also provide initial experiments and error analyses, encouraging the community to focus on usefulness and transparency in developing AI co-pilots for cataloging.
A massive, bilingual, authority-grounded dataset could finally make AI-assisted cataloging a reality.
Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but also usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers' work.
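To make "authority-grounded evaluation" concrete, the following is a minimal sketch of one common way to score a multi-label system against authority annotations: micro-averaged precision, recall, and F1 over sets of term identifiers. It is an illustration under stated assumptions, not the authors' evaluation protocol. The function name `micro_prf`, the record keys, and the placeholder term labels are all hypothetical, and the abstract does not say which metrics the paper reports.

```python
from typing import Dict, Set, Tuple


def micro_prf(
    gold: Dict[str, Set[str]],
    pred: Dict[str, Set[str]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over authority-term sets.

    `gold` and `pred` map a record id to the set of authority-term
    identifiers assigned to it. Both structures are assumptions made
    for this sketch; the dataset's real schema may differ.
    Records that appear only in `pred` are ignored.
    """
    tp = fp = fn = 0
    for record_id, gold_terms in gold.items():
        # A record with no prediction counts all its gold terms as misses.
        pred_terms = pred.get(record_id, set())
        tp += len(gold_terms & pred_terms)   # terms correctly assigned
        fp += len(pred_terms - gold_terms)   # terms assigned but not in gold
        fn += len(gold_terms - pred_terms)   # gold terms the system missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1
```

Called with `gold = {"r1": {"term-A", "term-B"}, "r2": {"term-C"}}` and `pred = {"r1": {"term-A", "term-D"}, "r2": {"term-C"}}`, the function counts two correct terms, one spurious term, and one missed term. Exact-match scoring like this treats a near miss (for example, a broader term in the taxonomy) the same as an unrelated term; an ontology-aware variant would need the taxonomy's hierarchy, which this sketch deliberately leaves out.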