The paper examines metrics for lexical semantic change detection (LSCD) with contextualized language model embeddings and notes that most approaches still rely on just two of them: Average Pairwise Distance (APD) and cosine distance over word prototypes (PRT). It introduces Average Minimum Distance (AMD) and Symmetric Average Minimum Distance (SAMD), which quantify change through local correspondence between a word's usages across time periods. In experiments across multiple languages, encoders, and representation spaces, AMD is often the more robust metric, particularly under dimensionality reduction and with non-specialized encoders, while SAMD performs best with specialized encoders. The authors suggest that LSCD may benefit from considering metrics beyond APD and PRT.
Forget APD and PRT: AMD and SAMD offer more robust ways to track how words evolve, especially when your embeddings aren't perfect.
Lexical semantic change detection (LSCD) increasingly relies on contextualised language model embeddings, yet most approaches still quantify change using a small set of semantic change metrics, primarily Average Pairwise Distance (APD) and cosine distance over word prototypes (PRT). We introduce Average Minimum Distance (AMD) and Symmetric Average Minimum Distance (SAMD), new measures that quantify semantic change via local correspondence between word usages across time periods. Across multiple languages, encoder models, and representation spaces, we show that AMD often provides more robust performance, particularly under dimensionality reduction and with non-specialised encoders, while SAMD excels with specialised encoders. We suggest that LSCD may benefit from considering alternative semantic change metrics beyond APD and PRT, with AMD offering a robust option for contextualised embedding-based analysis.
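To make the contrast between the four metrics concrete, here is a minimal sketch in plain Python. APD and PRT follow their usual definitions (mean cosine distance over all cross-period usage pairs, and cosine distance between the per-period mean vectors). The abstract does not spell out AMD and SAMD, so the forms below are assumptions inferred from the metric names and the phrase "local correspondence": AMD is taken as the average, over usages in one period, of the distance to the nearest usage in the other period, and SAMD as the mean of the two directions. The paper's actual definitions, including the direction of AMD and the distance function, may differ. All function names and the toy vectors are hypothetical.

```python
import math
from statistics import fmean
from typing import Sequence

Usages = Sequence[Sequence[float]]  # one embedding vector per word usage


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def apd(t1: Usages, t2: Usages) -> float:
    """Average Pairwise Distance: mean distance over all cross-period pairs."""
    return fmean(cosine_distance(u, v) for u in t1 for v in t2)


def prt(t1: Usages, t2: Usages) -> float:
    """Prototype distance: cosine distance between the per-period mean vectors."""
    def centroid(vectors: Usages) -> list:
        return [fmean(column) for column in zip(*vectors)]
    return cosine_distance(centroid(t1), centroid(t2))


def amd(source: Usages, target: Usages) -> float:
    """ASSUMED form of Average Minimum Distance (not taken from the paper):
    for each usage in `source`, the distance to its nearest usage in `target`,
    averaged over `source`. Directional, so amd(a, b) != amd(b, a) in general."""
    return fmean(min(cosine_distance(u, v) for v in target) for u in source)


def samd(t1: Usages, t2: Usages) -> float:
    """ASSUMED form of Symmetric AMD: the mean of the two directional values."""
    return 0.5 * (amd(t1, t2) + amd(t2, t1))


# Toy data (hypothetical): the word keeps its old sense and gains a new one.
earlier = [[1.0, 0.0], [1.0, 0.0]]
later = [[1.0, 0.0], [0.0, 1.0]]

for name, value in [
    ("APD", apd(earlier, later)),
    ("PRT", prt(earlier, later)),
    ("AMD earlier->later", amd(earlier, later)),
    ("AMD later->earlier", amd(later, earlier)),
    ("SAMD", samd(earlier, later)),
]:
    print(f"{name}: {value:.3f}")
```

On this toy input APD is 0.5 and PRT is about 0.293. Under the assumed definitions, AMD from the earlier to the later period is 0 because every old usage still has an exact counterpart, AMD in the other direction is 0.5 because the new sense has no counterpart, and SAMD is 0.25. The asymmetry shows what a nearest-neighbour style of correspondence can separate that a global average cannot.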