This paper introduces a two-stage framework, TOPPing+VACAI-Bowl, to improve language generalization for unseen low-resource varieties by explicitly modeling both variety-specific and variety-invariant features. TOPPing is a source-selection method tailored for low-resource varieties, while VACAI-Bowl is a lightweight architecture that uses adversarial training to disentangle variety-specific and invariant attributes. Experiments on dependency parsing across 10 low-resource varieties demonstrate a 54.62% average improvement, suggesting the framework's effectiveness in capturing crucial linguistic cues for generalization.
Dissimilarity, not just similarity, unlocks better language generalization for low-resource varieties.
Low-resource language varieties used by specific groups remain neglected in the development of multilingual language models. A great deal of cross-lingual research focuses on inter-lingual language transfer, which strives to align allied varieties and minimize the differences between them. However, for low-resource varieties, linguistic dissimilarity is also an important cue that allows generalization to unseen varieties. Unlike prior approaches, we propose a two-stage Language Generalization framework that focuses on capturing variety-specific cues while also exploiting the rich overlap offered by a high-resource source variety. First, we propose TOPPing, a source-selection method specifically designed for low-resource varieties. Second, we suggest a lightweight VACAI-Bowl architecture that learns variety-specific attributes with one branch, while a parallel branch captures variety-invariant attributes using adversarial training. We evaluate our framework on structural prediction tasks, which are among the few tasks available for these varieties, as a proxy for performance on other downstream tasks. Using VACAI-Bowl with TOPPing yields an average 54.62% improvement on the dependency parsing task across 10 low-resource varieties.
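To make the adversarial branch more concrete, here is a minimal sketch of one training step using gradient reversal, a common way to learn features that a discriminator cannot exploit. The abstract does not say which adversarial mechanism VACAI-Bowl uses, so gradient reversal, the scalar encoder, the logistic variety discriminator, and every name here (`adversarial_step`, `w_enc`, `v_disc`, `lam`, `lr`) are illustrative assumptions rather than the authors' implementation. The sketch covers only the variety-invariant branch.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def adversarial_step(w_enc, v_disc, x, variety, lam=1.0, lr=0.1):
    """One toy update with gradient reversal (hypothetical setup).

    w_enc   : scalar weight of the invariant-branch encoder
    v_disc  : scalar weight of the variety discriminator
    x       : scalar input feature
    variety : 0 or 1, the variety label the discriminator tries to predict
    lam     : strength of the reversed gradient
    lr      : learning rate
    """
    h = w_enc * x                 # encoder output (forward pass is unchanged)
    p = sigmoid(v_disc * h)       # discriminator's predicted P(variety = 1)
    err = p - variety             # d(cross-entropy)/d(logit)

    grad_v = err * h              # discriminator gradient
    grad_w = err * v_disc * x     # encoder gradient before reversal

    # The discriminator descends its loss as usual.
    new_v = v_disc - lr * grad_v
    # The encoder receives the gradient with its sign flipped and scaled by
    # lam, so it moves to make the variety harder to predict.
    new_w = w_enc - lr * (-lam * grad_w)
    return new_w, new_v, p


new_w, new_v, p = adversarial_step(w_enc=1.0, v_disc=1.0, x=2.0, variety=1)
```

In this toy call the discriminator weight grows, because the discriminator is getting better at predicting the variety, while the encoder weight shrinks, making the feature less informative about the variety. In a full model the encoder would also receive an ordinary, non-reversed gradient from the parsing loss, which keeps the invariant features useful for the task.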