Search papers, labs, and topics across Lattice.
This paper introduces a theoretical framework for meta-learning, distinguishing between algorithm-explicit and algorithm-implicit learning, to address the limitations of current meta-learning methods that are constrained to narrow task distributions. Guided by this framework, the authors propose TAIL, a transformer-based algorithm-implicit meta-learner, which incorporates random projections for cross-modal feature encoding, random injection label embeddings for extrapolating to larger label spaces, and efficient inline query processing. TAIL achieves state-of-the-art performance on few-shot benchmarks and demonstrates generalization to unseen domains, modalities, and label spaces, while also offering computational savings.
Forget training on just images or text – TAIL, a new meta-learner, masters both and extrapolates to 20x more classes, all while slashing compute costs.
Current meta-learning methods are constrained to narrow task distributions with fixed feature and label spaces, limiting applicability. Moreover, the current meta-learning literature uses key terms like "universal" and "general-purpose" inconsistently and lacks precise definitions, hindering comparability. We introduce a theoretical framework for meta-learning which formally defines practical universality and introduces a distinction between algorithm-explicit and algorithm-implicit learning, providing a principled vocabulary for reasoning about universal meta-learning methods. Guided by this framework, we present TAIL, a transformer-based algorithm-implicit meta-learner that functions across tasks with varying domains, modalities, and label configurations. TAIL features three innovations over prior transformer-based meta-learners: random projections for cross-modal feature encoding, random injection label embeddings that extrapolate to larger label spaces, and efficient inline query processing. TAIL achieves state-of-the-art performance on standard few-shot benchmarks while generalizing to unseen domains. Unlike other meta-learning methods, it also generalizes to unseen modalities, solving text classification tasks despite training exclusively on images, handles tasks with up to 20$\times$ more classes than seen during training, and provides orders-of-magnitude computational savings over prior transformer-based approaches.