The paper introduces "skill neologisms": soft tokens added to the model's vocabulary and optimized to improve LLM performance on a specific skill without weight updates, addressing the catastrophic forgetting that fine-tuning risks in continual learning. The authors observe that off-the-shelf pre-trained LLMs already contain tokens associated with procedural knowledge, and show that new skill neologisms can be learned to improve performance on targeted skills. The learned neologisms remain composable with out-of-distribution skills, and independently trained neologisms can be combined zero-shot, suggesting a path towards scalable skill-based continual learning.
Forget fine-tuning: "skill neologisms" (new soft tokens) let you inject skills into LLMs without weight updates, composing them zero-shot for flexible knowledge expansion.
Modern LLMs show mastery over an ever-growing range of skills, as well as the ability to compose them flexibly. However, extending model capabilities to new skills in a scalable manner is an open problem: fine-tuning and parameter-efficient variants risk catastrophic forgetting, while context-based approaches have limited expressiveness and are constrained by the model's effective context. We explore skill neologisms, i.e., soft tokens integrated in the model's vocabulary and optimized to improve capabilities over a specific skill, as a way to selectively extend model capabilities to new skills without weight updates. We first observe that off-the-shelf pre-trained LLMs already demonstrate tokens associated with procedural knowledge. We then show that skill neologisms can be learned to improve model capabilities on specific skills while being composable with out-of-distribution skills, and that independently trained skill neologisms can be composed zero-shot. These results suggest that skill neologisms may provide a scalable path towards skill-based continual learning.
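The core mechanism described above, a new token whose embedding is trained while every model weight stays frozen, can be illustrated with a toy sketch. This is not the paper's implementation: the "model" here is a hypothetical mean-pooling classifier with random frozen weights, and the dimensions, learning rate, and names (`EMBED`, `W_OUT`, `train_soft_token`) are assumptions chosen only to keep the example small and dependency-free.

```python
import math
import random

random.seed(0)

DIM, VOCAB, CLASSES = 4, 6, 3

# Frozen "pre-trained" parameters (random stand-ins, never updated below).
EMBED = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
W_OUT = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(CLASSES)]


def forward(vectors):
    """Toy frozen model: mean-pool the input vectors, then linear + softmax."""
    n = len(vectors)
    pooled = [sum(v[i] for v in vectors) / n for i in range(DIM)]
    logits = [sum(w[i] * pooled[i] for i in range(DIM)) for w in W_OUT]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(soft_token, token_ids, target):
    """Cross-entropy loss and its gradient w.r.t. the soft token only."""
    vectors = [soft_token] + [EMBED[t] for t in token_ids]
    probs = forward(vectors)
    loss = -math.log(probs[target])
    n = len(vectors)
    # dL/dlogit_c = p_c - 1[c == target]; dlogit_c/dsoft = W_OUT[c] / n
    grad = [
        sum((probs[c] - (1.0 if c == target else 0.0)) * W_OUT[c][i]
            for c in range(CLASSES)) / n
        for i in range(DIM)
    ]
    return loss, grad


def train_soft_token(examples, steps=200, lr=0.5):
    """Gradient descent on the soft token; EMBED and W_OUT are untouched."""
    soft = [0.0] * DIM
    for _ in range(steps):
        total = [0.0] * DIM
        for token_ids, target in examples:
            _, grad = loss_and_grad(soft, token_ids, target)
            total = [a + b for a, b in zip(total, grad)]
        soft = [s - lr * g / len(examples) for s, g in zip(soft, total)]
    return soft


if __name__ == "__main__":
    # Hypothetical "skill": steer every prompt towards class 2.
    examples = [([0, 1], 2), ([2, 3], 2), ([4], 2)]
    soft = train_soft_token(examples)
    before = sum(loss_and_grad([0.0] * DIM, t, y)[0] for t, y in examples)
    after = sum(loss_and_grad(soft, t, y)[0] for t, y in examples)
    print(f"summed loss before: {before:.3f}, after: {after:.3f}")
```

In this toy setting, composing two independently trained soft tokens would amount to prepending both vectors to the same prompt before calling `forward`. Whether such a composition helps is an empirical question that the sketch does not address.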