The paper introduces "category splitting," a task in which a video classifier is edited to refine coarse categories into finer subcategories without retraining, addressing the limitations of fixed taxonomies in video recognition models. The authors propose a zero-shot editing method that exploits the latent compositional structure of video classifiers to expose fine-grained distinctions. Experiments on new video benchmarks show that the method outperforms vision-language baselines and that low-shot fine-tuning improves performance further when initialized from the zero-shot edit.
Forget retraining: this zero-shot method edits video classifiers to expose fine-grained distinctions within existing categories, opening up new possibilities for adapting models to evolving tasks.
Video recognition models are typically trained on fixed taxonomies that are often too coarse, collapsing distinctions in object, manner, or outcome under a single label. As tasks and definitions evolve, such models cannot accommodate emerging distinctions, and collecting new annotations and retraining to reflect these changes is costly. To address these challenges, we introduce category splitting, a new task in which an existing classifier is edited to refine a coarse category into finer subcategories while preserving accuracy elsewhere. We propose a zero-shot editing method that leverages the latent compositional structure of video classifiers to expose fine-grained distinctions without additional data. We further show that low-shot fine-tuning, while simple, is highly effective and benefits from our zero-shot initialization. Experiments on our new video benchmarks for category splitting demonstrate that our method substantially outperforms vision-language baselines, improving accuracy on the newly split categories without sacrificing performance on the rest. Project page: https://kaitingliu.github.io/Category-Splitting/.
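To make the task concrete, here is a minimal sketch of what editing a classifier to split one category could look like. It is not the paper's method, whose mechanism the abstract does not specify. It assumes a plain linear classification head, and the names `split_category`, `logits`, and the `offsets` directions are hypothetical, chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def split_category(weights: Matrix, coarse_idx: int, offsets: Matrix) -> Matrix:
    """Return a new head where row `coarse_idx` becomes one row per subcategory.

    Assumption: each subcategory row is the coarse row plus a hypothetical
    offset direction. All other rows are copied unchanged, and the new
    subcategory rows are appended at the end.
    """
    coarse_row = weights[coarse_idx]
    new_rows = [[w + d for w, d in zip(coarse_row, off)] for off in offsets]
    kept = [list(row) for i, row in enumerate(weights) if i != coarse_idx]
    return kept + new_rows


def logits(weights: Matrix, feature: List[float]) -> List[float]:
    """Compute one logit per class as a dot product with the feature vector."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


# Toy head: 3 classes over 2-d features; class 1 is the coarse category to split.
head = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
offsets = [[0.5, 0.0], [-0.5, 0.0]]  # hypothetical subcategory directions
edited = split_category(head, coarse_idx=1, offsets=offsets)

feature = [0.2, 0.9]
old = logits(head, feature)    # 3 logits, one per original class
new = logits(edited, feature)  # 4 logits: 2 untouched classes + 2 subcategories
```

In this sketch the logits of the untouched classes are identical before and after the edit, which is one simple reading of "preserving accuracy elsewhere." Predictions can still shift, because the new subcategory rows compete with the existing classes.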