The paper introduces HyLoVQA, an approach to continual Visual Question Answering that mitigates cross-level task interference by dynamically generating Low-Rank Adaptation (LoRA) adapters with a hypernetwork conditioned on anchors retrieved from a drift-resilient memory bank. The memory bank stores anchors for visual object and textual task content and updates them with current input features. The paper also introduces an alignment loss that aligns semantic discrepancies in the feature space with functional changes in the parameter space, keeping the adapters focused on the current task and object. Experiments on VQA v2 and NExT-QA, in both standard and compositional settings, are reported to show HyLoVQA outperforming prior state-of-the-art methods.
Forget catastrophic forgetting: HyLoVQA's hypernetwork-generated LoRA adapters dynamically adapt to new VQA tasks while preserving past knowledge, outperforming prior state-of-the-art methods.
Continual Visual Question Answering (VQA) requires learning from non-stationary streams of visual inputs and questions while preserving past knowledge. Most prior methods adapt by updating a largely shared parameter set, which often leads to cross-level task interference and hinders accurate adaptation to the current task and object. To address this limitation, we propose HyLoVQA. It maintains a drift-resilient memory bank of anchors that store the content of visual objects and textual tasks and are updated using current input features. Conditioned on the retrieved anchors, a hypernetwork generates lightweight Low-Rank Adaptation (LoRA) adapters, which keeps the method parameter-efficient while allowing the model to adapt to each task and object dynamically. Additionally, we formulate an alignment loss that aligns semantic discrepancies in the feature space with functional changes in the parameter space, thereby constraining the LoRA adapters to remain focused on the current task and object. Extensive experiments on VQA v2 and NExT-QA under both standard and compositional settings demonstrate the superiority of HyLoVQA over prior state-of-the-art methods.
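
To make the retrieve-then-generate pipeline described in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify how anchors are retrieved or updated, nor the hypernetwork architecture. This sketch assumes cosine-similarity retrieval, an exponential-moving-average anchor update, a purely linear hypernetwork, and a zero-initialised head for the B factor (borrowed from the usual LoRA initialisation so the generated update starts at zero). All names (`retrieve_and_update`, `HyperLoRA`, `lora_forward`, `momentum`, `scale`) are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def retrieve_and_update(bank, feature, momentum=0.9):
    """Retrieve the anchor closest to `feature`, then refresh it in place.

    Assumption: nearest-anchor retrieval by cosine similarity and an EMA
    update; the paper's actual retrieval and update rules may differ.
    """
    idx = max(range(len(bank)), key=lambda i: cosine(bank[i], feature))
    retrieved = list(bank[idx])  # copy taken before the update
    bank[idx] = [momentum * a + (1.0 - momentum) * f
                 for a, f in zip(bank[idx], feature)]
    return idx, retrieved


class HyperLoRA:
    """Hypothetical linear hypernetwork mapping an anchor to LoRA factors."""

    def __init__(self, anchor_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Head producing the flattened A factor (rank x d_in).
        self.W_a = [[rng.gauss(0.0, 0.1) for _ in range(anchor_dim)]
                    for _ in range(rank * d_in)]
        # Head producing the flattened B factor (d_out x rank), zero-initialised
        # (an assumption) so the generated update B @ A is zero before training.
        self.W_b = [[0.0] * anchor_dim for _ in range(d_out * rank)]

    def generate(self, anchor):
        a_flat = matvec(self.W_a, anchor)
        b_flat = matvec(self.W_b, anchor)
        A = [a_flat[r * self.d_in:(r + 1) * self.d_in]
             for r in range(self.rank)]
        B = [b_flat[o * self.rank:(o + 1) * self.rank]
             for o in range(self.d_out)]
        return A, B


def lora_forward(W0, A, B, x, scale=1.0):
    """Compute W0 @ x + scale * B @ (A @ x) with the base weight W0 frozen."""
    base = matvec(W0, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]
```

In this sketch only the two hypernetwork heads would be trained; the base weight `W0` stays fixed, and each input receives its own low-rank update through the anchor it retrieves. The alignment loss from the abstract is deliberately left out, since its exact form is not given here.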