Hubei University | May 21, 2026 | arXiv:2605.22035

HyLoVQA: Dynamic Hypernetwork-Generated Low-Rank Adaptation for Continual Visual Question Answering

Chenyi Xiong, Ziyue Qin, Miao Zhang, Kui Xiao, Zhifei Li

AI Summary

The paper introduces HyLoVQA, a novel approach for continual Visual Question Answering that mitigates cross-level task interference by dynamically generating Low-Rank Adaptation (LoRA) adapters using a hypernetwork conditioned on retrieved anchors from a drift-resilient memory bank. This memory bank stores visual object and textual task content, updated with current input features. An alignment loss is also introduced to align semantic discrepancies in the feature space with functional changes in the parameter space, improving task focus. Experiments on VQA v2 and NExT-QA demonstrate HyLoVQA's superiority over existing methods in both standard and compositional settings.

Key Contribution

Forget catastrophic forgetting: HyLoVQA's hypernetwork-generated LoRA adapters dynamically adapt to new VQA tasks while preserving past knowledge, outperforming prior state-of-the-art methods.

Abstract

Continual Visual Question Answering (VQA) requires learning from non-stationary streams of visual inputs and questions while preserving past knowledge. Most prior methods adapt by updating a largely shared parameter set. This often leads to cross-level task interference, hindering accurate adaptation to the current task and object. To address this limitation, we propose HyLoVQA. It maintains a drift-resilient memory bank of anchors. The bank stores the content of visual objects and textual tasks, and its anchors are updated using current input features. Conditioned on retrieved anchors, a hypernetwork generates lightweight Low-Rank Adaptation (LoRA) adapters. This keeps the approach parameter-efficient while allowing the model to adapt to each task and object dynamically. Additionally, we formulate an alignment loss that aligns semantic discrepancies in the feature space with functional changes in the parameter space, thereby constraining LoRA adapters to remain focused on the current task and object. Extensive experiments on VQA v2 and NExT-QA under both standard and compositional settings demonstrate the superiority of HyLoVQA over prior state-of-the-art methods.
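To make the anchor-conditioned adapter idea concrete, here is a minimal NumPy sketch of the data flow: retrieve an anchor, generate LoRA factors from it, apply them to a frozen layer, then refresh the anchor. The abstract does not specify how anchors are retrieved or updated, nor the hypernetwork architecture, so everything here is an assumption made for illustration: cosine-similarity retrieval, an exponential-moving-average anchor update, a single linear map as the hypernetwork, and zero initialization of the map that produces B. The names `AnchorBank`, `LoRAHyperNet`, and `adapted_linear` are hypothetical and are not taken from the paper. The alignment loss is not sketched because the abstract gives no formula for it.

```python
import numpy as np

rng = np.random.default_rng(0)


def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


class AnchorBank:
    """Hypothetical memory bank: one unit-norm anchor vector per slot."""

    def __init__(self, num_anchors, dim, momentum=0.9):
        self.anchors = l2_normalize(rng.standard_normal((num_anchors, dim)))
        self.momentum = momentum

    def retrieve(self, feature):
        # Assumption: nearest anchor by cosine similarity.
        sims = self.anchors @ l2_normalize(feature)
        return int(np.argmax(sims))

    def update(self, index, feature):
        # Assumption: EMA update toward the current input feature.
        mixed = (self.momentum * self.anchors[index]
                 + (1.0 - self.momentum) * l2_normalize(feature))
        self.anchors[index] = l2_normalize(mixed)


class LoRAHyperNet:
    """Hypothetical linear hypernetwork mapping an anchor to LoRA factors."""

    def __init__(self, anchor_dim, d_in, d_out, rank):
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        self.W_a = rng.standard_normal((anchor_dim, rank * d_in)) * 0.02
        # Zero init (an assumption) makes the initial update B @ A equal zero.
        self.W_b = np.zeros((anchor_dim, d_out * rank))

    def generate(self, anchor):
        A = (anchor @ self.W_a).reshape(self.rank, self.d_in)
        B = (anchor @ self.W_b).reshape(self.d_out, self.rank)
        return A, B


def adapted_linear(x, W_frozen, A, B, scale=1.0):
    """Compute x @ (W_frozen + scale * B @ A).T without forming B @ A."""
    return x @ W_frozen.T + scale * (x @ A.T) @ B.T


# Toy usage with made-up sizes.
dim, d_in, d_out, rank = 16, 32, 24, 4
bank = AnchorBank(num_anchors=8, dim=dim)
hyper = LoRAHyperNet(dim, d_in, d_out, rank)
W_frozen = rng.standard_normal((d_out, d_in))  # stands in for a frozen backbone layer

feature = rng.standard_normal(dim)   # stands in for a fused image-question feature
x = rng.standard_normal((5, d_in))   # batch of hidden states

idx = bank.retrieve(feature)
A, B = hyper.generate(bank.anchors[idx])
y = adapted_linear(x, W_frozen, A, B)
bank.update(idx, feature)
```

In this sketch only the hypernetwork weights (`W_a`, `W_b`) would be trained, and each adapter is a pair of rank-`rank` factors, which is where the parameter efficiency of a LoRA-style update comes from. Because different anchors produce different factors, inputs that retrieve different anchors are adapted by different low-rank updates rather than by one shared set of weights.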

Computer Vision, Multimodal Models

Year: 2026

Venue: N/A
