NVIDIA | Technion | Feb 17, 2026 | arXiv:2602.15727

Spanning the Visual Analogy Space with a Weight Basis of LoRAs

Hila Manor, Rinon Gal, Haggai Maron, Tomer Michaeli, Gal Chechik

AI Summary

The paper introduces LoRWeB, a novel approach for visual analogy learning that addresses the generalization limitations of single LoRA adaptation methods. LoRWeB uses a learnable basis of LoRA modules to span the space of visual transformations and a lightweight encoder to dynamically select and weigh these basis LoRAs based on the input analogy pair. Experiments demonstrate state-of-the-art performance and improved generalization to unseen visual transformations, suggesting LoRA basis decompositions are a promising direction.

Key Contribution

Instead of relying on a single monolithic LoRA, LoRWeB dynamically mixes a learned basis of LoRAs, which yields state-of-the-art generalization on visual analogy tasks.

Abstract

Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations that are difficult to articulate in words. Given a triplet $\{\mathbf{a}, \mathbf{a}', \mathbf{b}\}$, the goal is to generate $\mathbf{b}'$ such that $\mathbf{a} : \mathbf{a}' :: \mathbf{b} : \mathbf{b}'$. Recent methods adapt text-to-image models to this task using a single Low-Rank Adaptation (LoRA) module, but they face a fundamental limitation: attempting to capture the diverse space of visual transformations within a fixed adaptation module constrains generalization capabilities. Inspired by recent work showing that LoRAs in constrained domains span meaningful, interpolatable semantic spaces, we propose LoRWeB, a novel approach that specializes the model for each analogy task at inference time through dynamic composition of learned transformation primitives; informally, it chooses a point in a "space of LoRAs". We introduce two key components: (1) a learnable basis of LoRA modules that spans the space of different visual transformations, and (2) a lightweight encoder that dynamically selects and weighs these basis LoRAs based on the input analogy pair. Comprehensive evaluations demonstrate that our approach achieves state-of-the-art performance and significantly improves generalization to unseen visual transformations. Our findings suggest that LoRA basis decompositions are a promising direction for flexible visual manipulation. Code and data are available at https://research.nvidia.com/labs/par/lorweb
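The two components named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the encoder or the mixing rule, so the sketch assumes a softmax over dot products between a pair embedding and one hypothetical key vector per basis LoRA, and a mixed update of the form $\Delta W = \sum_k c_k B_k A_k$. All names (`mixing_coefficients`, `mixed_lora_delta`, `keys`, `pair_embedding`) and all dimensions are illustrative assumptions.

```python
import math
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Random Gaussian matrix, used here only to create toy weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def mixing_coefficients(pair_embedding, keys):
    """Assumed encoder head: score each basis LoRA against the embedding
    of the (a, a') pair, then normalize the scores with a softmax."""
    logits = [sum(k * e for k, e in zip(key, pair_embedding)) for key in keys]
    return softmax(logits)


def mixed_lora_delta(basis, coeffs):
    """Weighted sum of low-rank updates: sum_k c_k * (B_k @ A_k).
    Each basis entry is (A, B) with A: rank x d_in and B: d_out x rank."""
    d_out, d_in = len(basis[0][1]), len(basis[0][0][0])
    delta = [[0.0] * d_in for _ in range(d_out)]
    for (A, B), c in zip(basis, coeffs):
        BA = matmul(B, A)
        for i in range(d_out):
            for j in range(d_in):
                delta[i][j] += c * BA[i][j]
    return delta


def adapted_forward(W, delta, x):
    """Apply the adapted linear layer: y = (W + delta) x."""
    return [
        sum((w + d) * xi for w, d, xi in zip(w_row, d_row, x))
        for w_row, d_row in zip(W, delta)
    ]


# Toy sizes, chosen only for illustration.
rng = random.Random(0)
d_in, d_out, rank, num_basis, d_emb = 4, 3, 2, 5, 6

W = rand_matrix(d_out, d_in, rng)  # frozen base weight
basis = [
    (rand_matrix(rank, d_in, rng), rand_matrix(d_out, rank, rng))
    for _ in range(num_basis)
]
keys = rand_matrix(num_basis, d_emb, rng)  # hypothetical per-basis keys
pair_embedding = [rng.gauss(0.0, 1.0) for _ in range(d_emb)]  # stand-in for an encoded (a, a')

coeffs = mixing_coefficients(pair_embedding, keys)
delta = mixed_lora_delta(basis, coeffs)
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
y = adapted_forward(W, delta, x)
print("coefficients:", [round(c, 3) for c in coeffs])
print("output:", [round(v, 4) for v in y])
```

Under these assumptions, a different analogy pair produces different coefficients and therefore a different effective adapter, while the base weight and the basis stay fixed. A sum of `num_basis` rank-`rank` updates can also reach rank up to `num_basis * rank`, so the mixture is not limited to the rank of any single basis LoRA.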

Architecture Design (Transformers, SSMs, MoE) | Computer Vision | Multimodal Models | Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 66

Year: 2026

Venue: N/A
