Mar 17, 2026 · arXiv:2603.16413

Trained Persistent Memory for Frozen Encoder--Decoder LLMs: Six Architectural Methods

AI Summary

This paper explores the feasibility of adding persistent memory to frozen encoder-decoder LLMs by training small adapters to write and read from a continuous latent space memory bank. Six architectural methods, varying injection points and write mechanisms, are implemented and evaluated on the LoCoMo dataset. Results show that with sufficient memory capacity (10x), all trained adapters demonstrate positive memory recall, enabling conversational learning without updating the frozen backbone.

Key Contribution

Frozen LLMs can learn to remember things across conversations, even with limited resources, by training adapters to read and write to a continuous latent space memory bank.

Abstract

Frozen encoder-decoder language models are stateless: the latent representation is discarded after every forward pass, so no information persists across sessions. This paper presents a proof-of-concept pilot study showing that persistent memory in the continuous latent space of a frozen LLM is feasible, even under severe resource constraints (a single frozen Flan-T5-XL backbone, small trainable adapters, a single dataset). We implement six architectural methods spanning three injection points and four write mechanisms; unlike text-level memory systems, every write and read is a differentiable operation on dense vectors. After training only the adapter, the memory bank continues to accumulate at inference time without gradients, enabling conversational learning. Under a forgetting-curve evaluation on LoCoMo at two capacity scales (1× and 10×), the stateless baseline scores exactly zero; at 10× all six trained adapters produce positive memory-recall curves; at 1× three methods collapse, revealing capacity as a critical design parameter. Because the memory bank is a compact numerical array, it can be scaled to arbitrarily large capacity without altering the backbone. We argue that full end-to-end training with larger models, larger data, and orders-of-magnitude larger memory will yield substantially stronger results; this pilot study establishes the feasibility baseline and design-space taxonomy that such efforts require.
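The abstract describes a memory bank of dense vectors that is written to and read from across turns, and that keeps accumulating at inference time without gradients. The following is a minimal NumPy sketch of that general idea, not the paper's implementation: the class name `LatentMemoryBank`, the mean-pooled ring-buffer write, and the scaled dot-product read are all assumptions chosen for illustration, and they do not correspond to any of the six methods specifically. Capacity is modeled here simply as the number of slots; how the paper defines its 1× and 10× scales is not stated in the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class LatentMemoryBank:
    """Hypothetical fixed-capacity bank of dense vectors (illustrative only)."""

    def __init__(self, num_slots, dim):
        self.slots = np.zeros((num_slots, dim), dtype=np.float32)
        self.used = 0    # how many slots currently hold a written vector
        self.cursor = 0  # next slot to overwrite (ring buffer)

    def write(self, encoder_states):
        """Mean-pool one turn's states of shape (T, dim) into the next slot.

        Assumption: a real write mechanism would be a trained adapter;
        mean pooling stands in for it here.
        """
        self.slots[self.cursor] = encoder_states.mean(axis=0)
        self.cursor = (self.cursor + 1) % len(self.slots)
        self.used = min(self.used + 1, len(self.slots))

    def read(self, queries):
        """Soft attention read: queries (T, dim) -> retrieved vectors (T, dim)."""
        if self.used == 0:
            # Empty bank: nothing to retrieve.
            return np.zeros_like(queries)
        keys = self.slots[: self.used]
        scores = queries @ keys.T / np.sqrt(keys.shape[1])
        weights = softmax(scores, axis=-1)  # (T, used), rows sum to 1
        return weights @ keys


# Usage: write one turn, then read it back with arbitrary queries.
bank = LatentMemoryBank(num_slots=2, dim=4)
bank.write(np.full((5, 4), 2.0))
retrieved = bank.read(np.ones((3, 4)))
```

Because the bank is a plain array, it could be persisted between sessions with `np.save` and reloaded with `np.load`, and its size can be changed without touching the backbone. With only two slots in the usage above, a third write overwrites the oldest entry, a toy version of the capacity pressure the abstract identifies as a critical design parameter.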

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
