Computer Vision LabUniversity of WürzburgMay 6, 2026arXiv:2605.04903

Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs

Santosh Premi Adhikari, Radu Timofte, Dmitry Ignatov

AI Summary

This paper introduces Delta-Code Generation, a method for fine-tuning LLMs to generate code diffs (deltas) that refine existing neural architectures instead of generating complete models from scratch. They iteratively fine-tune LLMs using LoRA on architectures from the LEMUR dataset, employing MinHash-Jaccard novelty filtering to ensure structural diversity in the training data. Experiments across six datasets demonstrate that this delta-based approach significantly outperforms full-generation baselines in terms of validity rate, accuracy, and code length, achieving up to 85.5% first-epoch accuracy on CIFAR-10 with a 75-85% reduction in output length.

Key Contribution

LLMs can now generate neural architectures with 75% less code and higher accuracy by learning to write code "diffs" instead of building from scratch.

Abstract

Large language models (LLMs) show strong potential for neural architecture generation, yet existing approaches produce complete model implementations from scratch -- computationally expensive and yielding verbose code. We propose Delta-Code Generation, where fine-tuned LLMs generate compact unified diffs (deltas) to refine baseline architectures rather than synthesizing entire models. Our pipeline iteratively fine-tunes the LLM via LoRA on curated architectures from the LEMUR dataset, with MinHash-Jaccard novelty filtering for structural diversity. We evaluate three 7B-class LLMs -- DeepSeek-Coder-7B, Qwen2.5-Coder-7B, and Mistral-7B -- across six datasets (CIFAR-10, CIFAR-100, MNIST, SVHN, ImageNette, CelebA) using a 22-cycle protocol (1,100 candidates per LLM). All three substantially surpass the full-generation baseline (50.6% valid rate, 42.3% mean first-epoch accuracy): DeepSeek-Coder reaches 75.3% valid rate and 65.8% mean accuracy; Qwen2.5-Coder 72.1%/64.6%; Mistral 66.6%/66.1%. On CIFAR-10, best first-epoch accuracies reach 85.5% (Mistral), 85.2% (DeepSeek), 80.6% (Qwen) -- well above 63.98% full generation and 71.5% for the concurrent approach of Gu et al. Output lengths are 30-50 lines versus 200+ for full generation (75-85% reduction). A 50-epoch study confirms the 1-epoch proxy preserves rankings (Mistral: Spearman $ρ$ = 0.926). Delta-based generation is a token-efficient, multi-domain, LLM-agnostic alternative to full-model synthesis for LLM-driven NAS.

Architecture Design (Transformers, SSMs, MoE)Code Generation & Program Synthesis Training Efficiency & Optimization

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...