Feb 24, 2026arXiv:2602.20730

Rethink Efficiency Side of Neural Combinatorial Solver: An Offline and Self-Play Paradigm

Zhenxing Xu, Zeyuan Ma, Weidong Bao, Hui Yan, Yan Zheng, Ji Wang

AI Summary

The paper introduces ECO, a novel offline self-play paradigm for Neural Combinatorial Optimization (NCO) that addresses the inefficiency of online methods. ECO employs a two-phase approach involving supervised warm-up followed by iterative Direct Preference Optimization (DPO), coupled with a Mamba-based architecture designed for efficiency. The method also incorporates a heuristic-based bootstrapping mechanism to stabilize policy improvement, achieving competitive performance on TSP and CVRP while significantly improving memory utilization and training throughput.

Key Contribution

Ditch online training for neural combinatorial solvers: ECO leverages offline self-play with Mamba to achieve state-of-the-art performance with significantly improved memory utilization and training throughput.

Abstract

We propose ECO, a versatile learning paradigm that enables efficient offline self-play for Neural Combinatorial Optimization (NCO). ECO addresses key limitations in the field through: 1) Paradigm Shift: Moving beyond inefficient online paradigms, we introduce a two-phase offline paradigm consisting of supervised warm-up and iterative Direct Preference Optimization (DPO); 2) Architecture Shift: We deliberately design a Mamba-based architecture to further enhance the efficiency in the offline paradigm; and 3) Progressive Bootstrapping: To stabilize training, we employ a heuristic-based bootstrapping mechanism that ensures continuous policy improvement during training. Comparison results on TSP and CVRP highlight that ECO performs competitively with up-to-date baselines, with significant advantage on the efficiency side in terms of memory utilization and training throughput. We provide further in-depth analysis on the efficiency, throughput and memory usage of ECO. Ablation studies show rationale behind our designs.

Architecture Design (Transformers, SSMs, MoE)RLHF & Preference Learning Training Efficiency & Optimization

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

Rethink Efficiency Side of Neural Combinatorial Solver: An Offline and Self-Play Paradigm

Related Papers