Normal University | Apr 9, 2026 | arXiv:2604.07737

SepSeq: A Training-Free Framework for Long Numerical Sequence Processing in LLMs

Jie Sun, Yu Liu, Lu Han, Qiwen Deng, Xiangyu Shu, Xiang Shu, Yang Xiao, Xingyu Lu, Jun Zhou, Pengfei Liu, Lintao Ma, Jiancan Wu, Xiang Wang

AI Summary

The paper introduces SepSeq, a training-free framework that mitigates attention dispersion in LLMs when processing long numerical sequences by strategically inserting separator tokens. These tokens act as attention sinks, refocusing attention on relevant local segments while maintaining global context awareness. Experiments across 9 LLMs show that SepSeq yields an average relative accuracy improvement of 35.6% and reduces total inference token consumption by 16.4% on average.

Key Contribution

LLMs choke on long numerical sequences, but inserting separator tokens yields an average relative accuracy improvement of 35.6% and cuts total inference token consumption by 16.4% on average, without any training.

Abstract

While transformer-based Large Language Models (LLMs) theoretically support massive context windows, they suffer from severe performance degradation when processing long numerical sequences. We attribute this failure to the attention dispersion in the Softmax mechanism, which prevents the model from concentrating attention. To overcome this, we propose Separate Sequence (SepSeq), a training-free, plug-and-play framework to mitigate dispersion by strategically inserting separator tokens. Mechanistically, we demonstrate that separator tokens act as an attention sink, recalibrating attention to focus on local segments while preserving global context. Extensive evaluations on 9 widely-adopted LLMs confirm the effectiveness of our approach: SepSeq yields an average relative accuracy improvement of 35.6% across diverse domains while reducing total inference token consumption by 16.4% on average.
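
The separator-insertion idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state which separator token SepSeq uses, how long each segment is, or how placement is chosen, so the function name `insert_separators`, the `segment_len` parameter, and the `" | "` separator below are all hypothetical choices made for illustration only.

```python
from typing import Sequence


def insert_separators(
    values: Sequence[float],
    segment_len: int = 8,      # hypothetical segment size, not from the paper
    item_delim: str = ", ",    # delimiter between numbers inside a segment
    separator: str = "\n",     # hypothetical separator token between segments
) -> str:
    """Serialize numbers as text, placing `separator` between fixed-size segments."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    # Split the sequence into consecutive chunks of at most `segment_len` values.
    segments = [
        item_delim.join(str(v) for v in values[i:i + segment_len])
        for i in range(0, len(values), segment_len)
    ]
    # Join the chunks with the separator so each chunk forms a local segment.
    return separator.join(segments)


series = list(range(1, 11))
print(insert_separators(series, segment_len=4, separator=" | "))
# prints: 1, 2, 3, 4 | 5, 6, 7, 8 | 9, 10
```

The resulting string would be placed in the prompt in place of the flat, undivided sequence. Fixed-size chunking is only the simplest possible placement rule; the paper's "strategic" placement may differ.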

Architecture Design (Transformers, SSMs, MoE) | Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 51

Year: 2026

Venue: N/A
