Oxford | Mar 2, 2026 | arXiv:2603.01820

Deep Learning for Financial Time Series: A Large-Scale Benchmark of Risk-Adjusted Performance

Adir Saly-Kaufmann, Kieran Wood, Jan-Peter Calliess, Stefan Zohren

AI Summary

This paper benchmarks deep learning architectures for financial time series prediction and position sizing, focusing on Sharpe ratio optimization using a daily futures dataset from 2010-2025. The study evaluates linear models, recurrent networks, transformers, state space models, and sequence representation approaches, considering statistical significance, risk measures, transaction costs, and computational efficiency. The key finding is that models designed for temporal representation learning, particularly hybrid models like VSN with LSTM, outperform linear benchmarks and generic deep learning models in Sharpe ratio and downside risk metrics, while xLSTM shows robustness to transaction costs.

Key Contribution

For financial time series prediction, architectures explicitly designed to capture temporal dynamics, such as VSN with LSTM, deliver better risk-adjusted returns and greater robustness than generic deep learning models.

Abstract

We present a large-scale benchmark of modern deep learning architectures for a financial time series prediction and position sizing task, with a primary focus on Sharpe ratio optimization. Evaluating linear models, recurrent networks, transformer-based architectures, state space models, and recent sequence representation approaches, we assess out-of-sample performance on a daily futures dataset covering commodities, equity indices, bonds, and FX from 2010 to 2025. Our evaluation goes beyond average returns and includes statistical significance, downside and tail risk measures, breakeven transaction cost analysis, robustness to random seed selection, and computational efficiency. We find that models explicitly designed to learn rich temporal representations consistently outperform linear benchmarks and generic deep learning models, even though those baselines often lead the rankings in standard time series benchmarks. The hybrid model VSN with LSTM, a combination of Variable Selection Networks (VSN) and LSTMs, achieves the highest overall Sharpe ratio, while VSN with xLSTM and LSTM with PatchTST exhibit superior downside-adjusted characteristics. xLSTM demonstrates the largest breakeven transaction cost buffer, indicating improved robustness to trading frictions.
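To make two of the metrics named in the abstract concrete, the following is a minimal Python sketch of an annualized Sharpe ratio and a breakeven transaction cost. It is an illustration only, not the paper's implementation: the 252-day annualization factor, the zero risk-free rate, the cost model (a charge proportional to the absolute change in position), and all function names are assumptions introduced here, and the paper's exact definitions may differ.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualization factor for daily data


def annualized_sharpe(returns):
    """Annualized Sharpe ratio of daily returns (risk-free rate assumed zero)."""
    sd = stdev(returns)
    if sd == 0:
        return 0.0
    return math.sqrt(TRADING_DAYS) * mean(returns) / sd


def turnover(positions):
    """Absolute position change per day, starting from a flat (zero) position."""
    changes, prev = [], 0.0
    for pos in positions:
        changes.append(abs(pos - prev))
        prev = pos
    return changes


def strategy_returns(positions, asset_returns, cost_per_unit=0.0):
    """Daily net returns: position times asset return, minus an assumed
    cost proportional to that day's turnover."""
    return [
        pos * ret - cost_per_unit * chg
        for pos, ret, chg in zip(positions, asset_returns, turnover(positions))
    ]


def breakeven_cost(positions, asset_returns):
    """Cost per unit of turnover at which the mean net return, and hence
    the Sharpe ratio, falls to zero under the assumed cost model."""
    gross = strategy_returns(positions, asset_returns)
    total_turnover = sum(turnover(positions))
    if total_turnover == 0:
        return math.inf  # a strategy that never trades pays no costs
    return sum(gross) / total_turnover


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    positions = [1.0, 1.0, -1.0, 0.5]
    asset_returns = [0.01, 0.02, -0.03, 0.02]
    gross = strategy_returns(positions, asset_returns)
    print("gross Sharpe:", annualized_sharpe(gross))
    print("breakeven cost:", breakeven_cost(positions, asset_returns))
```

Under this cost model, a larger breakeven cost means the strategy earns more gross return per unit of trading, which is one way to read the "transaction cost buffer" described above.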

Architecture Design (Transformers, SSMs, MoE) | Eval Frameworks & Benchmarks

Year: 2026

Venue: N/A
