The paper introduces STELLAR, an autonomous tuning system for parallel file systems that leverages LLMs to optimize I/O performance. STELLAR uses an agentic approach incorporating RAG, external tool execution, and multi-agent design to extract tunable parameters, analyze I/O traces, select tuning strategies, and iteratively refine configurations based on performance feedback. Evaluations demonstrate that STELLAR achieves near-optimal configurations within a few attempts, significantly outperforming traditional autotuning methods that require extensive iterations.
LLMs can autonomously tune parallel file systems to near-optimal performance in just five attempts, unlocking I/O optimizations previously inaccessible to most domain scientists.
I/O performance is crucial to efficiency in data-intensive scientific computing, but tuning large-scale storage systems is complex, costly, and notoriously labor-intensive, making it inaccessible to most domain scientists. To address this problem, we propose STELLAR, an autonomous tuner for high-performance parallel file systems. Our evaluations show that STELLAR almost always selects near-optimal configurations for parallel file systems within the first five attempts, even for previously unseen applications. STELLAR’s human-like efficiency is fundamentally different from existing autotuning methods, which often require hundreds of thousands of iterations to converge. STELLAR achieves this through autonomous, end-to-end agentic tuning. Powered by large language models (LLMs), STELLAR is capable of (1) accurately extracting tunable parameters from software manuals, (2) analyzing I/O trace logs generated by applications, (3) selecting initial tuning strategies, (4) rerunning applications on real systems and collecting I/O performance feedback, (5) adjusting tuning strategies and repeating the tuning cycle, and (6) reflecting on and summarizing tuning experiences into reusable knowledge for future optimizations. STELLAR integrates retrieval-augmented generation (RAG), external tool execution, LLM-based reasoning, and a multi-agent design to stabilize reasoning and combat hallucinations. We evaluate how each of these components affects optimization outcomes, providing insight into the design of similar systems for other optimization problems. STELLAR’s architecture and empirical validation open new avenues for tackling complex system optimization challenges, especially those characterized by vast search spaces and high exploration costs. Its highly efficient autonomous tuning will broaden access to I/O performance optimization for domain scientists with minimal additional resource investment.
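The propose–run–refine cycle described above (steps 3–5) can be sketched as a simple feedback loop. The sketch below is purely illustrative, not STELLAR's implementation: the class name, the toy `stripe_count` parameter, and the naive doubling heuristic standing in for LLM-based reasoning are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field

@dataclass
class TuningAgent:
    # Hypothetical sketch of an agentic tuning loop; names are illustrative.
    params: dict                                   # tunable parameters (e.g. extracted from manuals)
    history: list = field(default_factory=list)    # (config, performance) feedback pairs

    def propose(self):
        # Stand-in for LLM reasoning over I/O traces and retrieved context:
        # start from the defaults, then refine the best config seen so far.
        if not self.history:
            return dict(self.params)
        best_cfg, _ = max(self.history, key=lambda h: h[1])
        return {k: v * 2 for k, v in best_cfg.items()}  # naive refinement rule

    def tune(self, run_app, attempts=5):
        # Iterate: propose a config, rerun the application, record feedback.
        for _ in range(attempts):
            cfg = self.propose()
            self.history.append((cfg, run_app(cfg)))
        return max(self.history, key=lambda h: h[1])[0]

# Toy benchmark standing in for a real application run:
# "bandwidth" peaks when stripe_count equals 8.
def run_app(cfg):
    return -abs(cfg["stripe_count"] - 8)

agent = TuningAgent(params={"stripe_count": 1})
best = agent.tune(run_app)      # → {"stripe_count": 8} on this toy benchmark
```

In a real system, `propose` would be an LLM call grounded in trace logs and RAG-retrieved documentation, and `run_app` would launch the application on the actual parallel file system; the loop structure, however, is the same.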