Microsoft Research, BIT, HKUST, PKU | Apr 13, 2026 | arXiv:2604.11094

E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning

Lingzhe Zhang, Minghua He, Zhaoyang Liu, Ying Li

AI Summary

This paper introduces End-to-End Microservice Remediation (E2E-MR), a new task focused on directly generating executable playbooks from diagnosis reports for autonomous system restoration. To facilitate research in this area, the authors created MicroRemed, a benchmark for automated microservice deployment, failure injection, playbook execution, and repair verification. They then propose E2E-REME, an end-to-end auto-remediation model trained using experience-simulation reinforcement fine-tuning, demonstrating superior accuracy and efficiency compared to existing LLMs on both public and industrial microservice platforms.

Key Contribution

E2E-REME generates executable Ansible playbooks directly from diagnosis reports and outperforms nine representative LLMs in microservice auto-remediation accuracy and efficiency.

Abstract

Contemporary microservice systems continue to grow in scale and complexity, leading to increasingly frequent and costly failures. While recent LLM-based auto-remediation approaches have emerged, they primarily translate textual instructions into executable Ansible playbooks and rely on expert-crafted prompts, lacking runtime knowledge guidance and depending on large-scale general-purpose LLMs, which limits their accuracy and efficiency. We introduce End-to-End Microservice Remediation (E2E-MR), a new task that requires directly generating executable playbooks from diagnosis reports to autonomously restore faulty systems. To enable rigorous evaluation, we build MicroRemed, a benchmark that automates microservice deployment, failure injection, playbook execution, and post-repair verification. We further propose E2E-REME, an end-to-end auto-remediation model trained via experience-simulation reinforcement fine-tuning. Experiments on public and industrial microservice platforms, compared with nine representative LLMs, show that E2E-REME achieves superior accuracy and efficiency.
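The four benchmark stages named in the abstract (deployment, failure injection, playbook execution, post-repair verification) suggest an evaluation loop along the following lines. This is only a minimal sketch under stated assumptions, not MicroRemed's actual interface: the `Case` type, the `evaluate` function, and all six callables are hypothetical names introduced here for illustration, and the success criterion (execution succeeds and verification passes) is likewise assumed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Case:
    """Hypothetical benchmark case: a fault to inject plus its diagnosis report."""
    fault: str
    diagnosis_report: str


def evaluate(
    cases: Iterable[Case],
    generate_playbook: Callable[[str], str],  # model under test (assumed signature)
    deploy: Callable[[], None],               # bring up a fresh system
    inject: Callable[[str], None],            # inject the named fault
    execute: Callable[[str], bool],           # run the playbook, True if it ran cleanly
    verify: Callable[[], bool],               # True if the system is healthy again
) -> float:
    """Return the fraction of cases in which the generated playbook restores the system."""
    total = 0
    repaired = 0
    for case in cases:
        total += 1
        deploy()
        inject(case.fault)
        # The model sees only the diagnosis report, as in the E2E-MR task description.
        playbook = generate_playbook(case.diagnosis_report)
        # A case counts as repaired only if execution succeeds AND verification passes.
        if execute(playbook) and verify():
            repaired += 1
    return repaired / total if total else 0.0
```

Passing the stages in as callables keeps the loop independent of any particular platform; in a real setup, `execute` would presumably wrap an Ansible run and `verify` a health check against the deployed services.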

Code Generation & Program Synthesis | Distributed Systems & Hardware | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 63

Year: 2026

Venue: N/A
