This paper introduces End-to-End Microservice Remediation (E2E-MR), a new task that requires generating executable playbooks directly from diagnosis reports to restore faulty systems autonomously. To support research on this task, the authors build MicroRemed, a benchmark that automates microservice deployment, failure injection, playbook execution, and post-repair verification. They then propose E2E-REME, an end-to-end auto-remediation model trained via experience-simulation reinforcement fine-tuning. In experiments on public and industrial microservice platforms, E2E-REME achieves higher accuracy and efficiency than nine representative LLMs.
Forget prompt engineering: E2E-REME generates executable Ansible playbooks directly from diagnosis reports, outperforming large general-purpose LLMs in microservice auto-remediation accuracy and efficiency.
Contemporary microservice systems continue to grow in scale and complexity, leading to increasingly frequent and costly failures. While recent LLM-based auto-remediation approaches have emerged, they primarily translate textual instructions into executable Ansible playbooks and rely on expert-crafted prompts, lacking runtime knowledge guidance and depending on large-scale general-purpose LLMs, which limits their accuracy and efficiency. We introduce End-to-End Microservice Remediation (E2E-MR), a new task that requires directly generating executable playbooks from diagnosis reports to autonomously restore faulty systems. To enable rigorous evaluation, we build MicroRemed, a benchmark that automates microservice deployment, failure injection, playbook execution, and post-repair verification. We further propose E2E-REME, an end-to-end auto-remediation model trained via experience-simulation reinforcement fine-tuning. Experiments on public and industrial microservice platforms, compared with nine representative LLMs, show that E2E-REME achieves superior accuracy and efficiency.
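
The four benchmark stages named in the abstract (deployment, failure injection, playbook execution, post-repair verification) suggest an evaluation loop along the following lines. This is only a minimal sketch under stated assumptions: `Case`, `run_benchmark`, and every callable parameter are hypothetical names chosen for illustration, not MicroRemed's actual interface, and the scoring rule (a case counts as repaired only if the playbook runs and the system then passes verification) is an assumption as well.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Case:
    """One hypothetical benchmark case: a failure to inject and its diagnosis report."""
    failure: str
    diagnosis_report: str


def run_benchmark(
    cases: Iterable[Case],
    deploy: Callable[[], None],
    inject: Callable[[str], None],
    generate_playbook: Callable[[str], str],
    execute: Callable[[str], bool],
    verify: Callable[[], bool],
) -> float:
    """Return the fraction of cases whose system is healthy after remediation.

    All stage callables are placeholders (assumptions): in a real setup they
    would wrap a cluster deployment, a fault injector, the remediation model,
    a playbook runner, and a health check.
    """
    cases = list(cases)
    repaired = 0
    for case in cases:
        deploy()                 # start every case from a fresh deployment
        inject(case.failure)     # put the system into a faulty state
        playbook = generate_playbook(case.diagnosis_report)
        # Assumed scoring: the playbook must run AND the system must verify healthy.
        if execute(playbook) and verify():
            repaired += 1
    return repaired / len(cases) if cases else 0.0
```

Passing the stages in as callables keeps the loop independent of any particular platform or model, so the same harness could in principle score different playbook generators against the same injected failures.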