This paper introduces GraphFlow, a graph-based workflow management system for LLM-based agents that uses a unified graph representation (wGraph) of atomic operations to dynamically instantiate task-specific workflows. GraphFlow adaptively generates workflows from wGraph based on task semantics and constraint requirements, and uses the graph structure to manage KV caches efficiently, reducing redundant computation. Experiments across five benchmark datasets show that GraphFlow improves performance by approximately 4.95 percentage points on average and reduces memory footprint by approximately 4x compared to state-of-the-art methods.
LLM agents can gain roughly 5 percentage points in performance and an approximately 4x reduction in memory footprint by representing workflows as a unified graph and adaptively instantiating them based on task semantics.
Large Language Model (LLM)-based agents demonstrate strong reasoning and execution capabilities on complex tasks when guided by structured instructions, commonly referred to as workflows. However, existing workflow-assisted agent serving systems typically rely on predefined templates and shallow matching mechanisms, which limit their ability to capture deep semantic relationships and generalize to previously unseen tasks. To address these limitations, we propose a new workflow management paradigm that represents workflows using a unified graph, termed wGraph, where each node corresponds to an atomic operation. wGraph serves as a shared substrate from which task-specific workflows are dynamically instantiated. Building on wGraph primitives, we introduce GraphFlow, a system that efficiently integrates workflows into agent serving through two key designs. First, adaptive workflow generation dynamically constructs workflows from wGraph based on task semantics and constraint requirements. Second, workflow state management exploits wGraph structure to efficiently manage Key-Value (KV) caches, reducing redundant computation during agent serving. Extensive experiments across five benchmark datasets show that GraphFlow consistently outperforms state-of-the-art methods, yielding an average performance improvement of approximately 4.95 percentage points, while achieving an approximately 4$\times$ reduction in memory footprint.
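To make the idea of instantiating a task-specific workflow from a shared graph concrete, the following is a minimal sketch and not GraphFlow's implementation. It assumes a wGraph can be modeled as a mapping from each atomic operation to the operations it depends on, and that the operations a task needs are already known; the abstract does not say how task semantics select them. All names here (`WGRAPH`, `instantiate_workflow`, `shared_prefix_len`, and the operation labels) are hypothetical. The shared-prefix count is only a rough stand-in for the kind of overlap that KV-cache reuse could exploit.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical unified graph: each atomic operation maps to the set of
# operations it depends on. These labels are illustrative assumptions.
WGRAPH = {
    "parse_query": set(),
    "retrieve_docs": {"parse_query"},
    "run_code": {"parse_query"},
    "summarize": {"retrieve_docs"},
    "answer": {"summarize"},
    "report_result": {"run_code"},
}


def instantiate_workflow(wgraph, goal_ops):
    """Return an execution order covering goal_ops and all their dependencies."""
    needed = set()
    stack = list(goal_ops)
    while stack:  # walk dependency edges back from the goal operations
        op = stack.pop()
        if op not in needed:
            needed.add(op)
            stack.extend(wgraph[op])
    # Restrict the graph to the needed operations, then order them so that
    # every operation comes after its dependencies.
    subgraph = {op: wgraph[op] & needed for op in needed}
    return list(TopologicalSorter(subgraph).static_order())


def shared_prefix_len(a, b):
    """Count the leading operations two workflows have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


qa_flow = instantiate_workflow(WGRAPH, ["answer"])
code_flow = instantiate_workflow(WGRAPH, ["report_result"])
print(qa_flow)
print(code_flow)
print(shared_prefix_len(qa_flow, code_flow))
```

In this toy graph both workflows begin with the same `parse_query` step, so a serving system could in principle compute that step's state once and share it across both requests. How GraphFlow actually maps graph structure to KV-cache entries is not described in the abstract.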