AAFLOW is introduced as a distributed runtime for agentic workflows, addressing scalability and reproducibility issues in existing LLM systems by modeling workflows as operators. It leverages Apache Arrow and Cylon for a zero-copy data plane, enabling direct interoperability between components like preprocessing, embedding, and vector retrieval. Experiments show AAFLOW achieves up to 4.64x pipeline speedup and 2.8x gains in embedding/upsert phases due to improved data flow, batching, and communication.
Agentic workflows can be sped up by up to 4.64x, not through faster LLMs, but by optimizing data flow and communication between components.
Agentic workflows in large language model systems integrate retrieval, reasoning, and memory, but existing frameworks suffer from scalability and reproducibility limitations due to fragmented data orchestration, serialization overhead, and non-deterministic execution. Although these frameworks increase flexibility, they lack a formal execution model that adheres to the principles of high-performance computing. We introduce AAFLOW, a unified distributed runtime that models agentic workflows through an operator abstraction and derives communication-efficient execution plans from it. Using Apache Arrow and Cylon, AAFLOW builds a zero-copy data plane that allows direct interoperability between preprocessing, embedding, and vector retrieval without serialization overhead. To lower coordination costs, it uses resource-deterministic scheduling and asynchronous batching. Experimental results demonstrate up to 4.64x pipeline speedup and 2.8x gains in the embedding and upsert phases while retaining comparable LLM generation throughput. These advantages result from improved data flow, batching, and communication efficiency rather than from LLM inference acceleration.
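The abstract names asynchronous batching as one source of the reduced coordination cost but does not show how AAFLOW implements it. The following is a minimal, generic sketch of the idea in Python's standard `asyncio`, not AAFLOW's code: concurrent requests are queued and grouped so that the fixed per-call latency of an embedding step is paid once per batch. All names here (`fake_embed_batch`, `AsyncBatcher`, `embed_all`, `max_batch`, `max_wait`) are hypothetical and chosen only for illustration.

```python
import asyncio


async def fake_embed_batch(texts):
    # Hypothetical stand-in for an embedding model call (an assumption,
    # not an AAFLOW API). Returns one toy vector per input text.
    await asyncio.sleep(0.01)  # simulated latency, paid once per batch
    return [[float(len(t))] for t in texts]


class AsyncBatcher:
    """Groups concurrent requests into batches before calling batch_fn."""

    def __init__(self, batch_fn, max_batch=4, max_wait=0.05):
        self.batch_fn = batch_fn
        self.max_batch = max_batch  # largest batch sent in one call
        self.max_wait = max_wait    # seconds to wait for a batch to fill
        self.queue = asyncio.Queue()
        self.calls = 0              # number of batch_fn invocations

    async def submit(self, item):
        # Enqueue one item and wait until its batch has been processed.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request is available.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait
            # Keep collecting until the batch is full or the deadline passes.
            while len(batch) < self.max_batch:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(
                        await asyncio.wait_for(self.queue.get(), timeout)
                    )
                except asyncio.TimeoutError:
                    break
            results = await self.batch_fn([item for item, _ in batch])
            self.calls += 1
            # Hand each result back to the caller that submitted the item.
            for (_, fut), result in zip(batch, results):
                fut.set_result(result)


async def embed_all(texts, max_batch=4):
    batcher = AsyncBatcher(fake_embed_batch, max_batch=max_batch)
    worker = asyncio.create_task(batcher.run())
    vectors = await asyncio.gather(*(batcher.submit(t) for t in texts))
    worker.cancel()  # stop the background batching loop
    return vectors, batcher.calls
```

Calling `asyncio.run(embed_all(["a", "bb", "ccc", "dddd", "eeeee"]))` returns the vectors in submission order together with the number of batched calls made, which is smaller than the number of inputs. A real runtime would also need backpressure and error propagation to the waiting futures, which this sketch omits.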