Intel Labs

May 25, 2026

arXiv:2605.26297

Agentic AI Workload Characteristics

Yichao Yuan, Ankita Nayak, Souvik Kundu, Nishil Talati

AI Summary

This paper characterizes the workload properties of ReAct-style agents by tracing end-to-end executions of reasoning and non-reasoning Gemma and Qwen configurations across five agentic benchmarks. The study finds that, with effective context caching, most input tokens are reused across turns, so agentic workloads become decode-dominated while depending more heavily on long-lived KV-cache state. Tool use also shows a temporal structure: agents move from read/explore actions early in execution to execute/write actions later.

Key Contribution

Agentic workloads aren't just long prompts; they're decode-bound beasts with a tool-use personality arc, demanding a rethink of LLM serving infrastructure.

Abstract

Agentic AI shifts LLM serving from isolated prompt-generation requests to stateful, multi-turn executions that repeatedly invoke the model, call tools, and grow context over time. This paper characterizes ReAct-style agents from both the LLM-serving and tool-execution perspectives using an end-to-end tracing infrastructure across reasoning and non-reasoning Gemma and Qwen configurations on five agentic benchmarks. Our study shows that agentic workloads are not simply long-prompt workloads: with effective context caching, most input tokens are reused across turns, making execution decode-dominated while increasing dependence on long-lived KV-cache state. We also find that tool use has a clear temporal structure, with agents shifting from read/explore behavior early in execution to execute/write behavior later. These results show that efficient agentic serving must jointly manage repeated model re-entry, persistent context state, and workload-dependent tool behavior.
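The token accounting behind the caching claim can be illustrated with a minimal sketch. It is not the paper's tracing infrastructure: the `Turn` type, the `account_tokens` helper, and all token counts below are hypothetical, and it assumes an idealized prefix cache that keeps the KV state of the whole conversation resident between turns.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    new_input_tokens: int  # tokens appended this turn (e.g. a tool result)
    output_tokens: int     # tokens the model decodes this turn


def account_tokens(system_prompt_tokens, turns, prefix_caching=True):
    """Return (prefilled, reused, decoded) token totals for a multi-turn run.

    Assumption: with prefix caching, every token seen in earlier turns
    (prompt and generated) is served from the KV cache on re-entry.
    """
    context = system_prompt_tokens  # tokens in the conversation so far
    cached = 0                      # tokens whose KV state is already resident
    prefilled = reused = decoded = 0
    for turn in turns:
        context += turn.new_input_tokens
        hit = cached if prefix_caching else 0
        reused += hit                # input tokens served from cache
        prefilled += context - hit   # input tokens that must be recomputed
        decoded += turn.output_tokens
        context += turn.output_tokens
        cached = context             # whole context stays cached for next turn
    return prefilled, reused, decoded


# Illustrative numbers only: a 1000-token system prompt and four turns,
# each adding 200 input tokens and decoding 300 output tokens.
turns = [Turn(200, 300) for _ in range(4)]
with_cache = account_tokens(1000, turns, prefix_caching=True)
without_cache = account_tokens(1000, turns, prefix_caching=False)
print(with_cache)     # (1800, 6000, 1200)
print(without_cache)  # (7800, 0, 1200)
```

In this toy run the same 7800 input tokens are presented to the model either way, but with caching only 1800 of them need prefill, so the decoded tokens make up a much larger share of the remaining compute. The cached context also grows every turn, which is the long-lived KV-cache dependence the abstract describes.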

Eval Frameworks & Benchmarks, Reasoning & Chain-of-Thought, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
