Mar 16, 2026 | arXiv:2603.15372

SKILLS: Structured Knowledge Injection for LLM-Driven Telecommunications Operations

AI Summary

The paper introduces SKILLS, a benchmark framework for evaluating whether LLM agents can execute telecommunications operations workflows through real API interfaces. The framework comprises 37 scenarios across 8 TM Forum Open API domains, each backed by mock API servers and scored with deterministic evaluation rubrics. Results show that augmenting an agent with a SKILL.md document encoding domain knowledge consistently improves performance across multiple open-weight models, with MiniMax M2.5 achieving the highest with-skill score (81.1%).

Key Contribution

Augmenting LLM agents with a structured domain-knowledge document (SKILL.md) improves their ability to reliably execute telecommunications operations workflows by up to 18.9 percentage points over a baseline agent with no domain guidance.
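To make the "percentage points" figure concrete, here is a minimal sketch of how a skill lift could be computed over the 37 scenarios. It assumes each scenario is scored pass/fail, which the listing does not state; the pass counts used in the example are back-calculated illustrations, not reported data, and the function names are hypothetical.

```python
SCENARIOS = 37  # scenario count stated in the summary


def pass_rate_pct(passed: int, total: int = SCENARIOS) -> float:
    """Pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / total, 1)


def lift_pp(baseline_passed: int, skill_passed: int, total: int = SCENARIOS) -> float:
    """Absolute difference between two pass rates, in percentage points."""
    return round(100 * (skill_passed - baseline_passed) / total, 1)


# Assumption: pass/fail scoring. Under it, 30 of 37 passes gives the
# reported 81.1%, and a gain of 7 scenarios gives an 18.9 pp lift.
print(pass_rate_pct(30))   # 81.1
print(lift_pp(22, 29))     # 18.9
```

Under that assumption, a lift of 18.9 percentage points corresponds to 7 additional scenarios passed out of 37, which is an absolute difference in pass rate rather than a relative improvement.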

Abstract

As telecommunications operators accelerate adoption of AI-enabled automation, a practical question remains unresolved: can general-purpose large language model (LLM) agents reliably execute telecom operations workflows through real API interfaces, or do they require structured domain guidance? We introduce SKILLS (Structured Knowledge Injection for LLM-driven Service Lifecycle operations), a benchmark framework comprising 37 telecom operations scenarios spanning 8 TM Forum Open API domains (TMF620, TMF621, TMF622, TMF628, TMF629, TMF637, TMF639, TMF724). Each scenario is grounded in live mock API servers with seeded production-representative data, MCP tool interfaces, and deterministic evaluation rubrics combining response content checks, tool-call verification, and database state assertions. We evaluate open-weight models under two conditions: baseline (generic agent with tool access but no domain guidance) and with-skill (agent augmented with a portable SKILL.md document encoding workflow logic, API patterns, and business rules). Results across 5 open-weight model conditions and 185 scenario-runs show consistent skill lift across all models. MiniMax M2.5 leads (81.1% with-skill, +13.5pp), followed by Nemotron 120B (78.4%, +18.9pp), GLM-5 Turbo (78.4%, +5.4pp), and Seed 2.0 Lite (75.7%, +18.9pp).
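The three rubric components named in the abstract (response content checks, tool-call verification, and database state assertions) can be sketched as a small deterministic checker. This is an illustrative sketch, not the benchmark's implementation: the class names, field names, tool names, and the rule that all three checks must pass are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """What one agent run produced (hypothetical structure)."""
    response_text: str
    tool_calls: list[str]
    db_state: dict[str, str]


@dataclass
class Rubric:
    """Deterministic expectations for one scenario (hypothetical structure)."""
    required_phrases: list[str] = field(default_factory=list)
    required_tool_calls: list[str] = field(default_factory=list)
    expected_db_state: dict[str, str] = field(default_factory=dict)

    def evaluate(self, run: RunRecord) -> dict[str, bool]:
        text = run.response_text.lower()
        # Response content check: every required phrase appears in the reply.
        content_ok = all(p.lower() in text for p in self.required_phrases)
        # Tool-call verification: every required tool was invoked at least once.
        tools_ok = all(t in run.tool_calls for t in self.required_tool_calls)
        # Database state assertion: every expected key holds the expected value.
        db_ok = all(run.db_state.get(k) == v
                    for k, v in self.expected_db_state.items())
        return {
            "content": content_ok,
            "tool_calls": tools_ok,
            "db_state": db_ok,
            # Assumption: a scenario passes only if all three checks pass.
            "passed": content_ok and tools_ok and db_ok,
        }


# Hypothetical trouble-ticket scenario; tool and field names are invented.
rubric = Rubric(
    required_phrases=["ticket"],
    required_tool_calls=["create_trouble_ticket"],
    expected_db_state={"ticket_status": "acknowledged"},
)
run = RunRecord(
    response_text="Ticket created and acknowledged.",
    tool_calls=["list_trouble_tickets", "create_trouble_ticket"],
    db_state={"ticket_status": "acknowledged"},
)
result = rubric.evaluate(run)
print(result["passed"])
```

Because every check is a plain equality or membership test, the same run always yields the same verdict, which is the property the abstract's "deterministic evaluation rubrics" points to.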

Eval Frameworks & Benchmarks, Natural Language Processing, Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
