Apr 8, 2026 · arXiv:2604.07148

Multi-Turn Reasoning LLMs for Task Offloading in Mobile Edge Computing

AI Summary

This paper introduces COMLLM, a generative framework leveraging LLMs for task offloading in Mobile Edge Computing (MEC) to address the limitations of conventional heuristics and DRL methods. COMLLM uses Group Relative Policy Optimization (GRPO) integrated with a Look-Ahead Collaborative Simulation (LACS) mechanism that performs multi-step Monte Carlo rollouts to model server queue dynamics. The framework achieves near-optimal latency, improved load-balancing fairness, and zero-shot topological scalability, outperforming SFT, DRL, and heuristic baselines.

Key Contribution

COMLLM shows that an LLM-based policy can handle task offloading in mobile edge computing with near-optimal latency and zero-shot generalization to larger, unseen network topologies, whereas DRL policies require retraining when the topology changes and conventional heuristics lack adaptability.

Abstract

Emerging computation-intensive applications impose stringent latency requirements on resource-constrained mobile devices. Mobile Edge Computing (MEC) addresses this challenge through task offloading. However, designing effective policies remains difficult due to dynamic task arrivals, time-varying channels, and the spatio-temporal coupling of server queues. Conventional heuristics lack adaptability, while Deep Reinforcement Learning (DRL) suffers from limited generalization and architectural rigidity, requiring retraining when network topology changes. Although Large Language Models (LLMs) offer semantic reasoning capabilities, standard Supervised Fine-Tuning (SFT) yields myopic policies that greedily minimize immediate latency without accounting for long-term system evolution. To address these limitations, we propose COMLLM, a generative framework that enables foresighted decision-making in MEC systems. COMLLM integrates Group Relative Policy Optimization (GRPO) with a Look-Ahead Collaborative Simulation (LACS) mechanism, which performs multi-step Monte Carlo rollouts while jointly modeling server queue dynamics. By incorporating these rollouts into the reward design, the framework captures the long-term impact of current decisions on future system states. Experimental results demonstrate that COMLLM achieves near-optimal latency and improved load-balancing fairness. Notably, it exhibits zero-shot topological scalability, allowing a model trained on small-scale networks to generalize to larger, unseen topologies without retraining, outperforming SFT, DRL, and heuristic baselines.
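The abstract describes two ingredients: a reward built from multi-step Monte Carlo rollouts over server queues, and GRPO, which scores each sampled decision relative to the other decisions in its group. The sketch below illustrates how these could fit together on a toy problem. It is not the paper's implementation: the queue model, the least-loaded placement used for future tasks, the uniform task-size distribution, and every function and parameter name (`rollout_latency`, `lookahead_reward`, `group_relative_advantages`, `horizon`, `n_rollouts`) are assumptions made for illustration. Only the group-relative normalization (reward minus group mean, divided by group standard deviation) follows the standard GRPO formulation.

```python
import random
import statistics


def rollout_latency(queues, rates, first_server, task_size, horizon, rng):
    """Simulate one look-ahead rollout and return the mean task latency.

    queues: current backlog (work units) per server.
    rates:  work units each server completes per time slot.
    The current task goes to `first_server`; later tasks have random sizes
    and are placed on the least-loaded server (an assumed stand-in policy).
    """
    q = list(queues)
    latencies = []
    server, size = first_server, task_size
    for _ in range(horizon):
        q[server] += size
        latencies.append(q[server] / rates[server])  # queueing + service time
        # Every server drains one time slot of work.
        q = [max(0.0, backlog - rate) for backlog, rate in zip(q, rates)]
        size = rng.uniform(0.5, 1.5)  # assumed task-size distribution
        server = min(range(len(q)), key=lambda i: q[i] / rates[i])
    return sum(latencies) / len(latencies)


def lookahead_reward(queues, rates, action, task_size,
                     horizon=5, n_rollouts=32, seed=0):
    """Average several rollouts; lower long-run latency means higher reward."""
    rng = random.Random(seed)
    samples = [
        rollout_latency(queues, rates, action, task_size, horizon, rng)
        for _ in range(n_rollouts)
    ]
    return -statistics.fmean(samples)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    queues = [4.0, 0.0, 1.0]   # server 0 is already congested
    rates = [1.0, 1.0, 1.0]
    group = [0, 1, 2, 1]       # offloading targets sampled for one prompt
    rewards = [lookahead_reward(queues, rates, a, task_size=1.0) for a in group]
    for action, adv in zip(group, group_relative_advantages(rewards)):
        print(f"server {action}: advantage {adv:+.3f}")
```

In this toy setup, sending the task to the congested server 0 receives a negative advantage, while the idle server 1 receives a positive one, so a policy-gradient update would shift probability toward the less loaded server. Because the reward averages over several future steps rather than only the immediate latency, it reflects the long-term effect of the current decision in the sense the abstract describes.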

Natural Language Processing · Reasoning & Chain-of-Thought · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
