This paper introduces COMLLM, a generative framework leveraging LLMs for task offloading in Mobile Edge Computing (MEC) to address the limitations of conventional heuristics and DRL methods. COMLLM uses Group Relative Policy Optimization (GRPO) integrated with a Look-Ahead Collaborative Simulation (LACS) mechanism that performs multi-step Monte Carlo rollouts to model server queue dynamics. The framework achieves near-optimal latency, improved load-balancing fairness, and zero-shot topological scalability, outperforming SFT, DRL, and heuristic baselines.
LLMs can handle complex task offloading in mobile edge computing with near-optimal latency and zero-shot generalization to larger, unseen network topologies, whereas DRL requires retraining when the topology changes and conventional heuristics lack adaptability.
Emerging computation-intensive applications impose stringent latency requirements on resource-constrained mobile devices. Mobile Edge Computing (MEC) addresses this challenge through task offloading. However, designing effective policies remains difficult due to dynamic task arrivals, time-varying channels, and the spatio-temporal coupling of server queues. Conventional heuristics lack adaptability, while Deep Reinforcement Learning (DRL) suffers from limited generalization and architectural rigidity, requiring retraining when network topology changes. Although Large Language Models (LLMs) offer semantic reasoning capabilities, standard Supervised Fine-Tuning (SFT) yields myopic policies that greedily minimize immediate latency without accounting for long-term system evolution. To address these limitations, we propose COMLLM, a generative framework that enables foresighted decision-making in MEC systems. COMLLM integrates Group Relative Policy Optimization (GRPO) with a Look-Ahead Collaborative Simulation (LACS) mechanism, which performs multi-step Monte Carlo rollouts while jointly modeling server queue dynamics. By incorporating these rollouts into the reward design, the framework captures the long-term impact of current decisions on future system states. Experimental results demonstrate that COMLLM achieves near-optimal latency and improved load-balancing fairness. Notably, it exhibits zero-shot topological scalability, allowing a model trained on small-scale networks to generalize to larger, unseen topologies without retraining, outperforming SFT, DRL, and heuristic baselines.
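The abstract does not give the details of LACS or the reward function, so the following is only a minimal sketch of the general idea under stated assumptions: score each candidate offloading decision by the average latency over several Monte Carlo rollouts of a toy queue model, then convert the scores of a sampled group of decisions into GRPO-style group-relative advantages (reward minus group mean, divided by group standard deviation). The queue model, the greedy placement of future tasks, the task-size distribution, and every function and parameter name here are hypothetical illustrations, not the authors' implementation.

```python
import random
import statistics


def rollout_latency(queues, service_rate, action, task_size, horizon, rng):
    """One simulated trajectory: offload the current task to `action`,
    then simulate `horizon` future arrivals (hypothetical toy queue model)."""
    q = list(queues)
    # Latency of the current task: backlog ahead of it plus its own work.
    total = (q[action] + task_size) / service_rate[action]
    q[action] += task_size
    for _ in range(horizon):
        # Every server drains one time step of work.
        q = [max(0.0, b - r) for b, r in zip(q, service_rate)]
        # Assumption: future tasks have random sizes and are placed greedily.
        size = rng.uniform(0.5, 1.5)
        j = min(range(len(q)), key=lambda i: (q[i] + size) / service_rate[i])
        total += (q[j] + size) / service_rate[j]
        q[j] += size
    return total


def lookahead_reward(queues, service_rate, action, task_size,
                     horizon=5, n_rollouts=32, seed=0):
    """Negative mean cumulative latency over several rollouts.
    Reusing the seed gives every action the same simulated arrivals."""
    rng = random.Random(seed)
    samples = [
        rollout_latency(queues, service_rate, action, task_size, horizon, rng)
        for _ in range(n_rollouts)
    ]
    return -statistics.fmean(samples)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Example: three servers, a group of four sampled offloading decisions.
queues = [4.0, 0.5, 2.0]          # current backlog per server (work units)
service_rate = [1.0, 1.0, 1.0]    # work units processed per time step
group_actions = [0, 1, 2, 1]      # decisions sampled from the policy
rewards = [lookahead_reward(queues, service_rate, a, task_size=1.0)
           for a in group_actions]
advantages = group_relative_advantages(rewards)
for a, r, adv in zip(group_actions, rewards, advantages):
    print(f"server {a}: reward {r:.2f}, advantage {adv:+.2f}")
```

In this toy setting, a decision that sends the task to the heavily loaded server receives a negative advantage relative to the group. A policy-gradient update weighted by these advantages would therefore shift probability toward decisions whose simulated long-term latency is lower, rather than toward decisions that only minimize immediate latency.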