HKUST, SUSTech · Jun 4, 2026 · arXiv:2606.06178

Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning

AI Summary

This paper introduces MetaRouter, a meta-learning framework that routes queries among LLMs according to users' implicit cost-performance preferences, a setting in which existing routing methods perform poorly when preferences differ across users. Each preference profile is treated as a distinct task in a contextual bandit setting, which lets MetaRouter adapt to a new user's preferences from only a few interactions. Experiments show that MetaRouter outperforms strong baselines on both in-distribution and out-of-distribution tasks, remains robust when the set of routable LLMs changes, and scales to multi-model routing.

Key Contribution

MetaRouter learns users' implicit cost-performance preferences for LLM routing from only a few interactions and outperforms strong baselines on both in-distribution and out-of-distribution tasks.

Abstract

Large language models (LLMs) present a trade-off between performance and cost: more powerful models are more expensive to run. LLM routing aims to reduce expense while maintaining performance by sending each query to the most suitable model. However, existing methods do not perform well across differing user cost-performance preferences. To address this gap, we introduce a novel perceptive LLM routing paradigm for personalized, user-centric cost-performance optimization, which efficiently learns users' implicit preferences from only a few interactions. To handle heterogeneous user needs, we formulate preference profiles as a set of distinct tasks in a contextual bandit setting and propose MetaRouter, a meta-learning framework designed for preference-aware LLM routing. Experimental results show that MetaRouter outperforms strong baselines on both in-distribution and out-of-distribution tasks. Furthermore, it learns user preferences efficiently, is robust to changes in the set of routable LLMs, and scales to multi-model routing.
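To make the "preference profiles as contextual bandit tasks" framing concrete, here is a minimal illustrative sketch. It is not MetaRouter and it omits the meta-learning step entirely. It assumes, purely for illustration, that a user's hidden preference is a single weight `lam`, that the observed reward is `quality - lam * cost`, and that queries come in two difficulty levels. The quality and cost numbers, the `TabularRouter` class, and the `learn_policy` helper are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical toy setup: two routable models, two query contexts.
MODELS = ("small", "large")
CONTEXTS = ("easy", "hard")

# Made-up quality scores and per-query costs (not from the paper).
QUALITY = {
    ("small", "easy"): 0.90, ("small", "hard"): 0.40,
    ("large", "easy"): 0.95, ("large", "hard"): 0.90,
}
COST = {"small": 0.1, "large": 1.0}


def reward(model: str, context: str, lam: float) -> float:
    """Assumed reward: quality minus the user's hidden cost weight times cost."""
    return QUALITY[(model, context)] - lam * COST[model]


@dataclass
class TabularRouter:
    """Tabular contextual bandit: tracks the mean reward per (context, model)."""
    totals: dict = field(default_factory=dict)
    counts: dict = field(default_factory=dict)

    def choose(self, context: str) -> str:
        # Try every model once per context before acting greedily.
        for m in MODELS:
            if self.counts.get((context, m), 0) == 0:
                return m
        return max(
            MODELS,
            key=lambda m: self.totals[(context, m)] / self.counts[(context, m)],
        )

    def update(self, context: str, model: str, r: float) -> None:
        key = (context, model)
        self.totals[key] = self.totals.get(key, 0.0) + r
        self.counts[key] = self.counts.get(key, 0) + 1


def learn_policy(lam: float, rounds: int = 8) -> dict:
    """One task = one user. The router only sees rewards, never lam itself."""
    router = TabularRouter()
    for t in range(rounds):
        context = CONTEXTS[t % len(CONTEXTS)]
        model = router.choose(context)
        router.update(context, model, reward(model, context, lam))
    return {c: router.choose(c) for c in CONTEXTS}


if __name__ == "__main__":
    for lam in (0.0, 0.2, 1.0):
        print(lam, learn_policy(lam))
```

In this toy setting, three different values of `lam` lead to three different routing policies over the same models and queries, which is why a single fixed router cannot serve every user. A meta-learning approach, as the abstract describes, would aim to share what is learned across such tasks so that adaptation to a new user needs fewer interactions than learning from scratch.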

RLHF & Preference Learning · Scalable Oversight & Alignment Theory

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
