NUS | NTU | Oxford | SEU | Feb 19, 2026 | arXiv:2602.17144

When More Experts Hurt: Underfitting in Multi-Expert Learning to Defer

Shuqi Liu, Yuzhou Cao, Lei Feng, Bo An, Luke Ong

AI Summary

This paper investigates the underfitting problem in multi-expert Learning to Defer (L2D) systems, demonstrating that it is more inherent and detrimental than in single-expert L2D due to expert identifiability issues. The authors theoretically show that the difficulty arises from the classifier's inability to discern which expert to trust from a diverse pool, hindering effective learning. To address this, they propose PiCCE (Pick the Confident and Correct Expert), a surrogate-based method that adaptively identifies a reliable expert, effectively reducing the multi-expert problem to a single-expert-like scenario and mitigating underfitting.
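The deferral decision a multi-expert L2D system makes at test time can be sketched as follows. This is a minimal illustration under a common formulation that is assumed here and not taken from the paper: the model outputs K class scores followed by one score per expert, and the largest score decides whether to predict a class or defer to an expert. The function name `route` is hypothetical, and this is not PiCCE.

```python
from typing import List, Tuple


def route(scores: List[float], num_classes: int) -> Tuple[str, int]:
    """Hypothetical multi-expert L2D decision rule (a sketch, not PiCCE).

    scores[:num_classes] are class scores; scores[num_classes:] hold one
    deferral score per expert. Returns ("predict", k) or ("defer", j).
    """
    if len(scores) <= num_classes:
        raise ValueError("expected at least one expert score after the class scores")
    # Index of the largest score; ties go to the lowest index.
    best = max(range(len(scores)), key=lambda i: scores[i])
    if best < num_classes:
        return ("predict", best)
    return ("defer", best - num_classes)


# 3 classes, 2 experts: expert 1 has the highest score, so the input is deferred to it.
print(route([0.1, 0.2, 0.05, 0.15, 0.5], num_classes=3))  # ('defer', 1)
```

Under this rule, choosing among the expert scores is exactly the "which expert to trust" question: the routing is only as good as the learned per-expert scores.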

Key Contribution

Multi-expert systems can suffer from *worse* performance than single-expert systems due to an inherent underfitting problem that arises from the difficulty of identifying the correct expert to defer to.
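A short calculation illustrates one way that adding experts can shrink the classifier's own scores. It assumes a softmax surrogate used in earlier multi-expert L2D work (an assumption for illustration; this is not PiCCE, and the paper's own analysis may differ): the per-sample loss is `-log p_y - sum_j 1[m_j = y] log p_{K+j}`, where `m_j` is expert j's prediction. At a fixed input with class probabilities `eta_k` and expert accuracies `a_j = P(m_j = y | x)`, the expected loss is a weighted cross-entropy, so its minimizer over the probability simplex is each weight divided by the total weight `1 + sum_j a_j`. The helper name `surrogate_minimizer` is hypothetical.

```python
from typing import List


def surrogate_minimizer(eta: List[float], acc: List[float]) -> List[float]:
    """Pointwise minimizer of the assumed softmax multi-expert surrogate.

    eta: class probabilities P(y = k | x), summing to 1.
    acc: expert accuracies a_j = P(m_j = y | x), each in [0, 1].
    Returns [p_1..p_K, p_{K+1}..p_{K+J}] = weights / (1 + sum(acc)).
    """
    total = 1.0 + sum(acc)  # total cross-entropy weight: sum(eta) + sum(acc)
    return [w / total for w in list(eta) + list(acc)]


eta = [0.7, 0.2, 0.1]
for num_experts in (1, 2, 4):
    p = surrogate_minimizer(eta, [0.8] * num_experts)
    # The top class score shrinks: 0.7/1.8, 0.7/2.6, 0.7/4.2
    print(num_experts, round(p[0], 3))
```

The ranking among classes is preserved, but the class scores shrink relative to the deferral scores as more accurate experts are added. With one expert the dilution factor is at most 2; with J experts it can reach J + 1, which gives a simple picture of the kind of underfitting the abstract describes.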

Abstract

Learning to Defer (L2D) enables a classifier to abstain from predicting and defer to an expert, and has recently been extended to multi-expert settings. In this work, we show that multi-expert L2D is fundamentally more challenging than the single-expert case. With multiple experts, the classifier's underfitting becomes inherent, which seriously degrades prediction performance, whereas in the single-expert setting it arises only under specific conditions. We theoretically reveal that this stems from an intrinsic expert identifiability issue: learning which expert to trust from a diverse pool, a problem that is absent in the single-expert case and that renders existing underfitting remedies ineffective. To tackle this issue, we propose PiCCE (Pick the Confident and Correct Expert), a surrogate-based method that adaptively identifies a reliable expert based on empirical evidence. PiCCE effectively reduces multi-expert L2D to a single-expert-like learning problem, thereby resolving multi-expert underfitting. We further prove its statistical consistency and its ability to recover class probabilities and expert accuracies. Extensive experiments across diverse settings, including real-world expert scenarios, validate our theoretical results and demonstrate improved performance.
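For intuition about what recovering class probabilities and expert accuracies from a surrogate means, the sketch below implements a baseline softmax multi-expert surrogate and inverts its optimal outputs. This is an assumption for illustration only: the baseline loss is `-log p_y - sum_j 1[m_j = y] log p_{K+j}`, not PiCCE, whose estimators the paper derives itself. At this baseline's optimum the class outputs sum to `1 / (1 + sum_j a_j)`, so dividing every output by that class mass undoes the dilution. The names `softmax_l2d_loss` and `recover` are hypothetical.

```python
import math
from typing import List, Tuple


def log_softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - log_norm for v in z]


def softmax_l2d_loss(logits: List[float], y: int, expert_preds: List[int], num_classes: int) -> float:
    """Assumed baseline surrogate (not PiCCE): -log p_y - sum_j [m_j == y] log p_{K+j}."""
    lp = log_softmax(logits)
    loss = -lp[y]
    for j, m in enumerate(expert_preds):
        if m == y:  # only experts that are correct on this sample contribute
            loss -= lp[num_classes + j]
    return loss


def recover(probs: List[float], num_classes: int) -> Tuple[List[float], List[float]]:
    """Invert the baseline surrogate's optimal outputs by dividing by the class mass."""
    class_mass = sum(probs[:num_classes])  # equals 1 / (1 + sum_j a_j) at the optimum
    eta = [p / class_mass for p in probs[:num_classes]]
    acc = [p / class_mass for p in probs[num_classes:]]
    return eta, acc


# Optimal outputs for eta = (0.7, 0.2, 0.1) and two experts with accuracy 0.8:
probs = [w / 2.6 for w in (0.7, 0.2, 0.1, 0.8, 0.8)]
eta, acc = recover(probs, num_classes=3)
print([round(v, 3) for v in eta], [round(v, 3) for v in acc])  # [0.7, 0.2, 0.1] [0.8, 0.8]
```

The recovery step only holds at the surrogate's optimum; with a finite-sample model the recovered values are estimates.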

Architecture Design (Transformers, SSMs, MoE); Tool Use & Agents

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
