This study investigates how statistical calibration shapes human-AI teaming, focusing on how calibration assumptions affect the allocation of prediction responsibility within the team. The authors analyze two frameworks: one that combines the human's and the AI's predictions, and one that delegates each prediction to either the human or the AI. They find that existing combination methods do not preserve the human's calibration. Delegation does preserve the calibration of the individual predictors, but it places a heavy burden on the rejector meta-model that decides who predicts: the rejector must be calibrated finely enough to identify where each team member is superior, a requirement that grows with the human's expertise and cannot be met when the human relies on information the system cannot observe.
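Why combining can break calibration is easiest to see in a small toy case. The sketch below rests on illustrative assumptions, not on the paper's construction. Simple averaging stands in for the combination methods the paper studies, and the joint distribution is invented. Calibration is measured as the mass-weighted gap between a prediction and the mean label on that prediction's level set. The human sees a feature `X1` and the model sees `X2`. Each predicts the mean label within its own cell, so each is calibrated with respect to its own partition of the feature space. All names (`label`, `human`, `model`, `average`, `calibration_error`) are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy distribution (not from the paper): six equally likely atoms.
# X1 in {0, 1, 2} is seen only by the human; X2 in {0, 1} only by the model.
ATOMS = [(x1, x2) for x1 in range(3) for x2 in range(2)]


def label(x1, x2):
    # Deterministic label: the human's feature decides it when x1 == 2.
    return 1 if x1 == 2 else x2


def conditional_mean(feature):
    # Mean label within each cell of the partition induced by `feature`;
    # a predictor built this way is calibrated with respect to that partition.
    cells = defaultdict(list)
    for x in ATOMS:
        cells[feature(*x)].append(label(*x))
    return {cell: Fraction(sum(ys), len(ys)) for cell, ys in cells.items()}


H = conditional_mean(lambda x1, x2: x1)  # human's partition: cells of X1
M = conditional_mean(lambda x1, x2: x2)  # model's partition: cells of X2


def human(x1, x2):
    return H[x1]


def model(x1, x2):
    return M[x2]


def average(x1, x2):
    # Simple averaging, used here only as a stand-in for a combination method.
    return (human(x1, x2) + model(x1, x2)) / 2


def calibration_error(predictor):
    # Sum over level sets {f = p} of mass * |mean label on the set - p|.
    level_sets = defaultdict(list)
    for x in ATOMS:
        level_sets[predictor(*x)].append(label(*x))
    error = Fraction(0)
    for p, ys in level_sets.items():
        gap = abs(Fraction(sum(ys), len(ys)) - p)
        error += Fraction(len(ys), len(ATOMS)) * gap
    return error


for name, f in [("human", human), ("model", model), ("average", average)]:
    print(name, calibration_error(f))
```

Running it prints a calibration error of `0` for the human and the model but `5/18` for their average. The averaged forecast lands between the two members' values, so it no longer matches the label frequency on its own level sets. A delegating team instead forwards one member's prediction unchanged, which is why delegation leaves the individual predictors' calibration intact.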
Delegating predictions in human-AI teams preserves each member's calibration but shifts the burden to the rejector, which must be calibrated finely enough to tell where each member is more accurate.
We study models for human-AI teaming through the lens of statistical calibration. We assume the team consists of an AI model and a human, both calibrated with respect to some partitioning of the feature space, and expose how these calibration assumptions propagate into the teaming framework. In particular, we consider frameworks that either (i) combine human and model predictions or (ii) delegate prediction responsibility to either the human or the model. We show via theoretical and empirical results that existing methods for combination do not preserve the human's degree of calibration. Methods for delegation (by the very act of delegation) preserve the calibration of the downstream predictors but shift the burden onto the rejector meta-model that decides who predicts. The rejector must be calibrated finely enough to locate where each member is superior, a demand that grows with the human's expertise and becomes unattainable when the human relies on information the system cannot observe.
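The last claim, that the rejector must be fine enough to locate where each member is superior, can be illustrated with a hypothetical toy setup. This is an invented distribution with squared error as the assumed loss, not the paper's definition of rejector calibration. The human is the better predictor only where their private feature `X1` takes a particular value. The sketch computes the lowest team loss a rejector can reach when it must route every point in a cell of a given partition to the same member. It compares three partitions: a single cell, the cells of the observable feature `X2`, and the full `(X1, X2)` grid. The helper `best_rejector_loss` and the three partitions are illustrative names only.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy distribution (not from the paper): six equally likely atoms.
# X1 in {0, 1, 2} is private to the human; X2 in {0, 1} is observable.
ATOMS = [(x1, x2) for x1 in range(3) for x2 in range(2)]


def label(x1, x2):
    return 1 if x1 == 2 else x2


def human(x1, x2):
    # Mean label given X1 alone: the human is certain only when x1 == 2.
    return Fraction(1) if x1 == 2 else Fraction(1, 2)


def model(x1, x2):
    # Mean label given X2 alone: the model is certain only when x2 == 1.
    return Fraction(1) if x2 == 1 else Fraction(1, 3)


def squared_loss(predictor, x):
    return (predictor(*x) - label(*x)) ** 2


def best_rejector_loss(cell_of):
    """Lowest mean squared error of a rejector that is constant on each cell.

    `cell_of` maps an atom to the cell the rejector can distinguish; every atom
    in a cell must be routed to the same team member.
    """
    cells = defaultdict(list)
    for x in ATOMS:
        cells[cell_of(*x)].append(x)
    total = Fraction(0)
    for atoms_in_cell in cells.values():
        human_loss = sum(squared_loss(human, x) for x in atoms_in_cell)
        model_loss = sum(squared_loss(model, x) for x in atoms_in_cell)
        # Route the whole cell to whichever member has the lower loss on it.
        total += min(human_loss, model_loss)
    return total / len(ATOMS)


coarse = best_rejector_loss(lambda x1, x2: None)      # one cell: no information
observable = best_rejector_loss(lambda x1, x2: x2)    # cells of observable X2
oracle = best_rejector_loss(lambda x1, x2: (x1, x2))  # also uses private X1
print(coarse, observable, oracle)
```

This prints `1/9 1/12 1/27`. Each refinement of the rejector's partition lowers the team loss. The lowest loss requires splitting on `X1`, the human's private feature, which a system that cannot observe `X1` has no way to do.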