Apr 21, 2026 · arXiv:2604.19069

Product-of-Experts Training Reduces Dataset Artifacts in Natural Language Inference

AI Summary

The paper addresses the tendency of Natural Language Inference (NLI) models to overfit to dataset-specific artifacts rather than learn genuine reasoning skills. It introduces Product-of-Experts (PoE) training, which reduces the influence of biased examples by downweighting those on which biased models are highly confident. PoE reduces bias reliance by 4.71% with minimal accuracy loss, showing that it can mitigate dataset artifacts at little cost.
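A minimal pure-Python sketch of a Product-of-Experts loss for a single example, to make the downweighting concrete. It assumes the common PoE formulation, in which the main model's log-probabilities are added to those of a frozen bias-only (for example, hypothesis-only) model before cross-entropy is computed. It also assumes that the abstract's lambda scales the bias term; the paper's exact formulation may differ. The function names log_softmax and poe_loss and the toy numbers are hypothetical.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one example's class scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def poe_loss(main_logits: Sequence[float],
             bias_log_probs: Sequence[float],
             label: int,
             lam: float = 1.0) -> float:
    """PoE cross-entropy for one example (assumed formulation).

    The main model's log-probabilities are added to the frozen bias-only
    model's log-probabilities, scaled by lam (assumed to play the role of
    the abstract's lambda), and the sum is renormalised. In training, only
    the main model would receive gradients; bias_log_probs is a constant.
    """
    main_lp = log_softmax(main_logits)
    combined = [m + lam * b for m, b in zip(main_lp, bias_log_probs)]
    return -log_softmax(combined)[label]


if __name__ == "__main__":
    main_logits = [1.0, 0.5, -0.5]               # toy scores; gold label is 0
    uniform = [math.log(1 / 3)] * 3              # bias model has no preference
    confident = [math.log(0.9), math.log(0.05), math.log(0.05)]  # bias model sure of label 0
    print(poe_loss(main_logits, uniform, 0))     # equals plain cross-entropy
    print(poe_loss(main_logits, confident, 0))   # smaller: the example is downweighted
    print(poe_loss(main_logits, confident, 0, lam=1.5))  # smaller still with a larger lam
```

With a uniform bias model (or lam = 0) the loss reduces to ordinary cross-entropy; when the bias model already puts high probability on the gold label, the loss seen by the main model shrinks, which is the sense in which such examples are downweighted.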

Key Contribution

NLI models can be debiased with minimal accuracy loss (89.10% vs. 89.30%) simply by downweighting training examples on which biased models are highly confident.
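The downweighting shows up directly in the gradient that reaches the main model. Under the commonly used PoE formulation (combined log-probabilities log p_main + lam * log p_bias, with lam assumed to correspond to the abstract's lambda), the gradient of the loss with respect to the main model's logits is softmax(main_logits + lam * bias_log_probs) minus the one-hot gold label. The sketch below, with hypothetical helper names and toy numbers, compares its size for an untrained main model under three bias-model scenarios.

```python
import math
from typing import Sequence


def softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one example's class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def poe_grad_l1(main_logits: Sequence[float],
                bias_log_probs: Sequence[float],
                label: int,
                lam: float = 1.0) -> float:
    """L1 norm of d(PoE loss)/d(main logits) for one example.

    log_softmax is shift-invariant, so the PoE cross-entropy equals ordinary
    cross-entropy on main_logits + lam * bias_log_probs, and its gradient
    w.r.t. main_logits is softmax(main_logits + lam * bias_log_probs) - onehot(label).
    """
    probs = softmax([m + lam * b for m, b in zip(main_logits, bias_log_probs)])
    return sum(abs(p - (1.0 if i == label else 0.0)) for i, p in enumerate(probs))


if __name__ == "__main__":
    untrained = [0.0, 0.0, 0.0]                  # main model with no preference; gold label 0
    scenarios = {
        "bias uniform": [math.log(1 / 3)] * 3,
        "bias confident, correct": [math.log(0.9), math.log(0.05), math.log(0.05)],
        "bias confident, wrong": [math.log(0.05), math.log(0.9), math.log(0.05)],
    }
    for name, bias in scenarios.items():
        print(name, round(poe_grad_l1(untrained, bias, 0), 3))
    # prints 1.333, 0.2 and 1.9 respectively
```

In this toy setting, an example the bias model already gets right produces a much smaller update (0.2 vs. about 1.33), while an example that contradicts the bias produces a larger one (1.9), nudging the main model toward evidence the hypothesis alone cannot supply.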

Abstract

Neural NLI models overfit to dataset artifacts instead of genuinely reasoning. A hypothesis-only model achieves 57.7% accuracy on SNLI (well above the roughly 33% chance level for its three labels), revealing strong spurious correlations, and 38.6% of the baseline's errors are attributable to these artifacts. We propose Product-of-Experts (PoE) training, which downweights examples on which biased models are overconfident. PoE nearly preserves accuracy (89.10% vs. 89.30%) while cutting bias reliance by 4.71% (bias agreement drops from 49.85% to 45%). An ablation finds that λ = 1.5 best balances debiasing and accuracy. Behavioral tests still reveal weaknesses in negation and numerical reasoning.
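The abstract reports bias reliance as "bias agreement" and attributes a share of baseline errors to artifacts, but does not define either quantity here. The sketch below assumes two plausible definitions: bias agreement as the fraction of examples on which the main model predicts the same label as the hypothesis-only model, and the artifact error share as the fraction of main-model errors on which the hypothesis-only model made the same wrong prediction. The function names, label encoding, and toy predictions are hypothetical and are not the paper's data.

```python
from typing import Sequence

# Hypothetical toy label encoding: 0 = entailment, 1 = neutral, 2 = contradiction.


def bias_agreement(main_preds: Sequence[int], bias_preds: Sequence[int]) -> float:
    """Assumed definition: share of examples where the main model and the
    hypothesis-only model predict the same label."""
    if not main_preds or len(main_preds) != len(bias_preds):
        raise ValueError("expected two non-empty prediction lists of equal length")
    matches = sum(m == b for m, b in zip(main_preds, bias_preds))
    return matches / len(main_preds)


def artifact_error_share(main_preds: Sequence[int],
                         bias_preds: Sequence[int],
                         gold: Sequence[int]) -> float:
    """Assumed definition: among the main model's errors, the share where the
    hypothesis-only model made the same wrong prediction."""
    if not (len(main_preds) == len(bias_preds) == len(gold)):
        raise ValueError("prediction and gold lists must have equal length")
    errors = [(m, b) for m, b, g in zip(main_preds, bias_preds, gold) if m != g]
    if not errors:
        return 0.0
    return sum(m == b for m, b in errors) / len(errors)


if __name__ == "__main__":
    gold = [0, 1, 2, 0, 1, 2, 0, 1]
    main = [0, 1, 2, 1, 1, 0, 0, 2]
    bias = [0, 0, 2, 1, 1, 1, 2, 2]
    print(bias_agreement(main, bias))              # prints 0.625 (5 of 8 agree)
    print(artifact_error_share(main, bias, gold))  # prints 0.6666666666666666 (2 of 3 errors shared)
```

Under these definitions, a debiasing method succeeds when bias agreement falls while accuracy on the gold labels stays roughly constant, which is the trade-off the lambda ablation in the abstract explores.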

Constitutional AI & AI Ethics · Data Curation & Synthetic Data · Natural Language Processing

Citation Metrics

Citations: 0

Influential citations: 0

References: 24

Year: 2026

Venue: N/A
