The paper addresses the problem of Natural Language Inference (NLI) models overfitting to dataset-specific artifacts rather than learning genuine reasoning skills. It introduces Product-of-Experts (PoE) training, a method that reduces the influence of biased examples by downweighting them according to the confidence of biased models. PoE achieves a 4.71% reduction in bias reliance with minimal accuracy loss, which suggests it is effective at mitigating dataset artifacts.
NLI models can be significantly debiased with minimal accuracy loss by simply downweighting examples where biased models exhibit high confidence.
Neural NLI models overfit to dataset artifacts instead of learning to reason. A hypothesis-only model reaches 57.7% accuracy on SNLI, which indicates strong spurious correlations, and 38.6% of baseline errors are attributable to these artifacts. We propose Product-of-Experts (PoE) training, which downweights examples on which biased models are overconfident. PoE nearly preserves accuracy (89.10% vs. 89.30%) while cutting bias reliance by 4.71% (bias agreement falls from 49.85% to 45%). An ablation finds that lambda = 1.5 best balances debiasing and accuracy. Behavioral tests still reveal weaknesses in negation and numerical reasoning.
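To make the downweighting idea concrete, here is a minimal sketch of a per-example PoE loss in plain Python. It is not the paper's code. It assumes the common PoE formulation in which the main model's log-probabilities are added to the biased model's log-probabilities before renormalizing, and it assumes that lambda scales the biased model's contribution. The names `poe_loss`, `plain_loss`, and `lam` are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def plain_loss(main_logits, label):
    """Ordinary cross-entropy of the main model on one example."""
    return -log_softmax(main_logits)[label]


def poe_loss(main_logits, bias_logits, label, lam=1.5):
    """Cross-entropy of the combined distribution on one example.

    Assumption: the combination is p_main * p_bias**lam, renormalized.
    The value lam=1.5 is taken from the ablation reported above; how the
    paper applies lambda may differ from this sketch.
    """
    main_lp = log_softmax(main_logits)
    bias_lp = log_softmax(bias_logits)
    # Product of experts in log space: sum the (scaled) log-probabilities.
    combined = [m + lam * b for m, b in zip(main_lp, bias_lp)]
    return -log_softmax(combined)[label]
```

Under these assumptions, an example that the biased model already classifies correctly with high confidence yields a small loss, so it contributes little gradient to the main model. For instance, `poe_loss([0.0, 0.0, 0.0], [5.0, 0.0, 0.0], 0)` is far smaller than `plain_loss([0.0, 0.0, 0.0], 0)`. When the biased model is uniform, the two losses coincide. In the usual PoE setup, only the main model is used at inference time.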