Feb 26, 2026

arXiv:2602.23336

Differentiable Zero-One Loss via Hypersimplex Projections

Camilo Gomez, Pengyang Wang, Lian-Sha Tang, Liansheng Tang

AI Summary

This paper introduces a differentiable approximation to the zero-one loss by constructing a smooth, order-preserving projection onto the (n,k)-dimensional hypersimplex. This projection, termed Soft-Binary-Argmax, allows end-to-end training with a loss that aligns more closely with classification accuracy. Its Jacobian can be computed efficiently and integrated into learning systems. By imposing geometric consistency constraints on the output logits, the method improves generalization, particularly in large-batch training.

Key Contribution

A differentiable zero-one loss approximation that narrows the generalization gap in large-batch training by imposing geometric consistency constraints on output logits.

Abstract

Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enabling richer inductive biases and tighter alignment with task-specific objectives. In this work, we introduce a novel differentiable approximation to the zero-one loss, long considered the gold standard for classification performance yet incompatible with gradient-based optimization due to its non-differentiability. Our method constructs a smooth, order-preserving projection onto the (n,k)-dimensional hypersimplex through a constrained optimization framework, leading to a new operator we term Soft-Binary-Argmax. After deriving its mathematical properties, we show how its Jacobian can be efficiently computed and integrated into binary and multiclass learning systems. Empirically, our approach achieves significant improvements in generalization under large-batch training by imposing geometric consistency constraints on the output logits, thereby narrowing the performance gap traditionally observed in large-batch training.
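
To make the geometry concrete, here is a minimal sketch of a plain Euclidean projection onto the hypersimplex, taken here to be the set of vectors x in [0,1]^n whose entries sum to k. This is not the paper's Soft-Binary-Argmax: the abstract does not give its formulation, and the version below is only piecewise linear rather than smooth. The function name project_onto_hypersimplex, the bisection scheme, and the iteration count are all assumptions chosen for illustration. The sketch relies on the standard fact that the projection has the form clip(z - tau, 0, 1) for a scalar shift tau, which also makes it order-preserving in the weak sense (ties can appear where entries are clipped).

```python
import numpy as np

def project_onto_hypersimplex(z, k, iters=100):
    """Euclidean projection of z onto {x in [0,1]^n : sum(x) = k}.

    Illustrative stand-in only; assumes 0 <= k <= len(z).
    """
    z = np.asarray(z, dtype=float)
    # sum(clip(z - tau, 0, 1)) is non-increasing in tau, so bisect on tau.
    lo = z.min() - 1.0  # every entry clips to 1 here, so the sum is n >= k
    hi = z.max()        # every entry clips to 0 here, so the sum is 0 <= k
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(z - tau, 0.0, 1.0).sum() > k:
            lo = tau    # sum too large: shift further down
        else:
            hi = tau    # sum small enough: try a smaller shift
    return np.clip(z - 0.5 * (lo + hi), 0.0, 1.0)
```

Calling project_onto_hypersimplex(logits, k) on a vector of logits returns soft membership scores in [0,1] that sum to k and keep the ranking of the logits. A smooth operator of the kind the abstract describes would additionally need a well-defined Jacobian everywhere, which this clipped version lacks at the clipping boundaries.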

Architecture Design (Transformers, SSMs, MoE)

Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 22

Year: 2026

Venue: N/A
