This paper investigates whether training neural networks across varying activation sparsity levels improves generalization. The authors introduce a training strategy that applies global top-k constraints to hidden activations, cycling a single model through different sparsity levels via progressive compression and periodic resets. In single-run experiments on CIFAR-10 without data augmentation, using a WRN-28-4 architecture, two adaptive keep-ratio control strategies outperform dense training. This preliminary result suggests that joint training across multiple sparsity regimes may improve generalization.
Training a network to perform well under varying activation sparsity constraints may improve generalization: in preliminary single-run experiments, it outperforms standard dense training.
Generalization in deep neural networks remains only partially understood. Inspired by the stronger generalization tendency of biological systems, we explore the hypothesis that robust internal representations should remain effective across both dense and sparse activation regimes. To test this idea, we introduce a simple training strategy that applies global top-k constraints to hidden activations and repeatedly cycles a single model through multiple activation budgets via progressive compression and periodic reset. Using CIFAR-10 without data augmentation and a WRN-28-4 backbone, we find in single-run experiments that two adaptive keep-ratio control strategies both outperform dense baseline training. These preliminary results suggest that joint training across multiple activation sparsity regimes may provide a simple and effective route to improved generalization.
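To make the two ingredients named in the abstract concrete, here is a minimal sketch of a global top-k mask on hidden activations and a keep-ratio schedule that compresses progressively and then resets. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice to share the budget across the whole batch, ranking by raw activation value, and the specific keep-ratio levels are all hypothetical. The schedule shown is a fixed cycle, whereas the paper describes adaptive keep-ratio control, whose details are not given here.

```python
import math


def global_topk_mask(activations, keep_ratio):
    """Keep the largest `keep_ratio` fraction of entries and zero the rest.

    `activations` is a list of rows, one per example. Assumption: "global"
    means a single budget shared by every entry in the batch, and entries
    are ranked by raw value (equivalent to magnitude after a ReLU).
    """
    flat = [v for row in activations for v in row]
    k = max(1, math.ceil(keep_ratio * len(flat)))
    if k >= len(flat):
        # Dense regime: nothing is masked.
        return [list(row) for row in activations]
    # The k-th largest value acts as the cut-off; ties at the cut-off
    # may keep slightly more than k entries.
    threshold = sorted(flat, reverse=True)[k - 1]
    return [[v if v >= threshold else 0.0 for v in row] for row in activations]


def cyclic_keep_ratio(step, steps_per_level, levels=(1.0, 0.5, 0.25, 0.125)):
    """Fixed schedule (assumption): step down through `levels`, then reset.

    Progressive compression is the walk from the first level to the last;
    the periodic reset is the wrap-around back to the dense level.
    """
    return levels[(step // steps_per_level) % len(levels)]


if __name__ == "__main__":
    hidden = [[0.1, 0.9, 0.3], [0.7, 0.2, 0.5]]
    for step in (0, 10, 20, 30, 40):
        ratio = cyclic_keep_ratio(step, steps_per_level=10)
        print(step, ratio, global_topk_mask(hidden, ratio))
```

In a real training loop the mask would be applied inside the forward pass of each hidden layer, with gradients flowing only through the kept activations; that wiring depends on the framework and is left out of this sketch.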