SKLP, Institute of Computing Technology, Chinese Academy of Sciences; Hong Kong Polytechnic University
[...] K [16]. Following DeiT [72], we develop three variants of BinaryAttention, namely -T (tiny), -S (small), and -B (base), by substituting all standard attention modules with BinaryAttention. We follow the experimental settings of DeiT [72], which are detailed in the supplementary file. The models are fine-tuned with the self-distillation [34] strategy, where the full-precision counterparts serve as the teachers. We compare with quantization-based methods such as PTQ [...]
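The excerpt does not state which distillation objective is used, so the following is only a generic sketch of soft-label distillation, in which a student (here standing in for the BinaryAttention model) is pulled toward the output distribution of a full-precision teacher. The function names `log_softmax` and `distillation_loss` and the `temperature` parameter are illustrative assumptions, not taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over softened class distributions.

    Assumption: a standard soft-label objective; the paper's actual
    self-distillation loss may differ.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (full precision)
    log_q = log_softmax(student_logits, temperature)  # student (binarized attention)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    # The T^2 factor is the usual convention for keeping gradient scale
    # comparable across temperatures.
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's logits and positive otherwise, so minimizing it during fine-tuning moves the student's predictions toward the teacher's.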
Achieve >97.5% of full-data VIT performance with only 16% of the data using ScalSelect, a surprisingly effective and scalable training-free data selection method.