Hanoi University of Science and Technology
DWA-KD is a knowledge distillation method that aligns teacher and student tokenizers using Soft-DTW and an entropy-weighted KL divergence; it is reported to outperform state-of-the-art (SOTA) methods.
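For readers unfamiliar with Soft-DTW, the following is a minimal sketch of the generic Soft-DTW recurrence, not DWA-KD's implementation. It assumes a squared Euclidean cost between vectors; the function names `soft_min` and `soft_dtw`, the smoothing parameter `gamma`, and the toy inputs are illustrative choices, not taken from the paper.

```python
import math


def soft_min(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma)))."""
    # Shift by the true minimum for numerical stability; exp(-inf) is 0.0.
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW value between two sequences of same-dimension vectors.

    Assumption: squared Euclidean distance as the pairwise cost.
    """
    n, m = len(x), len(y)
    # r[i][j] is the soft alignment cost of x[:i] against y[:j].
    r = [[math.inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((a - b) ** 2 for a, b in zip(x[i - 1], y[j - 1]))
            r[i][j] = cost + soft_min(
                [r[i - 1][j], r[i][j - 1], r[i - 1][j - 1]], gamma
            )
    return r[n][m]


# Toy sequences of different lengths, standing in for two tokenizations.
long_seq = [[0.0], [1.0], [2.0]]
short_seq = [[0.0], [2.0]]
value = soft_dtw(long_seq, short_seq, gamma=0.01)
```

Because the smooth minimum replaces the hard minimum of classic DTW, the result is differentiable with respect to the inputs, which is what makes it usable as a training loss. As `gamma` shrinks, the value approaches the classic DTW cost.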