The Hong Kong University of Science and Technology
Cross-tokenizer On-Policy Distillation transfers knowledge between diverse model families without requiring the teacher and student to share a tokenizer, and is reported to improve both efficiency and flexibility.
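The summary does not say how tokens are matched across vocabularies. One common idea, assumed here and not necessarily the paper's method, is to tokenize the same student-sampled text with both tokenizers, group tokens into chunks that end at character offsets where both tokenizations place a boundary, and compare summed log-probabilities per chunk. The names `align_chunks` and `chunk_reverse_kl` are hypothetical, and the per-chunk estimate ignores alternative tokenizations of the same substring.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def boundaries(tokens: Sequence[str]) -> List[int]:
    # Character offset at which each token ends (assumes non-empty tokens).
    return list(accumulate(len(t) for t in tokens))


def align_chunks(teacher: Sequence[str], student: Sequence[str]) -> List[Tuple[range, range]]:
    """Hypothetical helper: pair teacher/student token index ranges that cover the same text."""
    if "".join(teacher) != "".join(student):
        raise ValueError("both tokenizations must spell the same text")
    t_ends, s_ends = boundaries(teacher), boundaries(student)
    shared = sorted(set(t_ends) & set(s_ends))  # offsets where both tokenizers split
    chunks, t_start, s_start = [], 0, 0
    for end in shared:
        t_stop = t_ends.index(end) + 1  # one past the teacher token ending at `end`
        s_stop = s_ends.index(end) + 1  # one past the student token ending at `end`
        chunks.append((range(t_start, t_stop), range(s_start, s_stop)))
        t_start, s_start = t_stop, s_stop
    return chunks


def chunk_reverse_kl(chunks, teacher_lp: Sequence[float], student_lp: Sequence[float]) -> List[float]:
    # Per-chunk estimate of log p_student(chunk) - log p_teacher(chunk) on sampled text.
    return [sum(student_lp[i] for i in s) - sum(teacher_lp[i] for i in t) for t, s in chunks]


if __name__ == "__main__":
    teacher = ["un", "happy", " day"]
    student = ["unh", "appy", " ", "day"]
    chunks = align_chunks(teacher, student)
    print(chunks)  # → [(range(0, 2), range(0, 2)), (range(2, 3), range(2, 4))]
    print(chunk_reverse_kl(chunks, [-1.0, -0.5, -0.25], [-0.5, -0.5, -1.0, -0.25]))  # → [0.5, -1.0]
```

Because both sides of a chunk spell the same substring, their summed log-probabilities are comparable even though the vocabularies differ; finer chunks give a denser training signal.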
Dynamic Hybrid Parallelism (DHP) delivers a 1.36× speedup in multimodal large language model (MLLM) training by adaptively optimizing the parallelism strategy to handle the data heterogeneity common in multimodal datasets.
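As a toy illustration of adapting parallelism to heterogeneous data, and not DHP's actual algorithm, the sketch below assumes each GPU can hold at most `capacity` tokens of one sample and picks, per sample, the smallest power-of-two sequence-parallel degree that fits. Long samples are spread across more GPUs while short ones stay on one. The functions `choose_degree` and `plan_batch` and the capacity model are hypothetical; a real planner would also model communication cost and schedule the resulting groups onto devices.

```python
import math
from typing import Dict, List


def choose_degree(length: int, capacity: int, max_degree: int) -> int:
    """Smallest power-of-two sequence-parallel degree so that ceil(length / degree) <= capacity."""
    degree = 1
    while math.ceil(length / degree) > capacity:
        degree *= 2
        if degree > max_degree:
            raise ValueError(f"sample of {length} tokens does not fit on {max_degree} GPUs")
    return degree


def plan_batch(lengths: List[int], num_gpus: int, capacity: int) -> Dict[int, List[int]]:
    """Hypothetical per-batch plan: group sample indices by their chosen parallel degree."""
    plan: Dict[int, List[int]] = {}
    for idx, n in enumerate(lengths):
        plan.setdefault(choose_degree(n, capacity, num_gpus), []).append(idx)
    return plan


if __name__ == "__main__":
    # Mixed batch: short text-only samples next to one long image-heavy sample.
    print(plan_batch([1000, 9000, 3000, 500], num_gpus=8, capacity=4096))  # → {1: [0, 2, 3], 4: [1]}
```

Recomputing the plan for every batch is what makes it dynamic: a static configuration would have to size every sample for the longest one it might see.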