Distilling patch-text alignment knowledge from a teacher model to a student surprisingly *improves* the student's alignment beyond that of the teacher.
DINOv2's impressive unimodal performance doesn't translate to cross-modal understanding, but a simple training tweak can align embeddings across RGB, depth, and segmentation without sacrificing feature quality.