Xi’an Jiaotong University
LVLMs leak visual text style into semantic inference, meaning the font of a word can change the attributes a model associates with the concept it represents.
Adversarially finetuning CLIP using a pretraining-inspired recipe with web data and feature regularization yields significantly better zero-shot robustness across diverse datasets than standard adversarial training.