$Z_{t_{2}}$, capturing both semantic content and uncertainty arising from ambiguity across views, report sections, or image-text relationships.

3.4 Model Training

During training, the model computes probabilistic distances between all relevant pairs of distributions, including image-text, image-image, and text-text pairs, using Equation 2 to compare Gaussian embeddings. To prevent the model from predicting unbounded variances and to regularize the distributional space, we additionally apply a variational information bottleneck (VIB) penalty (Equation 3), computing the KL divergence between each embedding distribution and a unit Gaussian prior. The final training objective is a weighted combination of the inter-modal NLL, the intra-modal NLL terms, and the KL regularization:

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{inter}} + \lambda_{I}\,\mathcal{L}_{\text{intra-I}} + \lambda_{T}\,\mathcal{L}_{\text{intra-T}} + \beta_{I}\,\mathrm{KL}_{\text{img}} + \beta_{T}\,\mathrm{KL}_{\text{text}}, \tag{4}$$

where $\mathcal{L}_{\text{inter}}$ is the inter-modal probabilistic NLL, averaged over the four image-text pairs; $\mathcal{L}_{\text{intra-I}}$ and $\mathcal{L}_{\text{intra-T}}$ are the intra-modal symmetry losses between the image views and between the text inputs; $\mathrm{KL}_{\text{img}}$ and $\mathrm{KL}_{\text{text}}$ are the VIB KL divergences for the image and text embeddings; and $\lambda_{I}$, $\lambda_{T}$, $\beta_{I}$, and $\beta_{T}$ are scalar weights. This multi-view, multi-loss formulation provides richer supervision and produces probabilistic embeddings that are both semantically aligned and uncertainty-calibrated, ultimately improving cross-modal retrieval performance.

3.5 Implementation Details

We implement MedProbCLIP in PyTorch [21], following the model architecture introduced previously.
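As a concrete illustration, the objective in Equation 4 can be sketched in PyTorch as follows. Only the closed-form VIB KL term (KL between a diagonal Gaussian and a unit Gaussian prior) and the weighted combination come from the text; the component NLL losses are passed in as precomputed tensors, and the default weight values are placeholders, not the paper's settings.

```python
import torch


def vib_kl(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), averaged over the batch.

    KL = 0.5 * sum( mu^2 + sigma^2 - log sigma^2 - 1 ).
    """
    return 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=-1).mean()


def total_loss(
    l_inter: torch.Tensor,       # inter-modal NLL, averaged over the four image-text pairs
    l_intra_i: torch.Tensor,     # intra-modal loss between image views
    l_intra_t: torch.Tensor,     # intra-modal loss between text inputs
    img_mu: torch.Tensor, img_logvar: torch.Tensor,
    txt_mu: torch.Tensor, txt_logvar: torch.Tensor,
    lambda_i: float = 0.5, lambda_t: float = 0.5,   # placeholder weights
    beta_i: float = 1e-4, beta_t: float = 1e-4,     # placeholder weights
) -> torch.Tensor:
    """Weighted combination from Equation 4."""
    kl_img = vib_kl(img_mu, img_logvar)
    kl_txt = vib_kl(txt_mu, txt_logvar)
    return (l_inter
            + lambda_i * l_intra_i
            + lambda_t * l_intra_t
            + beta_i * kl_img
            + beta_t * kl_txt)
```

Note that when mu = 0 and logvar = 0, the embedding distribution equals the prior and the KL penalty vanishes, so the betas trade off how strongly the variances are pulled toward the unit Gaussian.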
For image encoding, we use a ViT.
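Since each encoder must output a Gaussian distribution rather than a point embedding, a minimal sketch of the probabilistic projection head over pooled ViT features is shown below. The class name, layer structure, and dimensions are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn


class GaussianHead(nn.Module):
    """Maps a pooled encoder feature to the mean and log-variance of a
    diagonal Gaussian embedding (hypothetical sketch)."""

    def __init__(self, in_dim: int = 768, embed_dim: int = 512):
        super().__init__()
        self.mu_proj = nn.Linear(in_dim, embed_dim)       # predicts the mean
        self.logvar_proj = nn.Linear(in_dim, embed_dim)   # predicts log-variance

    def forward(self, feat: torch.Tensor):
        # feat: (batch, in_dim) pooled ViT (or text encoder) features
        return self.mu_proj(feat), self.logvar_proj(feat)
```

Predicting the log-variance rather than the variance keeps the output unconstrained while guaranteeing a positive variance after exponentiation; the VIB KL penalty then keeps it from growing without bound.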