$Z_{t_{2}}$, capturing both semantic content and uncertainty arising from ambiguity across views, report sections, or image-text relationships.

3.4 Model Training

During training, the model computes probabilistic distances between all relevant pairs of distributions, including image-text, image-image, and text-text pairs, using Equation 2 to compare Gaussian embeddings. To prevent the model from predicting unbounded variances and to regularize the distributional space, we additionally apply a variational information bottleneck (VIB) penalty (Equation 3), computing the KL divergence between each embedding distribution and a unit Gaussian prior. The final training objective is a weighted combination of the inter-modal NLL, the intra-modal NLL terms, and the KL regularization:

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{inter}} + \lambda_{I}\,\mathcal{L}_{\text{intra-I}} + \lambda_{T}\,\mathcal{L}_{\text{intra-T}} + \beta_{I}\,\mathrm{KL}_{\text{img}} + \beta_{T}\,\mathrm{KL}_{\text{text}}, \tag{4}$$

where $\mathcal{L}_{\text{inter}}$ is the inter-modal probabilistic NLL, averaged over the four image-text pairs; $\mathcal{L}_{\text{intra-I}}$ and $\mathcal{L}_{\text{intra-T}}$ are the intra-modal symmetry losses between the image views and between the text inputs; $\mathrm{KL}_{\text{img}}$ and $\mathrm{KL}_{\text{text}}$ are the VIB KL divergences for the image and text embeddings; and $\lambda_{I}$, $\lambda_{T}$, $\beta_{I}$, and $\beta_{T}$ are scalar weights. This multi-view, multi-loss formulation provides richer supervision and produces probabilistic embeddings that are both semantically aligned and uncertainty-calibrated, ultimately improving cross-modal retrieval performance.

3.5 Implementation Details

We implement MedProbCLIP in PyTorch [21], following the model architecture introduced previously.
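As a concrete illustration, the objective in Equation 4 can be sketched in PyTorch as follows. Only the closed-form VIB KL term (KL between a diagonal Gaussian and a unit Gaussian prior) and the weighted combination come from the text; the component NLL losses are passed in as precomputed tensors, and the default weight values are placeholders, not the paper's settings.

```python
import torch


def vib_kl(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), averaged over the batch.

    KL = 0.5 * sum( mu^2 + sigma^2 - log sigma^2 - 1 ).
    """
    return 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=-1).mean()


def total_loss(
    l_inter: torch.Tensor,       # inter-modal NLL, averaged over the four image-text pairs
    l_intra_i: torch.Tensor,     # intra-modal loss between image views
    l_intra_t: torch.Tensor,     # intra-modal loss between text inputs
    img_mu: torch.Tensor, img_logvar: torch.Tensor,
    txt_mu: torch.Tensor, txt_logvar: torch.Tensor,
    lambda_i: float = 0.5, lambda_t: float = 0.5,   # placeholder weights
    beta_i: float = 1e-4, beta_t: float = 1e-4,     # placeholder weights
) -> torch.Tensor:
    """Weighted combination from Equation 4."""
    kl_img = vib_kl(img_mu, img_logvar)
    kl_txt = vib_kl(txt_mu, txt_logvar)
    return (l_inter
            + lambda_i * l_intra_i
            + lambda_t * l_intra_t
            + beta_i * kl_img
            + beta_t * kl_txt)
```

Note that when mu = 0 and logvar = 0, the embedding distribution equals the prior and the KL penalty vanishes, so the betas trade off how strongly the variances are pulled toward the unit Gaussian.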
For image encoding, we use a ViT.
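Since each encoder must output a Gaussian distribution rather than a point embedding, a minimal sketch of the probabilistic projection head over pooled ViT features is shown below. The class name, layer structure, and dimensions are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn


class GaussianHead(nn.Module):
    """Maps a pooled encoder feature to the mean and log-variance of a
    diagonal Gaussian embedding (hypothetical sketch)."""

    def __init__(self, in_dim: int = 768, embed_dim: int = 512):
        super().__init__()
        self.mu_proj = nn.Linear(in_dim, embed_dim)       # predicts the mean
        self.logvar_proj = nn.Linear(in_dim, embed_dim)   # predicts log-variance

    def forward(self, feat: torch.Tensor):
        # feat: (batch, in_dim) pooled ViT (or text encoder) features
        return self.mu_proj(feat), self.logvar_proj(feat)
```

Predicting the log-variance rather than the variance keeps the output unconstrained while guaranteeing a positive variance after exponentiation; the VIB KL penalty then keeps it from growing without bound.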