The paper introduces Factorized Linear Projection (FLiP) models to analyze information encoded in multilingual, multimodal, and API-based sentence embedding spaces like LaBSE, SONAR, and Gemini. FLiP models are trained to reconstruct the original lexical content from the embeddings, achieving over 75% recall, significantly surpassing non-factorized baselines. By using FLiP as a diagnostic tool, the authors reveal modality and language biases within these encoders, offering intrinsic insights without relying on downstream tasks.
Multilingual and multimodal sentence embeddings leak more lexical information than you might expect: FLiP can recall more than 75% of the original lexical content.
This paper presents factorized linear projection (FLiP) models for understanding pretrained sentence embedding spaces. We train FLiP models to recover the lexical content from multilingual (LaBSE), multimodal (SONAR) and API-based (Gemini) sentence embedding spaces in several high- and mid-resource languages. We show that FLiP can recall more than 75% of lexical content from the embeddings, significantly outperforming existing non-factorized baselines. Using this as a diagnostic tool, we uncover the modality and language biases across the selected sentence encoders and provide practitioners with intrinsic insights about the encoders without relying on conventional downstream evaluation tasks. Our implementation is publicly available at https://github.com/BUTSpeechFIT/FLiP.
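To make the idea of a factorized linear projection concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation (see the repository linked above for that). It assumes that "factorized" means the embedding-to-vocabulary map is written as a product of two smaller matrices, and that recall is measured as the fraction of a sentence's words found among the top-k scored vocabulary entries. The names `init_flip`, `word_scores`, and `lexical_recall`, and all of the sizes, are hypothetical.

```python
import numpy as np

def init_flip(embed_dim, rank, vocab_size, seed=0):
    """Create untrained parameters for a low-rank (factorized) linear probe.

    Assumption: the projection is U @ V, with U of shape (embed_dim, rank)
    and V of shape (rank, vocab_size), instead of one full matrix.
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.01, size=(embed_dim, rank))
    V = rng.normal(scale=0.01, size=(rank, vocab_size))
    b = np.zeros(vocab_size)
    return U, V, b

def word_scores(emb, U, V, b):
    """Map sentence embeddings (n, embed_dim) to one score per vocabulary word."""
    return emb @ U @ V + b

def lexical_recall(scores, bow, k):
    """Average fraction of each sentence's words found in its top-k predictions.

    scores: (n, vocab_size) array of word scores.
    bow:    (n, vocab_size) binary bag-of-words reference.
    This recall definition is an assumption made for illustration.
    """
    topk = np.argsort(-scores, axis=1)[:, :k]            # indices of the k highest scores
    hits = np.take_along_axis(bow, topk, axis=1).sum(axis=1)
    n_words = np.maximum(bow.sum(axis=1), 1)             # avoid division by zero
    return float(np.mean(hits / n_words))
```

In a probe like this the parameters would be fitted on pairs of sentence embeddings and bag-of-words targets, for example with a multi-label loss. The sketch omits training and shows only the shape of the model and of the metric.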