The paper introduces NearID, a framework and dataset for disentangling object identity from background context in vision encoders, which is crucial for reliable identity-focused tasks such as personalized generation. The authors construct Near-identity (NearID) distractors by placing semantically similar but distinct instances on the same background as a reference image, so that identity is the only remaining discriminative signal. Training with a two-tier contrastive objective that enforces the hierarchy same identity > NearID distractor > random negative substantially improves identity discrimination: Sample Success Rate (SSR) rises to 99.2%, whereas pre-trained encoders score as low as 30.7%.
Pre-trained vision encoders are shockingly bad at distinguishing identity when background context is controlled, but a simple contrastive learning scheme can fix it.
When evaluating identity-focused tasks such as personalized generation and image editing, existing vision encoders entangle object identity with background context, leading to unreliable representations and metrics. We introduce the first principled framework to address this vulnerability using Near-identity (NearID) distractors, where semantically similar but distinct instances are placed on the exact same background as a reference image, eliminating contextual shortcuts and isolating identity as the sole discriminative signal. Based on this principle, we present the NearID dataset (19K identities, 316K matched-context distractors) together with a strict margin-based evaluation protocol. Under this setting, pre-trained encoders perform poorly, achieving Sample Success Rates (SSR), a strict margin-based identity discrimination metric, as low as 30.7% and often ranking distractors above true cross-view matches. We address this by learning identity-aware representations on a frozen backbone using a two-tier contrastive objective enforcing the hierarchy: same identity > NearID distractor > random negative. This improves SSR to 99.2%, enhances part-level discrimination by 28.0%, and yields stronger alignment with human judgments on DreamBench++, a human-aligned benchmark for personalization. Project page: https://gorluxor.github.io/NearID/
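The hierarchy same identity > NearID distractor > random negative and the margin-based success criterion can be illustrated with a small sketch. The abstract does not give the exact loss or the exact SSR formula, so everything below is an assumption made for illustration: a hinge-style ranking loss stands in for the two-tier contrastive objective, a sample counts as a success when the true match beats every distractor by a margin, and the function names, margin values, and toy 2-D embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def two_tier_hinge_loss(anchor, positive, near_id, random_neg, margin=0.1):
    """Assumed stand-in for the two-tier objective (not the paper's loss).

    Tier 1 pushes sim(anchor, positive) above sim(anchor, near_id).
    Tier 2 pushes sim(anchor, near_id) above sim(anchor, random_neg).
    Each tier is a hinge that is zero once the gap reaches `margin`.
    """
    s_pos = cosine(anchor, positive)
    s_near = cosine(anchor, near_id)
    s_rand = cosine(anchor, random_neg)
    tier1 = max(0.0, margin - (s_pos - s_near))
    tier2 = max(0.0, margin - (s_near - s_rand))
    return tier1 + tier2


def sample_success(anchor, positive, distractors, margin=0.0):
    """Assumed success rule: the true match beats every distractor by `margin`."""
    s_pos = cosine(anchor, positive)
    return all(s_pos > cosine(anchor, d) + margin for d in distractors)


def sample_success_rate(samples, margin=0.0):
    """Fraction of (anchor, positive, distractors) samples that succeed."""
    hits = sum(sample_success(a, p, ds, margin) for a, p, ds in samples)
    return hits / len(samples)


# Toy 2-D embeddings (hypothetical values, not from the NearID dataset).
anchor = [1.0, 0.0]
same_identity = [1.0, 0.1]   # close to the anchor
near_distractor = [1.0, 1.0]  # moderately similar
random_negative = [0.0, 1.0]  # orthogonal to the anchor

ordered_loss = two_tier_hinge_loss(
    anchor, same_identity, near_distractor, random_negative
)
# Swapping the positive and the distractor violates tier 1.
violated_loss = two_tier_hinge_loss(
    anchor, near_distractor, same_identity, random_negative
)

samples = [
    (anchor, same_identity, [near_distractor]),  # ranked correctly
    (anchor, near_distractor, [same_identity]),  # distractor outranks match
]
ssr = sample_success_rate(samples, margin=0.05)
```

In this toy setup the loss vanishes when the ordering holds with enough margin and becomes positive when a distractor outranks the true match, which is the failure mode the abstract reports for pre-trained encoders. In the setting described above, such a loss would train a representation on top of a frozen backbone rather than the backbone itself.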