The paper introduces MeFEm, a modified Joint Embedding Predictive Architecture (JEPA) model tailored for medical and biometric analysis of facial images. The model incorporates axial stripe masking, circular loss weighting, and probabilistic CLS token reassignment to improve performance on anthropometric tasks. MeFEm outperforms FaRL and Franca on anthropometric tasks while using less training data, and shows promising BMI estimation results on a new, consolidated dataset.
A novel facial analysis model, MeFEm, achieves state-of-the-art anthropometric performance with less data by strategically masking facial regions and re-weighting loss functions.
We present MeFEm, a vision model based on a modified Joint Embedding Predictive Architecture (JEPA) for biometric and medical analysis from facial images. Key modifications include an axial stripe masking strategy to focus learning on semantically relevant regions, a circular loss weighting scheme, and the probabilistic reassignment of the CLS token for high-quality linear probing. Trained on a consolidated dataset of curated images, MeFEm outperforms strong baselines such as FaRL and Franca on core anthropometric tasks despite using significantly less data. It also shows promising results on Body Mass Index (BMI) estimation, evaluated on a novel, consolidated closed-source dataset that addresses the domain bias prevalent in existing data. Model weights are available at https://huggingface.co/boretsyury/MeFEm, offering a strong baseline for future work in this domain.
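The abstract names axial stripe masking and circular loss weighting without giving their exact form, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a ViT-style patch grid, takes "axial stripes" to mean whole rows and columns of masked patches, and takes "circular weighting" to mean a Gaussian falloff with distance from the grid centre. All function names, parameters, and the 14x14 grid size are hypothetical.

```python
import math
import random


def axial_stripe_mask(grid_h, grid_w, n_row_stripes, n_col_stripes, rng):
    """Return a grid_h x grid_w boolean grid; True marks a masked patch.

    Assumption: whole rows and whole columns of patches are hidden,
    forming horizontal and vertical stripes.
    """
    rows = set(rng.sample(range(grid_h), n_row_stripes))
    cols = set(rng.sample(range(grid_w), n_col_stripes))
    return [[(r in rows) or (c in cols) for c in range(grid_w)]
            for r in range(grid_h)]


def circular_weights(grid_h, grid_w, sigma=0.5):
    """Return per-patch loss weights that decay with distance from the centre.

    Assumption: a Gaussian of the normalised radius, so weights are
    constant on circles around the grid centre.
    """
    cy, cx = (grid_h - 1) / 2, (grid_w - 1) / 2
    max_r = math.hypot(cy, cx) or 1.0  # avoid division by zero on a 1x1 grid
    return [[math.exp(-((math.hypot(r - cy, c - cx) / max_r) ** 2)
                      / (2 * sigma ** 2))
             for c in range(grid_w)]
            for r in range(grid_h)]


def weighted_masked_loss(per_patch_error, mask, weights):
    """Weighted mean of per-patch errors, taken over masked patches only."""
    num = den = 0.0
    for r, row in enumerate(mask):
        for c, masked in enumerate(row):
            if masked:
                num += weights[r][c] * per_patch_error[r][c]
                den += weights[r][c]
    return num / den if den else 0.0


rng = random.Random(0)
mask = axial_stripe_mask(14, 14, n_row_stripes=3, n_col_stripes=3, rng=rng)
weights = circular_weights(14, 14)
errors = [[1.0] * 14 for _ in range(14)]  # placeholder prediction errors
loss = weighted_masked_loss(errors, mask, weights)
```

In a JEPA-style setup, `per_patch_error` would be the distance between predicted and target embeddings for each patch. Under these assumptions, the weighting makes errors near the image centre, where an aligned face typically sits, count more than errors near the border.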