This paper investigates the personality traits of Large Multimodal Models (LMMs) by applying the Thematic Apperception Test (TAT), a projective psychological framework, using images as input. LMMs were used both to generate stories in response to TAT images (subject models) and to evaluate these narratives using the Social Cognition and Object Relations Scale - Global (SCORS-G) (evaluator models). The study found that LMM evaluators can effectively analyze TAT responses, in line with human expert interpretations, but that models consistently struggle to perceive and regulate aggression. Performance improves with model size and recency.
LMMs can ace a personality test (TAT), revealing strengths in understanding relationships but a blind spot for aggression.
The Thematic Apperception Test (TAT) is a psychometrically grounded, multidimensional assessment framework that systematically differentiates between cognitive-representational and affective-relational components of personality-like functioning. It is a projective psychological framework designed to uncover unconscious aspects of personality. This study examines whether the personality traits of Large Multimodal Models (LMMs) can be assessed through non-language-based modalities, scoring their TAT responses with the Social Cognition and Object Relations Scale - Global (SCORS-G). LMMs are employed in two distinct roles: as subject models (SMs), which generate stories in response to TAT images, and as evaluator models (EMs), which assess these narratives using the SCORS-G framework. The evaluators demonstrate an excellent ability to understand and analyze TAT responses, and their interpretations are highly consistent with those of human experts. Assessment results show that all models understand interpersonal dynamics very well and have a good grasp of the concept of self, but they consistently fail to perceive and regulate aggression. Performance varies systematically across model families, with larger and more recent models consistently outperforming smaller and earlier ones across SCORS-G dimensions.