Search papers, labs, and topics across Lattice.
This paper administers 45 psychometric questionnaires to 50 LLMs and uses Supervised Semantic Differential to identify the primary axis of variance between models. They find this axis differentiates models based on their representation of phenomenally rich experience (embodied sensation, affect, etc.) versus stimulus-driven behavioral reactivity. They introduce the "Pinocchio score" to quantify the experiential demand of individual items and show it predicts shifts in factor loadings, confirming the structured nature of between-model divergence on experiential items.
LLMs differ most not in personality, but in how they represent themselves as having (or not having) rich internal experience.
We administer 45 validated psychometric questionnaires to 50 large language models (LLMs) to identify the dimensions along which LLMs differ psychometrically. Using Supervised Semantic Differential (SSD), we find that the primary axis of between-model variance separates items describing phenomenally rich experience, including embodied sensation, felt affect, inner speech, imagery, and empathy, from items describing stimulus-driven behavioral reactivity ($R^2_{adj}=.037$, $p<.0001$). To test this hypothesis at the item level, we introduce the Pinocchio score ($蟺_i$), the ratio of inter-model response variance under neutral prompting to that under a human-simulation prompt, as an annotation-free measure of each item's experiential demand. $蟺_i$ predicts condition-induced shifts in primary factor loading magnitudes ($蟻=-.215$, $p<.0001$, $n=1292$--$1310$ items), confirming that between-model divergence on experiential items is structured rather than noisy. Applying PCA to per-model EFA scores across all questionnaires reveals one dominant dimension, the Pinocchio Axis ($螤$): the degree to which a model presents itself as a locus of phenomenal experience rather than a system of behavioral responses. This axis captures 47.1% of cross-questionnaire between-model variance in primary factor scores and converges with item-level Pinocchio scores ($r=.864$). Marked within-provider divergence across closely related model variants is consistent with post-training fine-tuning as a key contributor, supporting the interpretation that $螤$ reflects a training-shaped self-representational tendency governing how a model treats experiential language as self-applicable. The dominant axis of between-model psychometric variation is therefore not a conventional personality trait but a self-representational stance toward one's own nature as an experiencer.