This study investigates prompt engineering strategies for generating personality assessment items using LLMs within the AI-GENIE framework. The authors compared zero-shot, few-shot, persona-based, and adaptive prompting techniques across different LLMs and temperature settings, evaluating item pool quality before and after reduction using network psychometric methods. Adaptive prompting consistently outperformed the other methods, particularly when paired with higher-capacity models, by reducing semantic redundancy and improving structural validity.
Adaptive prompting unlocks superior LLM-generated personality assessments, outperforming traditional methods and scaling effectively with model capability.
This Monte Carlo simulation examines how prompt engineering strategies shape the quality of large language model (LLM)-generated personality assessment items within the AI-GENIE framework for generative psychometrics. Item pools targeting the Big Five traits were generated using multiple prompting designs (zero-shot, few-shot, persona-based, and adaptive), model temperatures, and LLMs, then evaluated and reduced using network psychometric methods. Across all conditions, AI-GENIE reliably improved structural validity following reduction, with the magnitude of its incremental contribution inversely related to the quality of the incoming item pool. Prompt design exerted a substantial influence on both pre- and post-reduction item quality. Adaptive prompting consistently outperformed non-adaptive strategies by sharply reducing semantic redundancy, elevating pre-reduction structural validity, and preserving substantially larger item pools, particularly when paired with newer, higher-capacity models. These gains were robust across temperature settings for most models, indicating that adaptive prompting mitigates common trade-offs between creativity and psychometric coherence. An exception was observed for the GPT-4o model at high temperatures, suggesting model-specific sensitivity to adaptive constraints at elevated stochasticity. Overall, the findings demonstrate that adaptive prompting is the strongest approach in this context, and that its benefits scale with model capability, motivating continued investigation of model-prompt interactions in generative psychometric pipelines.
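To make the adaptive prompting idea concrete, the sketch below shows one plausible form of such a loop: previously accepted items are fed back into the prompt so the model steers away from what it has already produced, and candidates that are too semantically similar to existing items are rejected via embedding cosine similarity. This is a hypothetical illustration under stated assumptions, not the AI-GENIE implementation; the model names, the 0.90 similarity cutoff, and the prompt wording are all assumed for demonstration.

```python
# Hypothetical adaptive-prompting sketch (not the AI-GENIE implementation):
# feed accepted items back into the prompt and filter near-duplicates by
# embedding similarity. Model names and the threshold are assumptions.
import numpy as np
from openai import OpenAI  # any chat-completion client would work similarly

client = OpenAI()
SIM_THRESHOLD = 0.90  # assumed cutoff for "semantically redundant"

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def generate_items(trait: str, n_items: int, temperature: float = 0.7) -> list[str]:
    accepted: list[str] = []
    vectors: list[np.ndarray] = []
    while len(accepted) < n_items:
        # Adaptive step: the prompt carries the running item list so the
        # model is explicitly steered away from items already in the pool.
        prompt = (
            f"Write one new self-report item measuring {trait}.\n"
            "Avoid semantic overlap with these existing items:\n"
            + "\n".join(f"- {item}" for item in accepted)
        )
        reply = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
        )
        candidate = reply.choices[0].message.content.strip()
        vec = embed(candidate)
        # Reject candidates too close to anything already accepted; this is
        # the redundancy control that a purely zero-shot design lacks.
        if all(cosine(vec, v) < SIM_THRESHOLD for v in vectors):
            accepted.append(candidate)
            vectors.append(vec)
    return accepted

print(generate_items("Extraversion", n_items=10))
```

One design note: because the prompt grows with the accepted pool, this loop trades token cost for diversity, which is consistent with the finding that adaptive prompting's advantage is largest for higher-capacity models with longer effective context.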