MBZUAI · TII · UNSW · Jun 15, 2026 · arXiv:2606.16368

Evaluating LLM Personalization via Semantic Constraint Verification

Xuran Li, Guanqin Zhang, Imran Razzak, Hakim Hacid, Eleanna Kafeza, Hao Xue, Flora D. Salim

AI Summary

This paper introduces Natural Language Inference Constraint Verification (NLICV), a framework for evaluating LLM personalization that addresses two limitations of existing methods: the brittleness of surface-matching metrics and the cost of LLM-as-a-judge protocols. By mapping sentence meanings to truth-condition sets, NLICV categorizes LLM behaviors into four distinct modes (personalization, generalization, sycophancy, and failure) and aligns closely with human annotations while significantly reducing evaluation latency and token costs. The method achieves up to a 2100× inference speedup over LLM-as-a-judge protocols, and an ablation-based procedure identifies the sentences that drive each verification, providing interpretable evidence for its evaluations.

Key Contribution

NLICV not only speeds up LLM personalization evaluation by up to 2100 times but also offers a clearer understanding of model behaviors beyond simple binary scoring.

Abstract

Current evaluation paradigms for Large Language Model (LLM) personalization rely heavily on brittle surface-matching metrics or computationally expensive LLM-as-a-judge protocols, both of which lack interpretability. To address these limitations, we introduce Natural Language Inference Constraint Verification (NLICV), a scalable, semantically invariant framework that maps sentence meanings to truth-condition sets to verify personalization constraints via a Natural Language Inference (NLI) model. Moving beyond binary scoring, NLICV categorizes LLM behaviors into four distinct modes: personalization, generalization, sycophancy, and failure. Extensive experiments demonstrate that NLICV aligns closely with human annotations while drastically reducing the latency and token costs associated with LLM judges (up to a 2100× inference speedup). Finally, through an ablation-based procedure, NLICV pinpoints the exact sentences driving the constraint verification, yielding faithful, understandable evidence for its evaluations.
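The ablation idea in the abstract can be illustrated with a minimal leave-one-out sketch. This is not the authors' implementation: the function names (`split_sentences`, `toy_entail_score`, `ablation_attribution`) are hypothetical, and the scorer is a crude token-overlap stand-in used only so the example runs without a model. In practice it would presumably be replaced by the entailment probability from an NLI model.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter on ., !, ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI entailment score.

    Returns the fraction of hypothesis tokens that appear in the premise.
    This is NOT real inference; it only makes the sketch self-contained.
    """
    p = set(re.findall(r"\w+", premise.lower()))
    h = set(re.findall(r"\w+", hypothesis.lower()))
    return len(p & h) / len(h) if h else 0.0


def ablation_attribution(
    response: str,
    constraint: str,
    score: Callable[[str, str], float] = toy_entail_score,
) -> List[Tuple[str, float]]:
    """Rank response sentences by how much removing each one lowers the score."""
    sentences = split_sentences(response)
    full = score(response, constraint)
    drops = []
    for i, sent in enumerate(sentences):
        # Rebuild the response with sentence i left out.
        ablated = " ".join(sentences[:i] + sentences[i + 1:])
        drops.append((sent, full - score(ablated, constraint)))
    # Largest drop first: these sentences carry the evidence for the constraint.
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


response = (
    "Here is a dinner idea. "
    "The recipe is vegetarian and uses lentils. "
    "Enjoy your meal."
)
for sentence, drop in ablation_attribution(response, "The recipe is vegetarian."):
    print(f"{drop:+.2f}  {sentence}")
```

The sentence whose removal causes the largest score drop is the one the verifier relies on most. Passing a different `score` callable swaps in another verifier without changing the attribution loop.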

Eval Frameworks & Benchmarks

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
