Mar 31, 2026 · arXiv:2603.29953

Structured Intent as a Protocol-Like Communication Layer: Cross-Model Robustness, Framework Comparison, and the Weak-Model Compensation Effect

AI Summary

This paper investigates the robustness of structured intent representations, specifically the 5W3H-based PPS framework, across LLMs (Claude, GPT-4o, Gemini 2.5 Pro), languages, and prompting frameworks (CO-STAR, RISEN). Structured prompting substantially reduces cross-language score variance, with the strongest structured conditions cutting cross-language sigma from 0.470 to about 0.020. The results also show a "weak-model compensation" effect: the lowest-baseline model (Gemini) gains much more from structured prompting (+1.006) than the strongest model (Claude, +0.217). In a user study (N=50), AI-expanded 5W3H prompts reduce interaction rounds by 60 percent and raise user satisfaction from 3.16 to 4.04.

Key Contribution

Structured intent prompting sharply reduces cross-language performance variance and helps weaker models, such as Gemini, more than stronger ones.

Abstract

How reliably can structured intent representations preserve user goals across different AI models, languages, and prompting frameworks? Prior work showed that PPS (Prompt Protocol Specification), a 5W3H-based structured intent framework, improves goal alignment in Chinese and generalizes to English and Japanese. This paper extends that line of inquiry in three directions: cross-model robustness across Claude, GPT-4o, and Gemini 2.5 Pro; controlled comparison with CO-STAR and RISEN; and a user study (N=50) of AI-assisted intent expansion in ecologically valid settings. Across 3,240 model outputs (3 languages × 6 conditions × 3 models × 3 domains × 20 tasks), evaluated by an independent judge (DeepSeek-V3), we find that structured prompting substantially reduces cross-language score variance relative to unstructured baselines. The strongest structured conditions reduce cross-language σ from 0.470 to about 0.020. We also observe a weak-model compensation pattern: the lowest-baseline model (Gemini) shows a much larger D-A gain (+1.006) than the strongest model (Claude, +0.217). Under the current evaluation resolution, 5W3H, CO-STAR, and RISEN achieve similarly high goal-alignment scores, suggesting that dimensional decomposition itself is an important active ingredient. In the user study, AI-expanded 5W3H prompts reduce interaction rounds by 60 percent and increase user satisfaction from 3.16 to 4.04. These findings support the practical value of structured intent representation as a robust, protocol-like communication layer for human-AI interaction.
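The design size and the two headline metrics in the abstract can be illustrated with a minimal sketch. Everything beyond the factor counts is an assumption made for illustration: the language codes (taken from the languages named for the prior work), the condition labels A through F (inferred from the "D-A gain" wording), the domain names, the helper names `cross_language_sigma` and `gain`, and the use of a population standard deviation. The paper's actual computation may differ.

```python
from itertools import product
from statistics import pstdev

# Factor levels; counts come from the abstract, labels are assumed placeholders.
languages = ["zh", "en", "ja"]                   # assumed: Chinese, English, Japanese
conditions = ["A", "B", "C", "D", "E", "F"]      # assumed labels for the six conditions
models = ["Claude", "GPT-4o", "Gemini 2.5 Pro"]
domains = ["domain_1", "domain_2", "domain_3"]   # hypothetical domain names
tasks = range(20)

# Full factorial design: one model output per combination.
design = list(product(languages, conditions, models, domains, tasks))
n_outputs = len(design)  # 3 * 6 * 3 * 3 * 20


def cross_language_sigma(mean_score_by_language):
    """Spread of per-language mean scores for one condition.

    Assumes a population standard deviation; the paper may use another estimator.
    """
    return pstdev(list(mean_score_by_language.values()))


def gain(mean_score_by_condition, treated="D", baseline="A"):
    """Score difference between a structured condition and the baseline."""
    return mean_score_by_condition[treated] - mean_score_by_condition[baseline]
```

Under this reading, a lower `cross_language_sigma` means the judge's scores depend less on the prompt language, and a larger `gain` for a low-baseline model corresponds to the weak-model compensation pattern the abstract describes.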

Eval Frameworks & Benchmarks · Natural Language Processing · Red-Teaming & Adversarial Robustness

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
