This paper investigates the feasibility of using ChatGPT to generate realistic synthetic system requirement specifications (SSyRSs) without access to real data. The authors employ a systematic approach involving prompt patterns, LLM-based quality assessments, and iterative prompt refinement to generate 300 SSyRSs across 10 industries. An expert study (n=87) found that 62% of experts rated the generated SSyRSs as realistic, although in-depth examination revealed inconsistencies and deficiencies, highlighting the limits of LLM-based quality assessments.
Despite ChatGPT's known flaws, it can generate surprisingly realistic synthetic system requirement specifications that fool experts more often than you'd expect.
System requirement specifications (SyRSs) are central natural-language (NL) artifacts in requirements engineering. Access to real SyRSs for research purposes is highly valuable but limited by proprietary restrictions and confidentiality concerns. Generating synthetic SyRSs (SSyRSs) can address this scarcity. Black-box large language models (LLMs) such as ChatGPT offer compelling generation capabilities, providing easy access to NL generation without requiring real data. However, LLMs suffer from hallucinations and overconfidence, which pose major challenges to their use. We designed an exploratory study to investigate whether, despite these challenges, ChatGPT can generate realistic SSyRSs without access to real SyRSs. Using a systematic approach that leverages prompt patterns, LLM-based quality assessments, and iterative prompt refinements, we generated 300 SSyRSs across 10 industries with ChatGPT. The results were evaluated through cross-model checks and an expert study with n=87 submitted surveys. 62% of experts considered the SSyRSs realistic. However, in-depth examination revealed contradictory statements and deficiencies. Overall, ChatGPT allowed us to generate realistic SSyRSs to a certain extent, but LLM-based quality assessments cannot fully replace thorough expert evaluations. This paper presents the methodology and results of our study and discusses the key insights obtained.
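The abstract describes a generate → assess → refine loop: a prompt produces a candidate SSyRS, an LLM-based quality check scores it, and the prompt is refined until the score clears a threshold. A minimal sketch of that control flow is below; it is not the paper's implementation. `call_llm` and `assess_quality` are hypothetical stand-ins (stubbed with deterministic heuristics so the example runs offline); in practice both would call a chat-completion API.

```python
def call_llm(prompt: str) -> str:
    # Stub standing in for a chat-completion API call.
    # Pretend the model writes verifiable requirements once the prompt asks for them.
    if "acceptance criteria" in prompt:
        return "REQ-1: The system shall respond within 2 s (acceptance criteria: measured at p95)."
    return "REQ-1: The system shall be user friendly."

def assess_quality(spec: str) -> float:
    # LLM-based quality assessment, stubbed here as a keyword heuristic.
    return 0.9 if "acceptance criteria" in spec else 0.4

def generate_ssyrs(industry: str, threshold: float = 0.8, max_rounds: int = 5) -> str:
    """Generate a synthetic SyRS, refining the prompt until quality passes."""
    prompt = f"Write a system requirement specification for a {industry} product."
    spec = call_llm(prompt)
    for _ in range(max_rounds):
        if assess_quality(spec) >= threshold:
            break
        # Iterative prompt refinement: tighten the instructions and regenerate.
        prompt += " Make each requirement verifiable, with acceptance criteria."
        spec = call_llm(prompt)
    return spec

print(generate_ssyrs("healthcare"))
```

The `max_rounds` cap matters in practice: the paper's finding that LLM-based assessments miss contradictions implies the loop can converge on a high score without the spec actually being sound, so a bounded loop plus downstream expert review is the safer design.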