This study conducts a large-scale empirical analysis of hallucination in Large Language Models (LLMs) generating Rust code, focusing on package (crate) hallucination, which poses a security risk to the software supply chain. Using a multi-source dataset built from Stack Overflow, GitHub, and LLM-generated tasks, the researchers evaluate both commercial and open-source models under various decoding settings. They find that hallucination in Rust follows a distinct pattern: rates are consistent across models and show minimal sensitivity to model parameters. The study also explores prompt engineering strategies that mitigate hallucinations without sacrificing code quality, offering guidance for safer LLM deployment in software engineering.
Hallucination rates in LLM-generated Rust code are surprisingly consistent across models, challenging assumptions from previous studies in other languages.
Large Language Models (LLMs) have become powerful tools for code generation, yet they remain prone to hallucinations: plausible but incorrect or fabricated outputs. Among these, package hallucination, where an LLM suggests non-existent dependencies, poses an emerging security risk to the software supply chain. While previous studies focus on popular languages like Python or JavaScript, in this work we present the first large-scale empirical study on crate hallucination in LLM-generated Rust code. We construct a multi-source dataset combining coding tasks from Stack Overflow, GitHub, and LLM-generated tasks, and evaluate both commercial and open-source models under various decoding settings. Our analysis reveals that, unlike prior findings in Python and JavaScript, hallucination behavior in Rust follows a distinct pattern: different models exhibit surprisingly consistent hallucination rates, and these rates show minimal sensitivity to model parameters. Furthermore, we investigate prompt engineering strategies to mitigate hallucinations without sacrificing code quality. This study provides new insights into the reliability and security implications of LLM-assisted Rust development, offering guidance for future research and safer model deployment in software engineering workflows.
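To make the notion of crate hallucination concrete, the sketch below shows one way such a check could work. It is an illustration only, not the detection pipeline or the metric used in the study. It assumes a local snapshot of registry crate names (the hypothetical `known` set), detects external crates only from single-line `use` and `extern crate` statements, and defines the rate as the fraction of samples referencing at least one unknown crate. All function names are hypothetical.

```rust
use std::collections::HashSet;

/// Collect the root crate names referenced by `use` and `extern crate` lines.
/// Paths rooted at std/core/alloc or crate/self/super are not external crates.
/// Assumption: one import per line; grouped or multi-line imports are ignored.
fn referenced_crates(source: &str) -> HashSet<String> {
    const BUILTIN: [&str; 6] = ["std", "core", "alloc", "crate", "self", "super"];
    let mut crates = HashSet::new();
    for line in source.lines() {
        let line = line.trim();
        let rest = match line
            .strip_prefix("use ")
            .or_else(|| line.strip_prefix("pub use "))
            .or_else(|| line.strip_prefix("extern crate "))
        {
            Some(rest) => rest.trim_start_matches("::"),
            None => continue,
        };
        // The root segment ends at the first non-identifier character.
        let root: String = rest
            .chars()
            .take_while(|c| c.is_ascii_alphanumeric() || *c == '_')
            .collect();
        if !root.is_empty() && !BUILTIN.contains(&root.as_str()) {
            crates.insert(root);
        }
    }
    crates
}

/// Crates referenced in `source` that are absent from the `known` snapshot,
/// returned in sorted order.
fn hallucinated_crates(source: &str, known: &HashSet<String>) -> Vec<String> {
    let mut missing: Vec<String> = referenced_crates(source)
        .into_iter()
        .filter(|name| !known.contains(name))
        .collect();
    missing.sort();
    missing
}

/// Fraction of samples that reference at least one unknown crate.
/// Assumption: this per-sample definition is illustrative, not the paper's.
fn hallucination_rate(samples: &[&str], known: &HashSet<String>) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let mut bad = 0usize;
    for sample in samples {
        if !hallucinated_crates(sample, known).is_empty() {
            bad += 1;
        }
    }
    bad as f64 / samples.len() as f64
}
```

With a snapshot containing only `serde` and `tokio`, a sample importing `serde::Deserialize` and `std::fs` would be clean, while one importing a made-up `fast_json_magic` crate would be flagged. A real check would also need to map registry package names to their in-code form (Cargo converts hyphens to underscores) and to read dependencies declared in `Cargo.toml`, both of which this sketch omits.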