This paper investigates how effectively static analysis can detect and mitigate library hallucinations in LLM-generated code. The authors find that static analysis tools can detect a significant portion of errors (16-70% of all errors and 14-85% of library hallucinations). However, these tools are fundamentally limited in their ability to catch every hallucination, due to the dynamic nature of some errors. The study establishes an upper bound on the potential of static analysis, showing that it can address at most 48.5-77% of hallucinations.
Static analysis, a cheap and readily available technique, can catch up to 85% of library hallucinations in LLM-generated code, but a ceiling exists beyond which it cannot improve.
Despite extensive research, large language models (LLMs) continue to hallucinate when generating code, particularly when using libraries. On NL-to-code benchmarks that require library use, we find that LLMs generate code that uses non-existent library features in 8.1-40% of responses. One intuitive approach to detecting and mitigating hallucinations is static analysis. In this paper, we analyse the potential of static analysis tools, both in terms of what they can solve and what they cannot. We find that static analysis tools can detect 16-70% of all errors and 14-85% of library hallucinations, with performance varying by LLM and dataset. Through manual analysis, we identify cases that a static method could not plausibly catch, which places an upper bound of 48.5% to 77% on their potential. Overall, we show that static analysis methods are a cheap way to address some forms of hallucination, and we quantify how far short of solving the problem they will always fall.
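To make the idea concrete, the following is a minimal sketch of a static check for library hallucinations in Python code. It assumes that the libraries the generated code uses are installed, so their real APIs can serve as ground truth. This is not the paper's method or any particular tool's implementation. The names `check_library_usage`, `Finding`, and `_import` are hypothetical and introduced only for illustration. The generated code is only parsed with `ast` and never executed, although installed libraries are imported so that their actual attributes can be read. The check flags imports of non-existent modules and reads of non-existent module members. It also reports `getattr` calls whose attribute name is not a string literal as "unchecked", which is a small instance of the kind of dynamic case a static method cannot decide.

```python
from __future__ import annotations

import ast
import importlib
from dataclasses import dataclass
from types import ModuleType

# Sentinel meaning "no real object behind this name or attribute".
_MISSING = object()


@dataclass
class Finding:
    line: int
    kind: str  # "missing": uses a non-existent library feature; "unchecked": beyond static reach
    message: str


def _import(name: str) -> ModuleType | None:
    """Import an installed module by dotted name; return None if it does not exist."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def check_library_usage(source: str) -> list[Finding]:
    """Hypothetical checker: parse `source` with ast (never run it) and check library usage."""
    tree = ast.parse(source)
    findings: list[Finding] = []
    bound: dict[str, object] = {}  # local name -> real module/object it refers to

    # Pass 1: resolve every absolute import against the installed environment.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                module = _import(alias.name)
                if module is None:
                    findings.append(Finding(node.lineno, "missing", f"no module named '{alias.name}'"))
                elif alias.asname:
                    bound[alias.asname] = module
                else:
                    # "import a.b" binds the top-level package name "a".
                    top = alias.name.split(".")[0]
                    bound[top] = _import(top)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            module = _import(node.module)
            if module is None:
                findings.append(Finding(node.lineno, "missing", f"no module named '{node.module}'"))
                continue
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports are not resolved in this sketch
                obj = getattr(module, alias.name, _MISSING)
                if obj is _MISSING:
                    # The name may be a submodule that has not been imported yet.
                    obj = _import(f"{node.module}.{alias.name}") or _MISSING
                if obj is _MISSING:
                    findings.append(Finding(node.lineno, "missing", f"'{node.module}' has no member '{alias.name}'"))
                else:
                    bound[alias.asname or alias.name] = obj

    def resolve(expr: ast.expr) -> object:
        """Follow a Name/Attribute chain to a real object, or return _MISSING."""
        if isinstance(expr, ast.Name):
            return bound.get(expr.id, _MISSING)
        if isinstance(expr, ast.Attribute):
            base = resolve(expr.value)
            if base is not _MISSING:
                return getattr(base, expr.attr, _MISSING)
        return _MISSING

    # Pass 2: check attribute reads and getattr() calls on resolved library objects.
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and isinstance(node.ctx, ast.Load):
            base = resolve(node.value)
            if base is not _MISSING and not hasattr(base, node.attr):
                findings.append(Finding(node.lineno, "missing",
                                        f"'{ast.unparse(node.value)}' has no attribute '{node.attr}'"))
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
              and node.func.id == "getattr" and len(node.args) == 2):
            target = resolve(node.args[0])
            if target is _MISSING:
                continue  # not a library object this sketch knows about
            name = node.args[1]
            if not (isinstance(name, ast.Constant) and isinstance(name.value, str)):
                # The attribute name is only known at run time, so it cannot be checked here.
                findings.append(Finding(node.lineno, "unchecked",
                                        f"getattr on '{ast.unparse(node.args[0])}' with a computed name"))
            elif not hasattr(target, name.value):
                findings.append(Finding(node.lineno, "missing",
                                        f"'{ast.unparse(node.args[0])}' has no attribute '{name.value}'"))

    return sorted(findings, key=lambda f: f.line)


if __name__ == "__main__":
    # The snippet is only parsed, so input() is never called.
    generated = "import json\nimport numpyy\ndata = json.loadz('{}')\nfn = getattr(json, input())\n"
    for f in check_library_usage(generated):
        print(f.line, f.kind, f.message)
```

The sketch deliberately ignores variable shadowing, star imports, relative imports, and attributes of objects created at run time, such as methods called on a value returned by a library function. Dedicated tools such as pylint or mypy cover far more of these cases. Still, any check of this kind must give up on names that are only determined during execution, as the "unchecked" category illustrates.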