Software developers relying on AI chatbots for building applications may end up using hallucinated packages, according to a new report from generative AI security startup Lasso Security.
Continuing research from last year, Lasso’s Bar Lanyado demonstrated once again how large language model (LLM) tools can be abused to spread software packages that do not exist.
Threat actors, he warned last year, could learn the names of these hallucinated packages and register malicious packages under the same names, which developers would then download based on AI chatbot recommendations.
Scaling up the research, Lanyado asked four different models, namely GPT-3.5-Turbo, GPT-4, Gemini Pro (previously Bard), and Coral (Cohere), over 40,000 “how to” questions, using the Langchain framework for interaction.
To check how repeatable the hallucinations are, the researcher re-ran 20 questions that had produced zero-shot hallucinations (the model recommended a hallucinated package in its very first answer).
All four models hallucinated packages in more than 20% of answers, with Gemini peaking at 64.5%. Repetitiveness hovered around 15%, with Cohere peaking at 24.2%.
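The two metrics above can be sketched in a few lines. This is a minimal illustration of how a hallucination rate and a repetitiveness rate might be computed, not Lanyado’s actual tooling; the sample answers, package names, and the stand-in registry set are all invented for the example:

```python
from collections import Counter

# Hypothetical record of the package each LLM answer recommended;
# None means the answer suggested no package at all.
answers = ["requests", "fastqueryx", None, "numpy", "fastqueryx", "mlpipez"]

# Stand-in for a real registry lookup (e.g. checking PyPI).
real_packages = {"requests", "numpy"}

suggested = [p for p in answers if p is not None]
hallucinated = [p for p in suggested if p not in real_packages]

# Hallucination rate: share of package suggestions that do not exist.
rate = len(hallucinated) / len(suggested)

# Repetitiveness (approximated here within one answer list): how often
# the same hallucinated name comes back when questions are repeated.
counts = Counter(hallucinated)
repeated = sum(c for c in counts.values() if c > 1)
repetitiveness = repeated / len(hallucinated)

print(f"hallucination rate: {rate:.0%}")   # 3 of 5 suggestions are fake
print(f"repetitiveness: {repetitiveness:.0%}")
```

Recurring hallucinated names are precisely what makes the attack practical: an attacker only needs to register the names that models suggest over and over.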
The most worrying finding, however, is that an empty package the researcher uploaded under one of the hallucinated names was downloaded over 30,000 times, based on AI recommendations. Furthermore, the same package was found to be used or recommended by several large companies.
“For instance, instructions for installing this package can be found in the README of a repository dedicated to research conducted by Alibaba,” the researcher explains.
This research, Lanyado notes, underlines once again the need to cross-verify uncertain answers received from an LLM, especially regarding software packages.
Lanyado also advises developers to be cautious when relying on open source software, especially when encountering unfamiliar packages, urging them to verify the package’s repository and evaluate its community and engagement before using it.
“Also, consider the date it was published and be on the lookout for anything that appears suspicious. Before integrating the package into a production environment, it’s prudent to perform a comprehensive security scan,” Lanyado notes.
Related: ChatGPT Hallucinations Can Be Exploited to Distribute Malicious Code Packages
Related: Suspicious NuGet Package Harvesting Information From Industrial Systems
Related: Thousands of Code Packages Vulnerable to Repojacking Attacks