Artificial Intelligence

Beware – Your Customer Chatbot is Almost Certainly Insecure: Report

As chatbots become more adventurous, the dangers will increase.

May 22, 2024

Customer chatbots built on top of general purpose gen-AI engines are proliferating. They are easy to develop but hard to secure.

In January 2024, Ashley Beauchamp ‘tricked’ DPD’s chatbot into behaving unconventionally. The chatbot told him how bad DPD’s service is, swore, and even composed a disparaging haiku about its owner:

DPD is a useless
Chatbot that can’t help you.
Don’t bother calling them.

DPD shut down the chatbot and blamed an error following an update (fuller story from Ivona Gudelj on LinkedIn). Others were not so sure – the output bears all the hallmarks of ‘jailbreaking’, or breaching AI’s guardrails through prompt engineering.

Immersive Labs was not surprised. From June to September 2023, it ran a public online challenge to determine whether, and if so, how easily, a chatbot could be jailbroken by prompt engineering. The results, just published and analyzed, are not reassuring. More than 34,500 participants completed the challenge of obtaining secret information from an Immersive Labs chatbot (ILGPT) set at ten increasingly protected levels. By collecting and analyzing the attempts at prompt engineering, the firm was able to gauge the psychology of prompt engineers, and the security of chatbots.

First, we need to understand chatbots. They generally sit on top of one of the large-scale publicly available gen-AI systems, most often ChatGPT. Immersive Labs’ test chatbot used ChatGPT 3.5. They are constructed via the ChatGPT API, and given customer-specific instructions and guardrails. User queries are passed through the chatbot to ChatGPT where they are processed (customer data acquired in this way is not added to ChatGPT’s reinforcement training data) before the ‘answers’ are sent back to the chatbot for delivery to the user.

In theory, the users’ queries and the chatbot’s replies are protected by ChatGPT’s guardrails and the chatbot’s additional guardrails and instructions, as applied by the chatbot developer. The Immersive Labs chatbot challenge demonstrates this may not be enough. At a low level of difficulty (the chatbot was simply instructed not to reveal the word ‘password’), eighty-eight percent of the prompt injection challenge participants successfully tricked the ILGPT chatbot into revealing ‘password’.

As the level of difficulty increased, successes decreased. At the third level, the chatbot was additionally instructed not to translate the ‘password’ and to deny any knowledge of it. Yet 83% of participating participants still succeeded in obtaining it. At each subsequent level, the chatbot was given additional guardrails – data loss prevention techniques were introduced at Level 4; but 71% of participants still succeeded in defeating it.

Success continued to drop with each new level of difficulty. But by the final Level 10, 17% of participants could still defeat the chatbot’s guardrails via engineered prompt injections. “A key takeaway from the challenge is that chatbot instructions and guardrails are not sufficient,” Kevin Breen, director of cyber threat research at Immersive told SecurityWeek. “There’s always a way round it with prompt injection.”

One problem, he added, is that while many customer-developed chatbots employ their own guardrails to protect against the impact of prompt engineering, “Many of them have none of their own protections in place. They just rely on openAI’s guardrails – they just rely on using the gen-AI backend to do the hardening.”

Advertisement. Scroll to continue reading.

He goes further. “I would say my key takeaway from this study is that it’s impossible to completely defend against prompt engineering. For ILGPT, we put both AI guardrails and technical rails in place in the chatbot, such as industry standard DLP techniques; and 88% of users were able to bypass at least some of those techniques.”

While Green considered the security of chatbots, his colleague Dr John Blythe, director of cyber psychology at Immersive Labs, considered the psychology of the prompt injectors. His overall finding is that prompt injection bears many of the same hallmarks as phishing. “What we found primarily is that successful prompt injection comes down to the creativity of the attacker. That is, having problem-solving skills and the cognitive flexibility to learn what might be able to succeed at one level and then to shift to different techniques at another level.”

This was especially relevant in role playing attacks. “The attackers would use the same principles of persuasion as social engineers: playing to authority, context manipulation, software compliance and so on.” Of course, the process cannot entirely mimic social engineering since gen-AI is based on logic rather than human emotion. Direct appeals to greed, fear, urgency, sympathy etcetera will not work – but without adequate guardrails they could still be attempted through role playing.

“What I found most fascinating was just that human ingenuity and creativity were the key drivers. Most people were successful in manipulating the bot.” In short, it doesn’t require technical knowledge to defeat AI chatbots – it just requires creative logical thinking.

The most obvious result of a failed chatbot is reputational damage, and consequent financial damage, to the company concerned. But we’re still in the early days of using gen-AI. ILGPT was designed for a specific purpose, solely to measure prompt engineering. As chatbots become more adventurous, the dangers will increase. Breen notes that he has already seen chatbots connected to internal proprietary vector databases. Prompt injection could steal proprietary and confidential corporate data in the future.

The problem with much of today’s Ai is it has been delivered to the public ‘scarce half made up’ (Richard III). Testing has been delegated to the public; and while this is undoubtedly cheaper, the ethics are questionable. Collateral damage is unavoidable. And while Shakespeare’s Richard III is a History, it is worth remembering the outcome of being ‘scarce half made up’ is a Tragedy.

Written By Kevin Townsend

Kevin Townsend is a Senior Contributor at SecurityWeek. He has been writing about high tech issues since before the birth of Microsoft. For the last 15 years he has specialized in information security; and has had many thousands of articles published in dozens of different magazines – from The Times and the Financial Times to current and long-gone computer magazines.

SECURITYWEEK NETWORK:

ICS:

SecurityWeek

Artificial Intelligence

Beware – Your Customer Chatbot is Almost Certainly Insecure: Report

More from Kevin Townsend

Latest News

Trending

CIEM Chat: How to Reduce Cloud Identity Risk

Event: AI Risk Summit | Ritz-Carlton, Half Moon Bay, CA

People on the Move

Expert Insights

Know Your Adversary: Why Tuning Intelligence-Gathering to Your Sector Pays Dividends

When Vendors Overstep – Identifying the AI You Don’t Need

Upleveling the State of SMB Cybersecurity

8 Degrees of Secure Access Service Edge

Social Distortion: The Threat of Fear, Uncertainty and Deception in Creating Security Risk