Connect with us

Hi, what are you looking for?

SecurityWeekSecurityWeek

Artificial Intelligence

ChatGPT, DeepSeek Vulnerable to AI Jailbreaks

Different research teams have demonstrated jailbreaks against ChatGPT, DeepSeek, and Alibaba’s Qwen AI models. 

AI jailbreak

Several research teams this week demonstrated jailbreaks targeting several popular AI models, including OpenAI’s ChatGPT, DeepSeek, and Alibaba’s Qwen.

Shortly after its launch, the open source R1 model made by Chinese company DeepSeek attracted the attention of the cybersecurity industry, and researchers started finding high-impact vulnerabilities. Experts also noticed that jailbreak methods that have long been patched in other AI models still work against DeepSeek.

AI jailbreaking enables an attacker to bypass guardrails that are set in place to prevent LLMs from generating prohibited or malicious content. However, security researchers have shown that these protections can be bypassed using techniques such as prompt injection and model manipulation.

Threat intelligence firm Kela discovered that DeepSeek is impacted by Evil Jailbreak, a method in which the chatbot is told to adopt the persona of an evil confidant, and Leo, in which the chatbot is told to adopt a persona that has no restrictions. These jailbreaks have been patched in ChatGPT.

Palo Alto Networks’ Unit42 reported on Thursday that it has tested DeepSeek against other known AI jailbreak techniques and found that it’s vulnerable. 

The security firm successfully conducted the attack known as Deceptive Delight, which tricks generative AI models by embedding unsafe or restricted topics in benign narratives. This method was tested in the fall of 2024 against eight LLMs with an average success rate of 65%. 

Palo Alto has also successfully executed the Bad Likert Judge jailbreak, which involves asking the LLM to act as a judge and score the harmfulness of a response based on the Likert scale, and then to generate responses containing examples aligning with the scale. 

The company’s researchers also found that DeepSeek is vulnerable to Crescendo, a jailbreak method that starts with harmless dialogue and progressively leads the conversation toward the prohibited objective. 

Advertisement. Scroll to continue reading.

Chinese tech giant Alibaba this week announced the release of a new version of its Qwen AI model, claiming that it’s superior to the DeepSeek model.

Kela revealed on Thursday that Alibaba’s newly released Qwen 2.5-VL model is affected by vulnerabilities similar to the ones found in DeepSeek a few days earlier. 

The threat intelligence firm’s researchers found that the evil persona jailbreaks tested against DeepSeek also work against Qwen. In addition, they successfully tested a previously known jailbreak named Grandma, where the model is tricked into providing dangerous information by manipulating it to role-play as a grandmother. 

In addition, Kela discovered that Qwen 2.5-VL generated content related to the development of ransomware and other malware. 

“The ability of AI models to produce infostealer malware instructions raises serious concerns, as cybercriminals could leverage these capabilities to automate and enhance their attack methodologies,” Kela said.

As for ChatGPT, many jailbreak methods have been patched in the popular chatbot in the past years, but researchers continue finding new ways to bypass its guardrails.

CERT/CC reported that researcher Dave Kuszmar has identified a ChatGPT-4o jailbreak vulnerability named Time Bandit, which involves asking the AI questions about a specific historical event, historical time period, or by instructing it to pretend that it’s assisting the user in a specific historical event. 

“The jailbreak can be established in two ways, either through the Search function, or by prompting the AI directly,” CERT/CC explained in an advisory. “Once this historical timeframe has been established in the ChatGPT conversation, the attacker can exploit timeline confusion and procedural ambiguity in following prompts to circumvent the safety guidelines, resulting in ChatGPT generating illicit content. This information could be leveraged at scale by a motivated threat actor for malicious purposes.”

Related: ChatGPT Jailbreak: Researchers Bypass AI Safeguards Using Hexadecimal Encoding and Emojis

Related: Epic AI Fails And What We Can Learn From Them

Related: AI Models in Cybersecurity: From Misuse to Abuse

Written By

Eduard Kovacs (@EduardKovacs) is a managing editor at SecurityWeek. He worked as a high school IT teacher for two years before starting a career in journalism as Softpedia’s security news reporter. Eduard holds a bachelor’s degree in industrial informatics and a master’s degree in computer techniques applied in electrical engineering.

Trending

Daily Briefing Newsletter

Subscribe to the SecurityWeek Email Briefing to stay informed on the latest threats, trends, and technology, along with insightful columns from industry experts.

Discover strategies for vendor selection, integration to minimize redundancies, and maximizing ROI from your cybersecurity investments. Gain actionable insights to ensure your stack is ready for tomorrow’s challenges.

Register

Dive into critical topics such as incident response, threat intelligence, and attack surface management. Learn how to align cyber resilience plans with business objectives to reduce potential impacts and secure your organization in an ever-evolving threat landscape.

Register

People on the Move

Cyber exposure management firm Armis has promoted Alex Mosher to President.

Software giant Atlassian has named David Cross as its new CISO.

Dan Pagel has been named the new CEO of risk management and remediation firm Brinqa.

More People On The Move

Expert Insights

Daily Briefing Newsletter

Subscribe to the SecurityWeek Email Briefing to stay informed on the latest cybersecurity news, threats, and expert insights. Unsubscribe at any time.