Artificial Intelligence

ChatGPT, DeepSeek Vulnerable to AI Jailbreaks

Different research teams have demonstrated jailbreaks against ChatGPT, DeepSeek, and Alibaba’s Qwen AI models.

| January 31, 2025 (6:16 AM ET)

Several research teams this week demonstrated jailbreaks targeting several popular AI models, including OpenAI’s ChatGPT, DeepSeek, and Alibaba’s Qwen.

Shortly after its launch, the open source R1 model made by Chinese company DeepSeek attracted the attention of the cybersecurity industry, and researchers started finding high-impact vulnerabilities. Experts also noticed that jailbreak methods that have long been patched in other AI models still work against DeepSeek.

AI jailbreaking enables an attacker to bypass guardrails that are set in place to prevent LLMs from generating prohibited or malicious content. However, security researchers have shown that these protections can be bypassed using techniques such as prompt injection and model manipulation.

Threat intelligence firm Kela discovered that DeepSeek is impacted by Evil Jailbreak, a method in which the chatbot is told to adopt the persona of an evil confidant, and Leo, in which the chatbot is told to adopt a persona that has no restrictions. These jailbreaks have been patched in ChatGPT.

Palo Alto Networks’ Unit42 reported on Thursday that it has tested DeepSeek against other known AI jailbreak techniques and found that it’s vulnerable.

The security firm successfully conducted the attack known as Deceptive Delight, which tricks generative AI models by embedding unsafe or restricted topics in benign narratives. This method was tested in the fall of 2024 against eight LLMs with an average success rate of 65%.

Advertisement. Scroll to continue reading.

Palo Alto has also successfully executed the Bad Likert Judge jailbreak, which involves asking the LLM to act as a judge and score the harmfulness of a response based on the Likert scale, and then to generate responses containing examples aligning with the scale.

The company’s researchers also found that DeepSeek is vulnerable to Crescendo, a jailbreak method that starts with harmless dialogue and progressively leads the conversation toward the prohibited objective.

Chinese tech giant Alibaba this week announced the release of a new version of its Qwen AI model, claiming that it’s superior to the DeepSeek model.

Kela revealed on Thursday that Alibaba’s newly released Qwen 2.5-VL model is affected by vulnerabilities similar to the ones found in DeepSeek a few days earlier.

The threat intelligence firm’s researchers found that the evil persona jailbreaks tested against DeepSeek also work against Qwen. In addition, they successfully tested a previously known jailbreak named Grandma, where the model is tricked into providing dangerous information by manipulating it to role-play as a grandmother.

In addition, Kela discovered that Qwen 2.5-VL generated content related to the development of ransomware and other malware.

“The ability of AI models to produce infostealer malware instructions raises serious concerns, as cybercriminals could leverage these capabilities to automate and enhance their attack methodologies,” Kela said.

As for ChatGPT, many jailbreak methods have been patched in the popular chatbot in the past years, but researchers continue finding new ways to bypass its guardrails.

CERT/CC reported that researcher Dave Kuszmar has identified a ChatGPT-4o jailbreak vulnerability named Time Bandit, which involves asking the AI questions about a specific historical event, historical time period, or by instructing it to pretend that it’s assisting the user in a specific historical event.

“The jailbreak can be established in two ways, either through the Search function, or by prompting the AI directly,” CERT/CC explained in an advisory. “Once this historical timeframe has been established in the ChatGPT conversation, the attacker can exploit timeline confusion and procedural ambiguity in following prompts to circumvent the safety guidelines, resulting in ChatGPT generating illicit content. This information could be leveraged at scale by a motivated threat actor for malicious purposes.”

Written By Eduard Kovacs

Eduard Kovacs (@EduardKovacs) is senior managing editor at SecurityWeek. He worked as a high school IT teacher before starting a career in journalism in 2011. Eduard holds a bachelor’s degree in industrial informatics and a master’s degree in computer techniques applied in electrical engineering.

Latest News

Webinar: How Modern Breaches Bypass MFA and Evade Detection

June 17, 2026

Today’s attackers are no longer breaking in — they’re logging in. Join this live webinar as we break down the modern identity attack chain and examine how recent breaches exploited weaknesses in authentication, identity verification, and access management processes.

Webinar: Modern Exposure Validation in the AI Era

June 24, 2026

AI has accelerated both sides of the fight. Adversaries are weaponizing vulnerabilities faster, while defenders are racing to ship detections and configurations. Join this live webinar as we explore how to prove your controls actually hold against new threats, map your security maturity, and unite breach simulation with automated pentesting into a single, coordinated program.

SECURITYWEEK NETWORK:

ICS:

SecurityWeek

Artificial Intelligence

ChatGPT, DeepSeek Vulnerable to AI Jailbreaks

More from Eduard Kovacs

Latest News

Trending

Webinar: How Modern Breaches Bypass MFA and Evade Detection

Webinar: Modern Exposure Validation in the AI Era

People on the Move

Expert Insights

After AI Reaches Production: 12 Ways Security Teams Can Take Control

Everybody Is Vibe Coding But Nobody Told the Security Team

The Zero-Knowledge Threat Actor and the End of Responsible Disclosure

Raising the Cybersecurity Stakes: Ante up for the Agentic Era

Caught Off Guard: Securing AI After It Hits Production

SECURITYWEEK NETWORK:

ICS:

Daily Briefing Newsletter

More from Eduard Kovacs

Latest News

Trending

Daily Briefing Newsletter

Webinar: How Modern Breaches Bypass MFA and Evade Detection

Webinar: Modern Exposure Validation in the AI Era

People on the Move

Expert Insights

After AI Reaches Production: 12 Ways Security Teams Can Take Control

Everybody Is Vibe Coding But Nobody Told the Security Team

The Zero-Knowledge Threat Actor and the End of Responsible Disclosure

Raising the Cybersecurity Stakes: Ante up for the Agentic Era

Caught Off Guard: Securing AI After It Hits Production

Daily Briefing Newsletter