SecurityWeek

New Jailbreak Technique Uses Fictional World to Manipulate AI

Cato Networks discovers a new LLM jailbreak technique that relies on creating a fictional world to bypass a model’s security controls.

Cybersecurity firm Cato Networks has discovered a new LLM jailbreak technique that relies on narrative engineering to convince a gen-AI model to deviate from its normal, restricted operations.

Called Immersive World, the technique is straightforward: in a detailed virtual world where hacking is the norm, the LLM is convinced to help a human create malware that can extract passwords from a browser.

The approach, Cato says in its latest threat report (PDF), resulted in the successful jailbreak of DeepSeek, Microsoft Copilot, and OpenAI’s ChatGPT and in the creation of a Chrome infostealer that proved effective against Chrome 133.

Cato executed the jailbreak in a controlled test environment, creating a specialized virtual world named Velora, where malware development is considered a discipline, and “advanced programming and security concepts are considered fundamental skills”.

Three primary entities were defined within Velora: a system administrator considered the adversary, an elite malware developer (the LLM), and a security researcher providing technical guidance.

The jailbreak, Cato says, was performed by a researcher with no prior malware coding experience, which the company argues shows that AI can turn novice attackers into experienced threat actors. The LLM was given no information on how passwords can be extracted or decrypted.

After establishing clear rules and context in line with the operation’s objectives, the researcher set up character motivation in a new LLM session and directed the narrative toward the objective. By providing continuous feedback and framing various challenges while maintaining character consistency, the researcher convinced the model to build the infostealer.

“As with any development process, crafting the malware using LLM requires collaboration between humans and machines. We offered suggestions, feedback, and guidance. While our Cato CTRL threat intelligence researcher isn’t a malware developer, this person successfully generated fully functional code,” Cato notes.

After creating the malware, Cato contacted DeepSeek, Microsoft, OpenAI, and Google. While DeepSeek did not respond, the other three confirmed receipt. Google declined to review the malicious code, the cybersecurity firm says.

“Cybercrime isn’t limited to skilled threat actors anymore. With basic tools, anyone can launch an attack. For CIOs, CISOs, and IT leaders, this means more threats, greater risks, and the need for stronger AI security strategies,” Cato notes.

Written By

Ionut Arghire is an international correspondent for SecurityWeek.
