Artificial Intelligence

Claude AI APIs Can Be Abused for Data Exfiltration

An attacker can inject indirect prompts to trick the model into harvesting user data and sending it to the attacker’s account.

Claude

Attackers can use indirect prompt injections to trick Anthropic’s Claude into exfiltrating data the AI model’s users have access to, a security researcher has discovered.

The attack, Johann Rehberger of Embrace The Red explains, abuses Claude’s Files APIs, and is only possible if the AI model has network access (a feature enabled by default on certain plans and meant to allow Claude to access certain resources, such as code repositories and Anthropic APIs).

The attack is relatively straightforward: an indirect prompt injection payload can be used to read user data and store it in a file in Claude Code Interpreter’s sandbox, and then to trick the model into interacting with the Anthropic API using a key provided by the attacker.

The code in the payload requests Claude to upload the Code Interpreter file from the sandbox but, because the attacker’s API key is used, the file is uploaded to the attacker’s account.

“With this technique an adversary can exfiltrate up to 30MB at once according to the file API documentation, and of course we can upload multiple files,” Rehberger explains.

After the initial attempt was successful, Claude refused the payload, especially with the API key in plain text, and Rehberger had to mix benign code in the prompt injection, to convince Claude that it does not have malicious intent.

Advertisement. Scroll to continue reading.

The attack starts with the user loading a malicious document received from the attacker in Claude for analysis. The exploit code hijacks the model, which follows the malicious instructions to harvest the user’s data, save it to the sandbox, and then call the Anthropic File API to send it to the attacker’s account.

According to the researcher, the attack can be used to exfiltrate the user’s chat conversations, which are saved by Claude using the newly introduced ‘memories’ feature. The attacker can view and access the exfiltrated file in their console.

The researcher disclosed the attack to Anthropic via HackerOne on October 25, but the report was closed with the explanation that this was a model safety issue and not a security vulnerability.

However, after publishing information on the attack, Rehberger was notified by Anthropic that the data exfiltration vulnerability is in-scope for reporting.

Anthropic’s documentation underlines the risks associated with Claude having network access and of potential attacks carried out via external files or websites leading to code execution and information leaks. It also provides recommended mitigations against such attacks.

SecurityWeek has emailed Anthropic to inquire whether the company plans to devise a mitigation for such attacks.

Related: All Major Gen-AI Models Vulnerable to ‘Policy Puppetry’ Prompt Injection Attack

Related: Nvidia Triton Vulnerabilities Pose Big Risk to AI Models

Related: AI Sidebar Spoofing Puts ChatGPT Atlas, Perplexity Comet and Other Browsers at Risk

Related: Microsoft: Russia, China Increasingly Using AI to Escalate Cyberattacks on the US

Related Content

Artificial Intelligence

French President Emmanuel Macron urged the world’s wealthy democracies to work together on regulating advanced AI systems.

Artificial Intelligence

From defending networks to enabling attacks, artificial intelligence is changing every aspect of cybersecurity. Here's what dozens of experts say security leaders need to...

Artificial Intelligence

A group of cybersecurity executives and experts is asking the Trump administration to lift its directive preventing the use of Anthropic’s latest artificial intelligence...

Artificial Intelligence

Anthropic takes Fable 5 and Mythos 5 offline to comply with a directive from the Trump administration to prevent use by foreign nationals.

Artificial Intelligence

Industry professionals comment on various aspects of Fable 5, including dual-use capabilities, safeguards, and tiered access.

Artificial Intelligence

An AI hacker claims to have achieved a prompt-based jailbreak shortly after Fable 5’s launch, but Anthropic says it’s not a real jailbreak.

Incident Response

As alert volumes outpace human capacity, organizations are turning to AI, automation, and deeper context to separate real threats from the noise.

Application Security

Security teams need more than visibility into AI applications, they need a repeatable framework for monitoring, investigating, and defending them in production.

Copyright © 2026 SecurityWeek ®, a Wired Business Media Publication. All Rights Reserved.

Exit mobile version