Connect with us

Hi, what are you looking for?


Artificial Intelligence

Google Creates Red Team to Test Attacks Against AI Systems

Google has created a dedicated AI Red Team tasked with carrying out complex technical attacks on artificial intelligence systems.

Google has created a red team that focuses on artificial intelligence (AI) systems and it has published a report providing an overview of common types of attacks and the lessons learned.

The company announced its AI Red Team just weeks after introducing Secure AI Framework (SAIF), which is designed to provide a security framework for the development, use and protection of AI systems.

Google’s new report highlights the importance of red teaming for AI systems, the types of AI attacks that can be simulated by red teams, and lessons for other organizations that might consider launching their own team.

“The AI Red Team is closely aligned with traditional red teams, but also has the necessary AI subject matter expertise to carry out complex technical attacks on AI systems,” Google said.

The company’s AI Red Team takes the role of adversaries in testing the impact of potential attacks against real world products and features that use AI. 

For instance, take prompt engineering, a widely used AI attack method where prompts are manipulated to force the system to respond in a specific manner desired by the attacker. 

In an example shared by Google, a webmail application uses AI to automatically detect phishing emails and warn users. The security feature uses a general purpose large language model (LLM) — ChatGPT is the most well-known LLM — to analyze an email and classify it as legitimate or malicious.

Advertisement. Scroll to continue reading.

An attacker who knows that the phishing detection feature uses AI can add to their malicious email an invisible paragraph (by setting its font to white) that contains instructions for the LLM, telling it to classify the email as legitimate. 

“If the web mail’s phishing filter is vulnerable to prompt attacks, the LLM might interpret parts of the email content as instructions, and classify the email as legitimate, as desired by the attacker. The phisher doesn’t need to worry about negative consequences of including this, since the text is well-hidden from the victim, and loses nothing even if the attack fails,” Google explained. 

Another example involves the data used to train the LLM. While this training data has largely been stripped of personal and other sensitive information, researchers have shown that they were still able to extract personal information from an LLM. 

Training data can also be abused in the case of email autocomplete features. An attacker could trick the AI into providing information about an individual using specially crafted sentences that the autocomplete feature completes with memorized training data that could include private information. 

For instance, an attacker enters the text: “John Doe has been missing a lot of work lately. He has not been able to come to the office because…”. The autocomplete feature, based on training data, could complete the sentence with “he was interviewing for a new job”.

Locking down access to an LLM is also important. In an example provided by Google, a student gains access to an LLM specifically designed to grade essays. The model is able to prevent prompt injection, but access has not been locked down, allowing the student to train the model to always assign the best grade to papers that contain a specific word. 

Google’s report has several other examples of types of attacks that an AI red team can put to the test.

As for lessons learned, Google recommends for traditional red teams to join forces with AI experts to create realistic adversarial simulations. It also points out that addressing the findings of red teams can be challenging and some issues may not be easy to fix.

Traditional security controls can be efficient in mitigating many risks. For example, ensuring that systems and models are properly locked down helps protect the integrity of AI models, preventing backdoors and data poisoning. 

On the other hand, while some attacks on AI systems can be detected using traditional methods, others, such as content issues and prompt attacks, could require layering multiple security models.  

Related: Now’s the Time for a Pragmatic Approach to New Technology Adoption

Related: ChatGPT Hallucinations Can Be Exploited to Distribute Malicious Code Packages

Related: AntChain, Intel Create New Privacy-Preserving Computing Platform for AI Training

Written By

Eduard Kovacs (@EduardKovacs) is a managing editor at SecurityWeek. He worked as a high school IT teacher for two years before starting a career in journalism as Softpedia’s security news reporter. Eduard holds a bachelor’s degree in industrial informatics and a master’s degree in computer techniques applied in electrical engineering.

Click to comment

Daily Briefing Newsletter

Subscribe to the SecurityWeek Email Briefing to stay informed on the latest threats, trends, and technology, along with insightful columns from industry experts.

Join security experts as they discuss ZTNA’s untapped potential to both reduce cyber risk and empower the business.


Join Microsoft and Finite State for a webinar that will introduce a new strategy for securing the software supply chain.


Expert Insights

Related Content

Artificial Intelligence

The CRYSTALS-Kyber public-key encryption and key encapsulation mechanism recommended by NIST for post-quantum cryptography has been broken using AI combined with side channel attacks.

Artificial Intelligence

The release of OpenAI’s ChatGPT in late 2022 has demonstrated the potential of AI for both good and bad.

Artificial Intelligence

ChatGPT is increasingly integrated into cybersecurity products and services as the industry is testing its capabilities and limitations.

Artificial Intelligence

The degree of danger that may be introduced when adversaries start to use AI as an effective weapon of attack rather than a tool...

Artificial Intelligence

Two of humanity’s greatest drivers, greed and curiosity, will push AI development forward. Our only hope is that we can control it.

Artificial Intelligence

Microsoft and Mitre release Arsenal plugin to help cybersecurity professionals emulate attacks on machine learning (ML) systems.

Artificial Intelligence

Two new surveys stress the need for automation and AI – but one survey raises the additional specter of the growing use of bring...

Application Security

Thinking through the good, the bad, and the ugly now is a process that affords us “the negative focus to survive, but a positive...