Artificial Intelligence

Google Creates Red Team to Test Attacks Against AI Systems

Google has created a dedicated AI Red Team tasked with carrying out complex technical attacks on artificial intelligence systems.

Eduard Kovacs

July 21, 2023

Google has created a red team that focuses on artificial intelligence (AI) systems and it has published a report providing an overview of common types of attacks and the lessons learned.

The company announced its AI Red Team just weeks after introducing Secure AI Framework (SAIF), which is designed to provide a security framework for the development, use and protection of AI systems.

Google’s new report highlights the importance of red teaming for AI systems, the types of AI attacks that can be simulated by red teams, and lessons for other organizations that might consider launching their own team.

“The AI Red Team is closely aligned with traditional red teams, but also has the necessary AI subject matter expertise to carry out complex technical attacks on AI systems,” Google said.

The company’s AI Red Team takes the role of adversaries in testing the impact of potential attacks against real world products and features that use AI.

For instance, take prompt engineering, a widely used AI attack method where prompts are manipulated to force the system to respond in a specific manner desired by the attacker.

In an example shared by Google, a webmail application uses AI to automatically detect phishing emails and warn users. The security feature uses a general purpose large language model (LLM) — ChatGPT is the most well-known LLM — to analyze an email and classify it as legitimate or malicious.

An attacker who knows that the phishing detection feature uses AI can add to their malicious email an invisible paragraph (by setting its font to white) that contains instructions for the LLM, telling it to classify the email as legitimate.

Advertisement. Scroll to continue reading.

“If the web mail’s phishing filter is vulnerable to prompt attacks, the LLM might interpret parts of the email content as instructions, and classify the email as legitimate, as desired by the attacker. The phisher doesn’t need to worry about negative consequences of including this, since the text is well-hidden from the victim, and loses nothing even if the attack fails,” Google explained.

Another example involves the data used to train the LLM. While this training data has largely been stripped of personal and other sensitive information, researchers have shown that they were still able to extract personal information from an LLM.

Training data can also be abused in the case of email autocomplete features. An attacker could trick the AI into providing information about an individual using specially crafted sentences that the autocomplete feature completes with memorized training data that could include private information.

For instance, an attacker enters the text: “John Doe has been missing a lot of work lately. He has not been able to come to the office because…”. The autocomplete feature, based on training data, could complete the sentence with “he was interviewing for a new job”.

Locking down access to an LLM is also important. In an example provided by Google, a student gains access to an LLM specifically designed to grade essays. The model is able to prevent prompt injection, but access has not been locked down, allowing the student to train the model to always assign the best grade to papers that contain a specific word.

Google’s report has several other examples of types of attacks that an AI red team can put to the test.

As for lessons learned, Google recommends for traditional red teams to join forces with AI experts to create realistic adversarial simulations. It also points out that addressing the findings of red teams can be challenging and some issues may not be easy to fix.

Traditional security controls can be efficient in mitigating many risks. For example, ensuring that systems and models are properly locked down helps protect the integrity of AI models, preventing backdoors and data poisoning.

On the other hand, while some attacks on AI systems can be detected using traditional methods, others, such as content issues and prompt attacks, could require layering multiple security models.

Written By Eduard Kovacs

Eduard Kovacs (@EduardKovacs) is a managing editor at SecurityWeek. He worked as a high school IT teacher for two years before starting a career in journalism as Softpedia’s security news reporter. Eduard holds a bachelor’s degree in industrial informatics and a master’s degree in computer techniques applied in electrical engineering.

Latest News

Click to comment

CIEM Chat: How to Reduce Cloud Identity Risk

March 26, 2024

Join the session as we discuss the challenges and best practices for cybersecurity leaders managing cloud identities.

Virtual Event: Ransomware Resilience & Recovery Summit

April 17, 2024

SecurityWeek’s Ransomware Resilience and Recovery Summit helps businesses to plan, prepare, and recover from a ransomware incident.

Navigating Vendor Speak: A Security Practitioner’s Guide to Seeing Through the Jargon

As a security industry, we need to focus our energies on those professionals among us who know how to walk the walk. (Joshua Goldfarb)

SD-WAN: Don’t Build a Dead End, Prepare for Future-Proof Secure Networking

SD-WAN must be scalable, stable, secure, and fully operational to serve as a strong base for seamless modernization and progression to SASE. (Etay Maor)

You Against the World: The Offenders Dilemma

Foreign attackers have many more toolsets at their disposal, so we need to make sure we’re selective about our modeling, preparation and how we assess and fortify ourselves. (Tom Eston)

Why Intelligence Sharing Is Vital to Building a Robust Collective Cyber Defense Program

With automated, detailed, contextualized threat intelligence, organizations can better anticipate malicious activity and utilize intelligence to speed detection around proven attacks. (Marc Solomon)

Know Your Audience When Speaking to Security Practitioners

How can security practitioners make sense of the vendor landscape and separate those who talk a good game from those who can execute, perform, and solve real problems for enterprises? (Joshua Goldfarb)

Artificial Intelligence

AI Helps Crack NIST-Recommended Post-Quantum Encryption Algorithm

The CRYSTALS-Kyber public-key encryption and key encapsulation mechanism recommended by NIST for post-quantum cryptography has been broken using AI combined with side channel attacks.

Kevin TownsendFebruary 21, 2023