Artificial Intelligence

Google Creates Red Team to Test Attacks Against AI Systems

Google has created a dedicated AI Red Team tasked with carrying out complex technical attacks on artificial intelligence systems.

Google has created a dedicated AI Red Team tasked with carrying out complex technical attacks on artificial intelligence systems.

Google has created a red team that focuses on artificial intelligence (AI) systems and it has published a report providing an overview of common types of attacks and the lessons learned.

The company announced its AI Red Team just weeks after introducing Secure AI Framework (SAIF), which is designed to provide a security framework for the development, use and protection of AI systems.

Google’s new report highlights the importance of red teaming for AI systems, the types of AI attacks that can be simulated by red teams, and lessons for other organizations that might consider launching their own team.

“The AI Red Team is closely aligned with traditional red teams, but also has the necessary AI subject matter expertise to carry out complex technical attacks on AI systems,” Google said.

The company’s AI Red Team takes the role of adversaries in testing the impact of potential attacks against real world products and features that use AI. 

For instance, take prompt engineering, a widely used AI attack method where prompts are manipulated to force the system to respond in a specific manner desired by the attacker. 

In an example shared by Google, a webmail application uses AI to automatically detect phishing emails and warn users. The security feature uses a general purpose large language model (LLM) — ChatGPT is the most well-known LLM — to analyze an email and classify it as legitimate or malicious.

An attacker who knows that the phishing detection feature uses AI can add to their malicious email an invisible paragraph (by setting its font to white) that contains instructions for the LLM, telling it to classify the email as legitimate. 

Advertisement. Scroll to continue reading.

“If the web mail’s phishing filter is vulnerable to prompt attacks, the LLM might interpret parts of the email content as instructions, and classify the email as legitimate, as desired by the attacker. The phisher doesn’t need to worry about negative consequences of including this, since the text is well-hidden from the victim, and loses nothing even if the attack fails,” Google explained. 

Another example involves the data used to train the LLM. While this training data has largely been stripped of personal and other sensitive information, researchers have shown that they were still able to extract personal information from an LLM. 

Training data can also be abused in the case of email autocomplete features. An attacker could trick the AI into providing information about an individual using specially crafted sentences that the autocomplete feature completes with memorized training data that could include private information. 

For instance, an attacker enters the text: “John Doe has been missing a lot of work lately. He has not been able to come to the office because…”. The autocomplete feature, based on training data, could complete the sentence with “he was interviewing for a new job”.

Locking down access to an LLM is also important. In an example provided by Google, a student gains access to an LLM specifically designed to grade essays. The model is able to prevent prompt injection, but access has not been locked down, allowing the student to train the model to always assign the best grade to papers that contain a specific word. 

Google’s report has several other examples of types of attacks that an AI red team can put to the test.

As for lessons learned, Google recommends for traditional red teams to join forces with AI experts to create realistic adversarial simulations. It also points out that addressing the findings of red teams can be challenging and some issues may not be easy to fix.

Traditional security controls can be efficient in mitigating many risks. For example, ensuring that systems and models are properly locked down helps protect the integrity of AI models, preventing backdoors and data poisoning. 

On the other hand, while some attacks on AI systems can be detected using traditional methods, others, such as content issues and prompt attacks, could require layering multiple security models.  

Related: Now’s the Time for a Pragmatic Approach to New Technology Adoption

Related: ChatGPT Hallucinations Can Be Exploited to Distribute Malicious Code Packages

Related: AntChain, Intel Create New Privacy-Preserving Computing Platform for AI Training

Related Content

Artificial Intelligence

China’s official Xinhua news agency said the two sides would take up issues including the technological risks of AI and global governance.

Artificial Intelligence

When not scamming other criminals, criminals are concentrating on the use of mainstream AI products rather than developing their own AI systems.

Artificial Intelligence

Israeli AI security firm Apex has received $7 million in seed funding for its detection, investigation, and response platform.

Artificial Intelligence

Japan's Prime Minister unveiled an international framework for regulation and use of generative AI, adding to global efforts on governance for the rapidly advancing...

Artificial Intelligence

AI-Native Trust, Risk, and Security Management (TRiSM) startup DeepKeep raises $10 million in seed funding.

Artificial Intelligence

Microsoft provides an easy and logical first step into GenAI for many organizations, but beware of the pitfalls.

Artificial Intelligence

CEOs of major tech companies are joining a new artificial intelligence safety board to advise the federal government on how to protect the nation’s...

Artificial Intelligence

New CISA guidelines categorize AI risks into three significant types and pushes a four-part mitigation strategy.

Copyright © 2024 SecurityWeek ®, a Wired Business Media Publication. All Rights Reserved.

Exit mobile version