Connect with us

Hi, what are you looking for?

SecurityWeekSecurityWeek

Artificial Intelligence

Red Teaming AI: The Build Vs Buy Debate

A strong AI deployment starts with asking the right questions, mapping your risks, and thinking like an adversary — before it’s too late.

AI Jailbreak

Before deploying an AI system, there are a few basic but critical questions that too often go unasked: Where is the model deployed? What kinds of inputs will it process? What will the output format be? What are the obvious business risks, and more importantly, how do we revisit business risks over time? If you’re not thinking about these things up front, then you are missing a significant portion of understanding how AI fits into your organization.

While many “out of the box” models have some form of protection trained into the model itself, these tend to be basic protections and are often focused on safety rather than security. “Model Cards” tend to offer some insights, however measurements are not standardized across the industry. In the absence of stronger security features in the models themselves, a wide range of products and tools have emerged to address the security of AI models and protect your most critical applications and data.

Before I delve deeper into the solutions, I want to address the terminology. The term “red teaming” is frequently used in AI and LLM circles, but not always with clarity or consistency. For some, it’s just another layer of internal QA or prompt testing, but that definition, in my view, is much too narrow. Red Teaming is a holistic cybersecurity assessment that includes probing technical and non-technical vulnerabilities within an organization. Red teaming is adversarial. For example, think of scenarios where you’re not just testing systems, but probing every human and technical weak point across the entire surface area. Approaches can include physical access, social engineering, and unexpected inputs in unexpected places. Here’s Microsoft’s definition below:

AI Red Teaming
(Image Credit: Microsoft)

In order to red team your AI model, you need to have a deep understanding of the system you are protecting. Today’s models are complex multimodal, multilingual systems. One model might take in text, images, code, and speech with any single input having the potential to break something. Attackers know this and can easily take advantage. For example, a QR code might contain an obfuscated prompt injection or a roleplay conversation might lead to ethical bypasses. This isn’t just about keywords, but about understanding how intent hides beneath layers of tokens, characters, and context. The attack surface isn’t just large, it’s effectively infinite. Here are a couple more novel examples of these types of attacks:

Dubbed “Stop and Roll” by Knostic, here is an attack where interrupting the prompt resulted in bypassing security guardrails within a large LLM.

The Stop and Roll Attack
(Image Credit: Knostic, Inc.)

This is similar to a side-channel attack, attacking the underlying architecture of models. Another example is the “Red Queen Attack,” by Hippocratic AI, a multi-turn role-play attack:

Red Queen Attack
RED QUEEN ATTACK, the first work constructing multi-turn scenarios to conceal attackers’ harmful intent, reaching promising results against current LLMs.

Some of the tactics are subtle but have big consequences because large language models use input tokens differently: uppercase versus lowercase, unicode characters versus non-unicode characters, high-signal words and phrases, complex prompt instruction sets and more. If you are curious to learn about these, there are thousands of jailbreaks widely available on the internet. Also adding fuel to the fire, many core system prompts are considered secret in theory, but have already leaked in practice. You can find some of them on GitHub which may lead to further jailbreaking.

Safeguards, Guardrails and Testing

When evaluating solutions, you should consider the needs and scale of your AI security solution, understanding that each layer introduces additional complexity, latency, and resource demands.

Building versus buying is an age-old debate. Fortunately, the AI security space is maturing rapidly, and organizations have a lot of choices to implement from. After you have some time to evaluate your own criteria against Microsoft, OWASP and NIST frameworks, you should have a good idea of what your biggest risks are and key success criteria. After considering risk mitigation strategies, and assuming you want to keep AI turned on, there are some open-source deployment options like Promptfoo and Llama Guard, which provide useful scaffolding for evaluating model safety. Paid platforms like Lakera, Knostic, Robust Intelligence, Noma, and Aim are pushing the edge on real-time, content-aware security for AI, each offering slightly different tradeoffs in how they offer protection. Not only will all these tools evaluate inputs and outputs, but often they will go much deeper into understanding data context to make better-informed real-time decisions, and perform much better than base models.

One of the key insights I want to share is that regardless of your tooling of choice, you must be able to measure the inner workings of the system you put in place. LLMs are stochastic systems that are extremely difficult to replay and troubleshoot. Logging exact metrics such as temperature, top P, token length and others will immensely help debugging later on.

Advertisement. Scroll to continue reading.

Ultimately, what really matters is mindset. Security isn’t just a feature, it’s a philosophy. Red teaming isn’t just a way to break things; it’s a way to understand what happens when things break. A secure AI deployment doesn’t mean “no risk.” It means you’ve mapped the landscape, you know what kind of behavior to expect (both good and bad), and you’ve built systems that evolve with that knowledge. That includes knowing your model, your data, your user interactions, and your guardrails. Red teaming gives you clarity. It forces you to think about the outcomes you want — and the ones you don’t. And it ensures your AI system can distinguish between them when it matters most.

There are plenty more areas to explore in model security, especially on the code side. Stay tuned as I go deeper into the compliance portion next time.

Learn More at The AI Risk Summit | Ritz-Carlton, Half Moon Bay

This column is Part 3 of multi-part series on securing generative AI:

Part 1: Back to the Future, Securing Generative AI
Part 2: Trolley problem, Safety Versus Security of Generative AI
Part 3: Build vs Buy, Red Teaming AI (This Column)
Part 4: Timeless Compliance (Stay Tuned)

Written By

Matt Honea is CISO at Hippocratic AI. He previously served as head of Security and Compliance at Forward Networks. He is a security leader and has a background in the areas of threat intelligence, networking, system forensics and discovery, enterprise security auditing, malware analysis and physical security. He is an industry speaker, author, and frequent security podcast guest. Matt also holds a US granted patent, multiple US Government awards and was selected as a one of Silicon Valley Business Journal 40 under 40.

Trending

Daily Briefing Newsletter

Subscribe to the SecurityWeek Email Briefing to stay informed on the latest threats, trends, and technology, along with insightful columns from industry experts.

Learn how the LOtL threat landscape has evolved, why traditional endpoint hardening methods fall short, and how adaptive, user-aware approaches can reduce risk.

Watch Now

Join the summit to explore critical threats to public cloud infrastructure, APIs, and identity systems through discussions, case studies, and insights into emerging technologies like AI and LLMs.

Register

People on the Move

Kenna Security co-founder Ed Bellis has joined Empirical Security as Chief Executive Officer.

Robert Shaker II has joined application security firm ActiveState as Chief Product and Technology Officer.

MorganFranklin Cyber has promoted Nick Stallone and Ferdinand Hamada into newly created roles.

More People On The Move

Expert Insights

Daily Briefing Newsletter

Subscribe to the SecurityWeek Email Briefing to stay informed on the latest cybersecurity news, threats, and expert insights. Unsubscribe at any time.