Artificial Intelligence

Red Teaming AI: The Build Vs Buy Debate

A strong AI deployment starts with asking the right questions, mapping your risks, and thinking like an adversary — before it’s too late.

Matt Honea

| June 16, 2025 (6:00 AM ET)

Before deploying an AI system, there are a few basic but critical questions that too often go unasked: Where is the model deployed? What kinds of inputs will it process? What will the output format be? What are the obvious business risks, and more importantly, how do we revisit business risks over time? If you’re not thinking about these things up front, then you are missing a significant portion of understanding how AI fits into your organization.

While many “out of the box” models have some form of protection trained into the model itself, these tend to be basic protections and are often focused on safety rather than security. “Model Cards” tend to offer some insights, however measurements are not standardized across the industry. In the absence of stronger security features in the models themselves, a wide range of products and tools have emerged to address the security of AI models and protect your most critical applications and data.

Before I delve deeper into the solutions, I want to address the terminology. The term “red teaming” is frequently used in AI and LLM circles, but not always with clarity or consistency. For some, it’s just another layer of internal QA or prompt testing, but that definition, in my view, is much too narrow. Red Teaming is a holistic cybersecurity assessment that includes probing technical and non-technical vulnerabilities within an organization. Red teaming is adversarial. For example, think of scenarios where you’re not just testing systems, but probing every human and technical weak point across the entire surface area. Approaches can include physical access, social engineering, and unexpected inputs in unexpected places. Here’s Microsoft’s definition below:

AI Red Teaming — (Image Credit: Microsoft)

In order to red team your AI model, you need to have a deep understanding of the system you are protecting. Today’s models are complex multimodal, multilingual systems. One model might take in text, images, code, and speech with any single input having the potential to break something. Attackers know this and can easily take advantage. For example, a QR code might contain an obfuscated prompt injection or a roleplay conversation might lead to ethical bypasses. This isn’t just about keywords, but about understanding how intent hides beneath layers of tokens, characters, and context. The attack surface isn’t just large, it’s effectively infinite. Here are a couple more novel examples of these types of attacks:

Dubbed “Stop and Roll” by Knostic, here is an attack where interrupting the prompt resulted in bypassing security guardrails within a large LLM.

The Stop and Roll Attack — (Image Credit: Knostic, Inc.)

This is similar to a side-channel attack, attacking the underlying architecture of models. Another example is the “Red Queen Attack,” by Hippocratic AI, a multi-turn role-play attack:

Some of the tactics are subtle but have big consequences because large language models use input tokens differently: uppercase versus lowercase, unicode characters versus non-unicode characters, high-signal words and phrases, complex prompt instruction sets and more. If you are curious to learn about these, there are thousands of jailbreaks widely available on the internet. Also adding fuel to the fire, many core system prompts are considered secret in theory, but have already leaked in practice. You can find some of them on GitHub which may lead to further jailbreaking.

Safeguards, Guardrails and Testing

When evaluating solutions, you should consider the needs and scale of your AI security solution, understanding that each layer introduces additional complexity, latency, and resource demands.

Advertisement. Scroll to continue reading.

Building versus buying is an age-old debate. Fortunately, the AI security space is maturing rapidly, and organizations have a lot of choices to implement from. After you have some time to evaluate your own criteria against Microsoft, OWASP and NIST frameworks, you should have a good idea of what your biggest risks are and key success criteria. After considering risk mitigation strategies, and assuming you want to keep AI turned on, there are some open-source deployment options like Promptfoo and Llama Guard, which provide useful scaffolding for evaluating model safety. Paid platforms like Lakera, Knostic, Robust Intelligence, Noma, and Aim are pushing the edge on real-time, content-aware security for AI, each offering slightly different tradeoffs in how they offer protection. Not only will all these tools evaluate inputs and outputs, but often they will go much deeper into understanding data context to make better-informed real-time decisions, and perform much better than base models.

One of the key insights I want to share is that regardless of your tooling of choice, you must be able to measure the inner workings of the system you put in place. LLMs are stochastic systems that are extremely difficult to replay and troubleshoot. Logging exact metrics such as temperature, top P, token length and others will immensely help debugging later on.

Ultimately, what really matters is mindset. Security isn’t just a feature, it’s a philosophy. Red teaming isn’t just a way to break things; it’s a way to understand what happens when things break. A secure AI deployment doesn’t mean “no risk.” It means you’ve mapped the landscape, you know what kind of behavior to expect (both good and bad), and you’ve built systems that evolve with that knowledge. That includes knowing your model, your data, your user interactions, and your guardrails. Red teaming gives you clarity. It forces you to think about the outcomes you want — and the ones you don’t. And it ensures your AI system can distinguish between them when it matters most.

There are plenty more areas to explore in model security, especially on the code side. Stay tuned as I go deeper into the compliance portion next time.

Learn More at The AI Risk Summit | Ritz-Carlton, Half Moon Bay

This column is Part 3 of multi-part series on securing generative AI:

Part 1: Back to the Future, Securing Generative AI
Part 2: Trolley problem, Safety Versus Security of Generative AI
Part 3: Build vs Buy, Red Teaming AI (This Column)
Part 4: Timeless Compliance (Stay Tuned)

Written By Matt Honea

Matt Honea is CISO at Hippocratic AI. He previously served as head of Security and Compliance at Forward Networks. He is a security leader and has a background in the areas of threat intelligence, networking, system forensics and discovery, enterprise security auditing, malware analysis and physical security. He is an industry speaker, author, and frequent security podcast guest. Matt also holds a US granted patent, multiple US Government awards and was selected as a one of Silicon Valley Business Journal 40 under 40.

More from Matt Honea

Webinar: How Modern Breaches Bypass MFA and Evade Detection

June 17, 2026

Today’s attackers are no longer breaking in — they’re logging in. Join this live webinar as we break down the modern identity attack chain and examine how recent breaches exploited weaknesses in authentication, identity verification, and access management processes.

Webinar: Modern Exposure Validation in the AI Era

June 24, 2026

AI has accelerated both sides of the fight. Adversaries are weaponizing vulnerabilities faster, while defenders are racing to ship detections and configurations. Join this live webinar as we explore how to prove your controls actually hold against new threats, map your security maturity, and unite breach simulation with automated pentesting into a single, coordinated program.

SECURITYWEEK NETWORK:

ICS:

SecurityWeek

Artificial Intelligence

Red Teaming AI: The Build Vs Buy Debate

More from Matt Honea

Trending

Webinar: How Modern Breaches Bypass MFA and Evade Detection

Webinar: Modern Exposure Validation in the AI Era

People on the Move

Expert Insights

No Exploits Required

After AI Reaches Production: 12 Ways Security Teams Can Take Control

Everybody Is Vibe Coding But Nobody Told the Security Team

The Zero-Knowledge Threat Actor and the End of Responsible Disclosure

Raising the Cybersecurity Stakes: Ante up for the Agentic Era

SECURITYWEEK NETWORK:

ICS:

Daily Briefing Newsletter

More from Matt Honea

Trending

Daily Briefing Newsletter

Webinar: How Modern Breaches Bypass MFA and Evade Detection

Webinar: Modern Exposure Validation in the AI Era

People on the Move

Expert Insights

No Exploits Required

After AI Reaches Production: 12 Ways Security Teams Can Take Control

Everybody Is Vibe Coding But Nobody Told the Security Team

The Zero-Knowledge Threat Actor and the End of Responsible Disclosure

Raising the Cybersecurity Stakes: Ante up for the Agentic Era

Daily Briefing Newsletter