Episode 23 — Evaluate AI in OT Security: ML, Generative AI, and Operational Risk Tradeoffs

In this episode, we’re going to make sense of how Artificial Intelligence (A I) is being discussed and applied in industrial environments, and why that conversation can feel confusing for brand-new learners. You may hear people say that A I will solve detection, automate response, predict failures, or even run parts of a plant more efficiently, and those claims can sound both exciting and a little scary. The truth is that some A I techniques can be genuinely useful in Operational Technology (O T) security, but the value depends on how the system is designed, what it is allowed to do, and how its mistakes are handled. In O T, the cost of a bad decision can be much higher than a bad pop-up on a laptop, because errors can affect safety, uptime, product quality, and even physical equipment. So instead of treating A I as magic or as menace, we’ll treat it as a toolset that creates tradeoffs. We’ll focus on what Machine Learning (M L) and generative A I are at a high level, where they can help in O T security, where they can hurt, and how to reason about operational risk without needing to be a data scientist.

Before we continue, a quick note: this audio course is a companion to our two course books. The first book is about the exam and explains in detail how best to pass it. The second is a Kindle-only eBook containing 1,000 flashcards that you can use on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.

To start, it helps to define the two main ideas we’ll use throughout: M L and generative A I. Machine Learning (M L) is a family of techniques where a system learns patterns from data and uses those patterns to make predictions or classifications, like noticing unusual behavior compared to a baseline. Generative A I is a family of techniques that can produce new content, like text summaries, proposed actions, or synthetic data, based on patterns learned from large collections of examples. In everyday language, M L is often used to answer questions like “is this normal or abnormal,” while generative A I is often used to answer questions like “what is a good explanation, draft, or response.” Both can be used in security, but they behave differently and fail differently. In O T, that difference matters because a detection system that misses something is one kind of failure, while a system that confidently suggests an unsafe action is another. A beginner-friendly approach is to always ask two questions: what decision is the A I helping with, and what happens if the A I is wrong?

One of the most common uses of M L in security is anomaly detection, which means noticing behavior that does not match what the system expects. In O T networks, behavior can be repetitive and structured, because machines often talk in predictable rhythms and with stable relationships. That predictability makes anomaly detection appealing, because unusual communication patterns, strange timing, or unexpected device interactions may stand out more clearly than they do on a messy corporate network. For example, if a device that normally only talks to a specific controller suddenly starts communicating with many peers, that change can be meaningful even before you know exactly why. But anomaly detection also has a well-known weakness: it can produce many alerts for behavior that is technically unusual but not actually dangerous. In industrial environments, changes happen for legitimate reasons like maintenance, product changeovers, vendor troubleshooting, or seasonal demand shifts, and those operational changes can look “anomalous” to a model. If the model generates too many false alarms, operators may start ignoring it, which is a real operational risk because an ignored system is not a protective system.
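To make the peer-relationship example concrete, here is a minimal Python sketch of a baseline-and-compare detector. It assumes flow records arrive as simple (source, destination) pairs; the device names and the min_new_peers threshold are hypothetical, and real O T monitoring tools work from much richer protocol data.

```python
from collections import defaultdict


def build_baseline(flows):
    """Record which peers each device talked to during a known-good period.

    flows: iterable of (source, destination) pairs (a hypothetical format).
    """
    baseline = defaultdict(set)
    for src, dst in flows:
        baseline[src].add(dst)
    return baseline


def new_peer_alerts(baseline, flows, min_new_peers=3):
    """Flag devices that contact several peers never seen in the baseline.

    Returns {device: sorted list of new peers}; min_new_peers is an
    illustrative threshold, not a recommended value.
    """
    new_peers = defaultdict(set)
    for src, dst in flows:
        if dst not in baseline.get(src, set()):
            new_peers[src].add(dst)
    return {src: sorted(peers) for src, peers in new_peers.items()
            if len(peers) >= min_new_peers}


training = [("hmi-1", "plc-1")] * 10 + [("plc-1", "io-1")]
live = [("hmi-1", "plc-1"), ("hmi-1", "plc-2"), ("hmi-1", "plc-3"),
        ("hmi-1", "eng-ws"), ("plc-1", "io-1")]
print(new_peer_alerts(build_baseline(training), live))
# {'hmi-1': ['eng-ws', 'plc-2', 'plc-3']}
```

Notice that the detector says only that something changed, not why; a vendor laptop plugged in for maintenance would produce exactly the same alert as an intruder scanning the network.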

That leads to a key tradeoff: sensitivity versus trust. If an M L model is tuned to be very sensitive, it will catch more unusual events, but it may also generate more noise, which can distract people and create alert fatigue. If it is tuned to be less sensitive, it may be quieter and easier to live with, but it may miss early warning signs of a real intrusion or misconfiguration. In O T, trust is particularly important because security systems are often tolerated only as long as they do not interfere with operations. A system that constantly interrupts, escalates, or demands attention can be seen as a threat to uptime even if its intentions are good. So a mature approach is to treat M L detection as one signal source among others, and to pair it with context like asset inventory, known maintenance windows, and process state. When you hear a claim like “A I detects threats automatically,” the beginner-safe interpretation is “A I can highlight patterns that deserve human review, but the environment still needs rules and context to decide what matters.”
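The sensitivity dial can be shown with a small scoring sketch. Assuming hourly message counts for one device and a simple z-score (how many standard deviations a count sits from its baseline average), the threshold parameter below is the dial, and a list of maintenance hours stands in for the operational context just mentioned. The function name, numbers, and the rule of silencing maintenance hours are all illustrative assumptions.

```python
import statistics


def rate_alerts(baseline_counts, observed, threshold, maintenance_hours=frozenset()):
    """Return the hours whose message count is unusually far from the baseline.

    baseline_counts: per-hour message counts from a known-good period.
    observed: dict mapping hour -> message count to evaluate.
    threshold: standard deviations that count as "unusual"; lower is more
        sensitive and noisier, higher is quieter but can miss things.
    maintenance_hours: hours with scheduled work; deviations there are not
        raised (an assumed context rule).
    """
    mean = statistics.mean(baseline_counts)
    stdev = statistics.pstdev(baseline_counts) or 1.0  # guard a perfectly flat baseline
    flagged = []
    for hour, count in sorted(observed.items()):
        z_score = abs(count - mean) / stdev
        if z_score >= threshold and hour not in maintenance_hours:
            flagged.append(hour)
    return flagged


baseline = [100, 102, 98, 101, 99]
observed = {1: 103, 2: 110, 3: 140}
print(rate_alerts(baseline, observed, threshold=2))   # [1, 2, 3]
print(rate_alerts(baseline, observed, threshold=5))   # [2, 3]
print(rate_alerts(baseline, observed, threshold=5, maintenance_hours={3}))  # [2]
```

Lowering the threshold from 5 to 2 turns one borderline hour into an alert, which is the noise-versus-coverage tradeoff in miniature. Suppressing known maintenance hours reduces noise without lowering coverage everywhere, but it also creates a quiet window that an attacker could try to use.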

Now let’s bring generative A I into the conversation, because its strengths are different. Generative A I is often good at summarizing information, translating between technical and non-technical language, and helping humans process large amounts of text quickly. In an O T security setting, that can be useful when you have logs, alerts, vendor advisories, and internal notes that need to be turned into a clear story. For example, it might help a responder turn a messy set of events into a simple explanation of what happened and what systems were involved. It might also help draft communications for leadership that focus on impact and response instead of raw technical detail. But generative A I has a major risk that beginners need to understand early: it can produce plausible-sounding content that is wrong. Even when it is not trying to mislead, it can fill gaps with confident guesses, and in O T, confident guesses can become unsafe decisions if people treat them as facts.

A practical way to evaluate generative A I in O T security is to separate assistance from authority. Assistance means the system helps a human work faster, such as summarizing, organizing, or suggesting options, while the human remains responsible for verifying and deciding. Authority means the system is allowed to act, such as changing network policy, isolating devices, pushing configurations, or triggering operational responses. In most industrial contexts, giving A I authority is far riskier than using it for assistance, because the cost of an unintended action can be high. Imagine a system that automatically blocks traffic it thinks is suspicious, but the traffic is actually a safety-related signal or a control message needed for stable operation. The system might “protect” the network while disrupting the process, and in O T terms that is a failure. That is why many safer designs keep A I in a recommendation role and require explicit approvals, especially when actions could affect control networks or production systems. Beginners should take away that A I can be helpful without being in charge, and that distinction is a core risk tradeoff.
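One way to keep an A I system in a recommendation role is to make approval a hard gate in code rather than a habit. The minimal Python sketch below assumes two hypothetical approver roles and requires both security and operations sign-off for anything touching the control network. It illustrates the assistance-versus-authority split; the roles and rules are assumptions, not a recommended policy.

```python
from dataclasses import dataclass

# Which roles must sign off, keyed by whether the action touches the
# control network. The role names are hypothetical.
REQUIRED_APPROVERS = {
    False: frozenset({"security"}),
    True: frozenset({"security", "operations"}),
}


@dataclass(frozen=True)
class Recommendation:
    action: str               # e.g. "block_traffic" (hypothetical action name)
    target: str
    on_control_network: bool


def is_authorized(rec, approvals):
    """The A I only proposes; an action runs only when every required
    role has explicitly approved it."""
    return REQUIRED_APPROVERS[rec.on_control_network] <= set(approvals)


rec = Recommendation("block_traffic", "plc-7", on_control_network=True)
print(is_authorized(rec, {"security"}))                # False
print(is_authorized(rec, {"security", "operations"}))  # True
```

Requiring operations sign-off for control-network actions puts the people who understand the process in the loop before anything that could disrupt it takes effect.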

Another common promise is predictive maintenance, where models analyze sensor data to predict when equipment might fail. This is not purely a security use case, but it touches security because it changes what data is collected, where it flows, and how decisions are made based on that data. Predictive systems can improve reliability by identifying problems early, but they can also create new attack incentives, because if an attacker can influence the data or the model, they may be able to cause unnecessary shutdowns, hide real degradation, or trigger expensive maintenance actions. Even without an attacker, models can be biased by the data they were trained on, meaning they may perform poorly on equipment types, operating conditions, or environments that differ from what they have seen before. In O T, where sites can vary widely, that transfer problem can be serious. Security teams evaluating these systems should consider not only whether the model is accurate in a lab, but whether it remains reliable under real plant conditions and under plausible disruptions like sensor drift, communication loss, or unusual process states.
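Checking whether a predictive model is still getting trustworthy inputs can be sketched as a guard that runs before any prediction is used. This Python sketch assumes readings arrive in fixed windows, with None marking lost samples. The three checks (missing data, physically implausible values, and a stuck sensor repeating one value) are illustrative examples; gradual sensor drift usually needs a comparison against a reference over a longer period.

```python
def input_problems(readings, low, high, max_missing_fraction=0.1, flatline_len=5):
    """List reasons a window of sensor readings should not feed a model.

    readings: floats, with None marking samples lost in transit.
    low, high: the sensor's physically plausible range (assumed known).
    max_missing_fraction, flatline_len: illustrative limits.
    """
    if not readings:
        return ["no data"]
    problems = []
    missing = sum(1 for r in readings if r is None)
    if missing / len(readings) > max_missing_fraction:
        problems.append("communication loss")
    values = [r for r in readings if r is not None]
    if any(v < low or v > high for v in values):
        problems.append("out of range")
    # A long run of identical values often indicates a stuck or frozen sensor.
    run = 1
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if cur == prev else 1
        if run >= flatline_len:
            problems.append("flatline")
            break
    return problems


print(input_problems([20.0, 20.5, 21.0, 20.8, 21.2], 0, 100))  # []
print(input_problems([20.0, None, None, 21.0, 20.8], 0, 100))  # ['communication loss']
print(input_problems([50.0] * 6, 0, 100))                       # ['flatline']
```

A window that fails any check would be held back from the model and surfaced to a person, so that a lost network link or a frozen transmitter is not mistaken for healthy or failing equipment.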

This is a good moment to introduce the idea of model risk as its own category. Model risk includes problems like poor training data, incorrect assumptions, concept drift, and overconfidence, all of which can cause a model to behave badly over time. Concept drift is especially relevant in O T because processes can change, equipment can be replaced, and operating conditions can evolve, which means the “normal” pattern today may not be normal next year. A model that is not updated or recalibrated may start flagging normal operations as suspicious or, worse, treating abnormal operations as normal. Overconfidence is another issue, where a model outputs a high-confidence conclusion even when the situation is outside its experience. For a beginner, the important point is that A I systems are not static, even if they look like software products, because their performance depends on data and context. Evaluating A I in O T security means planning for ongoing validation, not just initial deployment.
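Planning for ongoing validation can start with something as simple as comparing recent behavior with the period the model was trained on. The Python sketch below assumes a single numeric feature (for example, a cycle time) and a mean-shift test measured in baseline standard deviations. Real drift monitoring usually compares whole distributions, so treat this as an illustration of the idea rather than a method; the function names and max_shift value are hypothetical.

```python
import statistics


def drift_detected(baseline, recent, max_shift=2.0):
    """True when the recent average has moved far from the training baseline.

    The shift is measured in baseline standard deviations (baseline needs at
    least two values); max_shift is illustrative and needs per-process tuning.
    """
    spread = statistics.stdev(baseline) or 1e-9  # guard a perfectly flat baseline
    shift = abs(statistics.mean(recent) - statistics.mean(baseline)) / spread
    return shift > max_shift


def next_baseline(baseline, recent, confirmed_legitimate):
    """Adopt the new pattern only after a human confirms the change is
    legitimate, so a suspicious shift is not silently learned as normal."""
    if drift_detected(baseline, recent) and confirmed_legitimate:
        return list(recent)
    return baseline


baseline = [10.0, 11.0, 9.0, 10.5, 9.5]
after_changeover = [14.0, 14.5, 13.5]
print(drift_detected(baseline, after_changeover))  # True
print(next_baseline(baseline, after_changeover, confirmed_legitimate=True))  # [14.0, 14.5, 13.5]
```

Requiring confirmation before recalibrating reflects the fact that a model's idea of "normal" is itself something that can be pushed in the wrong direction, by accident or on purpose.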

Generative A I also raises a privacy and confidentiality tradeoff that matters in industrial settings. To be useful, a generative system often needs access to data like logs, incident notes, configurations, asset lists, or process descriptions. That information can be sensitive because it reveals how the environment works and where the weak points might be. If that data is sent to an external service or stored in a way that is not tightly controlled, it could create leakage risk, even if no one intended to share it. There is also the risk of accidental exposure through prompts and outputs, where someone copies sensitive information into a system and then reuses or shares the output in an unsafe way. For beginners, you can think of it like this: generative A I can act like a very capable assistant, but an assistant still needs rules about what they are allowed to see, what they are allowed to remember, and where their notes are stored. In O T security, data governance is not a paperwork topic; it is a practical safety topic, because information exposure can directly increase the chance and impact of attacks.
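One small, practical piece of that governance is scrubbing obvious identifiers before text leaves a controlled environment. This Python sketch assumes the risky items are IPv4 addresses and a known list of asset names. Pattern-based redaction catches only what it is told to look for, so it complements, and does not replace, rules about which data may be sent to an external service at all.

```python
import re

# Matches dotted-quad IPv4 addresses; it does not validate octet ranges.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(text, sensitive_names):
    """Replace IPv4 addresses and known asset names with placeholders.

    sensitive_names: asset names to hide, for example pulled from an
    asset inventory (an assumption; any list of strings works).
    """
    text = IPV4.sub("[IP]", text)
    for name in sensitive_names:
        text = re.sub(re.escape(name), "[ASSET]", text, flags=re.IGNORECASE)
    return text


note = "PLC-7 at 10.20.30.40 rejected a write from 192.168.1.5"
print(redact(note, ["PLC-7"]))
# [ASSET] at [IP] rejected a write from [IP]
```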

Another operational tradeoff is explainability, which means whether people can understand why the system made a suggestion or flagged an alert. In industrial environments, trust often depends on being able to connect a security signal to operational reality. If a system says “high risk” without showing what changed, operators may not accept it, especially when they are balancing many competing demands. Some M L systems can provide interpretable signals, like the specific features that changed or the specific communication patterns that were unusual, while others act like black boxes. Generative A I can provide explanations, but again, explanations can be persuasive even when they are wrong. A safe approach is to value evidence over narrative, meaning the system should point to observable events, logs, or changes that humans can verify. Beginners should learn to be cautious of smooth explanations that do not connect to concrete signals, because in security, confidence should come from corroboration, not from fluent language.
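Valuing evidence over narrative can be partly enforced mechanically. Assuming log events carry IDs in a hypothetical EVT-number format and that a generative system is instructed to cite them, the Python sketch below rejects summaries that cite nothing or that cite events missing from the log. It cannot tell whether a real citation actually supports the claim next to it, so human verification still matters.

```python
import re

# Hypothetical event ID format used by the log source in this sketch.
EVENT_ID = re.compile(r"\bEVT-\d+\b")


def unsupported_citations(summary, log_event_ids):
    """Event IDs the summary cites that do not exist in the log."""
    return sorted(set(EVENT_ID.findall(summary)) - set(log_event_ids))


def is_grounded(summary, log_event_ids):
    """Grounded means: cites at least one event, and every cited event exists."""
    cited = EVENT_ID.findall(summary)
    return bool(cited) and not unsupported_citations(summary, log_event_ids)


log_ids = {"EVT-101", "EVT-102"}
print(is_grounded("Workstation wrote to PLC-7 (EVT-101), then PLC-7 restarted (EVT-102).", log_ids))  # True
print(unsupported_citations("Attacker pivoted via the historian (EVT-999).", log_ids))  # ['EVT-999']
print(is_grounded("Everything points to a targeted intrusion.", log_ids))  # False
```

The third example is the dangerous case described above: a fluent conclusion with nothing a human can check.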

We also need to talk about adversarial pressure, which means attackers will adapt to A I systems if those systems become common. If defenders use anomaly detection, attackers may try to blend into normal patterns by moving slowly, using expected protocols, or timing their actions during maintenance. If defenders use generative A I for triage, attackers may attempt to poison inputs with misleading text, fake alerts, or crafted messages that steer the A I toward an incorrect conclusion. There is also the possibility of model poisoning, where training data is manipulated so that the model learns the wrong patterns. These are advanced topics, but the beginner-level takeaway is simple: A I does not end the attacker-defender game; it becomes part of it. Any system that influences decisions becomes a target for influence. So evaluating A I in O T security includes thinking about how an attacker could trick it, not just how it performs when everyone is honest.
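The "moving slowly" tactic is easy to see with numbers. This Python sketch assumes we already have, for each day, the set of never-before-seen peers that one device contacted (a hypothetical input). A per-day threshold misses a scan spread over several days, while a trailing window catches the same activity; the thresholds and window length are illustrative.

```python
def per_day_alerts(new_peers_by_day, threshold=3):
    """Days on which a device contacted at least `threshold` new peers."""
    return [day for day, peers in enumerate(new_peers_by_day) if len(peers) >= threshold]


def rolling_alerts(new_peers_by_day, threshold=3, window=7):
    """Days on which the distinct new peers over the trailing `window` days
    reach `threshold`, which catches the same total spread out over time."""
    alerts = []
    for day in range(len(new_peers_by_day)):
        start = max(0, day - window + 1)
        seen = set().union(*new_peers_by_day[start:day + 1])
        if len(seen) >= threshold:
            alerts.append(day)
    return alerts


# One new peer per day: invisible to the daily check, visible to the rolling one.
slow_scan = [{"plc-2"}, {"plc-3"}, {"plc-4"}, set(), set()]
print(per_day_alerts(slow_scan))   # []
print(rolling_alerts(slow_scan))   # [2, 3, 4]
```

The longer window is not a fix, since an attacker can simply go slower still; it shows how every detection choice creates a boundary that an adversary can learn and work around.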

A practical way to decide whether an A I capability is appropriate in O T is to look at what the system is allowed to affect and how reversible its actions are. If the system only generates a daily summary for human review, the downside of an error is usually manageable, because humans can correct it before acting. If the system automatically changes network access or isolates devices, the downside of an error could be immediate production disruption or safety risk. Reversibility matters because some actions are easy to undo, while others can create cascading effects that take time to stabilize. For example, isolating a device might stop a process, and restarting the process might require careful sequences, safety checks, and time, even if the isolation was a mistake. So when you evaluate A I in O T security, you are evaluating not only accuracy but also the operational cost of mistakes. A beginner-friendly principle is to keep A I farther from direct control actions unless you have strong safeguards, clear approvals, and proven reliability under real conditions.
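The two questions in this paragraph, what the system can affect and how reversible that is, can be written down as an explicit decision table. The Python sketch below is one hypothetical way to map them, plus proven reliability, to a level of autonomy; the three modes and the rules are assumptions for illustration, not a standard.

```python
from enum import Enum


class Mode(Enum):
    AUTOMATE = "act automatically"
    RECOMMEND = "recommend, human approves"
    INFORM = "inform only"


def automation_mode(affects_control, reversible, proven_reliable):
    """Illustrative policy: authority shrinks as potential impact grows.

    affects_control: could the action touch control systems or production?
    reversible: can the action be undone quickly and cleanly?
    proven_reliable: has the capability been validated under real plant conditions?
    """
    if not affects_control:
        return Mode.AUTOMATE if reversible else Mode.RECOMMEND
    if reversible and proven_reliable:
        return Mode.RECOMMEND
    return Mode.INFORM  # in this sketch, control-affecting actions are never automated


# A daily summary for human review versus isolating a device on the control network.
print(automation_mode(affects_control=False, reversible=True, proven_reliable=False))  # Mode.AUTOMATE
print(automation_mode(affects_control=True, reversible=False, proven_reliable=True))   # Mode.INFORM
```

Writing the policy down this way makes the tradeoff reviewable: operations and security can argue about a specific rule instead of a general feeling about "A I."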

Let’s also consider the human side, because A I changes workflows, not just technology. If a security team relies heavily on A I summaries, they may become less familiar with raw logs and signals, which can weaken skills over time. If operators are asked to approve actions recommended by A I, they may worry about being blamed if they override it, or they may defer too much if the system seems authoritative. This can create a subtle risk called automation bias, where people trust the system’s outputs more than they should, especially under stress. In O T, stress can be high during incidents because production pressure and safety concerns collide. A healthier workflow treats A I as a support tool that improves clarity and speed, while still requiring explicit checks and shared decision-making between operations and security. Beginners should remember that even the best A I system is part of a team, and the team needs clear roles and accountability.

As we close, the most useful way to evaluate A I in O T security is to keep returning to operational risk tradeoffs rather than getting lost in hype. M L can help find unusual behavior and highlight patterns that deserve attention, but it can also produce noise and requires careful tuning, context, and ongoing validation as the environment changes. Generative A I can help humans summarize, communicate, and reason through messy information faster, but it can also produce confident errors and introduce data exposure risks if it is fed sensitive information without safeguards. The safest and most common path in industrial settings is to use A I to assist humans, not to replace them, and to keep A I recommendations clearly separated from automated actions that could affect control systems. If you can explain how accuracy, trust, explainability, adversarial pressure, and the cost of mistakes all factor into whether an A I capability is appropriate, you will be evaluating A I the way O T security requires: not as a promise, but as a set of choices that must protect safety and reliable operations first.
