Episode 33 — Benchmark OT Security Progress: Baselines, Targets, and Evidence That Holds Up

In this episode, we’re going to focus on how an O T security program proves it is improving in a way that is meaningful, credible, and useful for decision-making. Beginners often assume progress is obvious, like you install a tool or patch a system and then you are “more secure.” In real operational environments, progress is harder to judge because change is slow, systems are long-lived, and new risks appear as networks evolve and vendors add connectivity. That is why benchmarking matters. Benchmarking is the discipline of establishing a baseline for where you are today, setting clear targets for where you need to be, and collecting evidence that demonstrates real, repeatable improvement over time. In O T, this has to be done without turning the program into a paperwork factory and without disrupting production just to chase metrics. It also has to be done in a way that stands up under scrutiny, because leaders, auditors, regulators, and incident investigators may all ask for proof that controls exist and are operating as intended. By the end, you should be able to explain what baselines and targets mean in O T security, how evidence differs from opinions, and how to avoid the common traps that make benchmarking misleading or fragile.

Before we continue, a quick note: this audio course is a companion to our two course books. The first book covers the exam and explains in detail how best to pass it. The second is a Kindle-only eBook containing 1,000 flashcards that you can use on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.

A baseline is simply an honest snapshot of current reality, captured in a way that can be compared to the future. In O T security, baselines often include what assets exist, how they are connected, who can access them, what protections are in place, and how consistently those protections are used. A baseline is not a marketing statement and it is not a grade; it is a starting point. Beginners sometimes assume baselines are discouraging because they expose weaknesses, but the opposite is usually true. A baseline reduces confusion and helps teams stop arguing about perceptions. Instead of debating whether segmentation is “good,” a baseline can show what segments exist, what traffic flows are allowed, and where exceptions are concentrated. Instead of debating whether patching is “handled,” a baseline can show what percentage of critical systems are within an accepted update window and what systems are known to be unpatchable. The most important beginner lesson is that a baseline is only useful if it is specific and measurable enough to repeat later, because the whole point is to track change. If you cannot reproduce the baseline, you cannot reliably claim improvement.
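To make the idea of a repeatable baseline concrete, here is a minimal Python sketch. It assumes a hypothetical inventory format (the `Asset` fields, the asset names, and the 90-day window are all illustrative, not taken from any standard or tool) and computes the patching figures mentioned above as of a fixed date, so the same calculation can be rerun later and compared.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Asset:
    name: str
    critical: bool
    patchable: bool
    last_update: Optional[date]  # None means no update is on record


def patch_baseline(assets, as_of, window_days):
    """Summarize patch status for critical assets as of a fixed date."""
    critical = [a for a in assets if a.critical]
    patchable = [a for a in critical if a.patchable]
    in_window = [
        a for a in patchable
        if a.last_update is not None
        and (as_of - a.last_update).days <= window_days
    ]
    # Percentage is taken over patchable critical assets only;
    # unpatchable ones are listed separately rather than hidden.
    pct = round(100 * len(in_window) / len(patchable), 1) if patchable else None
    return {
        "as_of": as_of.isoformat(),
        "window_days": window_days,
        "critical_assets": len(critical),
        "pct_patchable_in_window": pct,
        "known_unpatchable": sorted(a.name for a in critical if not a.patchable),
    }


# Hypothetical inventory, for illustration only.
inventory = [
    Asset("hmi-01", True, True, date(2024, 5, 1)),
    Asset("historian-01", True, True, date(2023, 11, 15)),
    Asset("plc-07", True, False, None),
    Asset("office-printer", False, True, date(2024, 4, 1)),
]

baseline = patch_baseline(inventory, as_of=date(2024, 6, 1), window_days=90)
print(baseline)
```

Passing the date and the window in explicitly, instead of reading the current clock, is what makes the snapshot reproducible: anyone can rerun it with the same inputs and get the same numbers.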

Targets are the future-facing companion to baselines, and they answer the question of what progress looks like in practical terms. A target is not just “be more secure,” because that is too vague to guide action. In O T, targets should reflect risk levels, operational constraints, and safety priorities, meaning high-risk areas should have stronger targets than low-risk areas. Targets should also be time-aware, because some improvements can happen quickly, like tightening access for a specific system, while others require long planning cycles, like redesigning network segmentation or upgrading legacy equipment. A good target is actionable, such as improving visibility to include a larger portion of critical assets, reducing the number of shared accounts, or ensuring incident response procedures are practiced and documented. Targets also need to be realistic so they do not push teams into unsafe changes or meaningless checkbox work. Beginners should understand that targets are promises the program makes to itself and to leadership, and those promises must be grounded in what operations can actually support. When targets are well chosen, they help teams prioritize, budget, and schedule improvements without constant conflict.

Evidence that holds up is the third element, and it is what makes benchmarking credible rather than aspirational. Evidence is information that can be used to verify that a control exists, is used, and works as intended, and it should be resilient to skepticism. In O T, evidence often includes change records showing approvals and testing, access reviews showing who has what permissions, logs showing authentication and administrative actions, network configurations that demonstrate segmentation, and incident response records that show how events were handled. Evidence also includes proof that controls are maintained over time, not just installed once. For example, it is easy to configure an access control rule on one day; it is harder to show that rules are reviewed, exceptions are tracked, and changes are controlled for months. Beginners should learn that evidence is not the same as documentation. Documentation can be claimed, while evidence is tied to actual behavior and operational artifacts. Strong evidence is also consistent across sources, meaning if a policy says something, the logs and records should reflect it. When evidence is consistent, it builds trust; when evidence is inconsistent, it raises questions even if the intent is good.
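One way to picture consistency across sources is a simple cross-check between what a policy claims and what the logs show. The sketch below is only an illustration: the account names, the log record shape, and the function name `shared_account_conflicts` are hypothetical, and real authentication logs will need parsing first.

```python
def shared_account_conflicts(banned_accounts, auth_log):
    """Count logins by accounts that policy says should not be in use."""
    counts = {}
    for entry in auth_log:
        user = entry["user"]
        if user in banned_accounts:
            counts[user] = counts.get(user, 0) + 1
    return counts


# Hypothetical policy statement: these shared accounts are banned.
policy_banned = {"operator", "admin"}

# Hypothetical, already-parsed authentication records.
auth_log = [
    {"user": "jsmith", "host": "hmi-01"},
    {"user": "operator", "host": "hmi-01"},
    {"user": "operator", "host": "hmi-02"},
    {"user": "mlee", "host": "eng-ws-03"},
]

conflicts = shared_account_conflicts(policy_banned, auth_log)
print(conflicts)  # an empty dict would mean the logs agree with the policy
```

A non-empty result does not automatically mean something is wrong; it means the policy and the operational record disagree, and that disagreement needs an explanation or a tracked exception.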

A big challenge in O T benchmarking is that measuring the wrong things can create the illusion of progress while leaving real risk unchanged. For example, counting the number of security tools deployed can look impressive but may not reflect whether the tools are configured correctly, monitored, or useful. Counting the number of alerts can also be misleading because more alerts might mean a noisy system, not a safer one. Even counting patch rates can be tricky in O T, because some systems cannot be patched quickly due to operational constraints, and a program might wisely choose compensating controls instead. This is why good benchmarking focuses on outcomes and capability, not just activity. Outcomes in O T include reduced exposure of critical systems, improved ability to detect abnormal behavior, and faster, safer recovery. Capability means the organization can repeat key practices reliably, like controlling access, managing changes, and tracking exceptions. Beginners should remember that a benchmark should drive better decisions, and if a metric causes people to chase numbers rather than reduce risk, that metric is not serving the program.

Another subtle issue is that baselines and targets must account for change in the environment itself. O T environments are not static, even if some equipment is old. New sensors are added, networks are extended, vendors introduce remote support pathways, and data flows expand to cloud and edge services. If your benchmark does not consider that growth, you may think you are improving while the actual risk surface is growing faster than your controls. A helpful beginner mindset is to compare progress relative to complexity, not only in absolute terms. For example, it might be progress to maintain the same level of security coverage even while the number of connected assets increases, because it means the program is keeping up rather than falling behind. Similarly, reducing the number of unmanaged connections while the environment expands can indicate real maturity. This is where the asset registry becomes crucial, because you need an accurate picture of the environment to interpret your metrics. Beginners should see that benchmarking is not about freezing the world; it is about tracking improvement in a world that keeps moving.
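A small worked example shows why absolute counts can mislead when the environment grows. The figures and period labels below are invented for illustration; the point is only the arithmetic of comparing monitored assets against the total in each period.

```python
def coverage_trend(periods):
    """Turn (label, monitored, total) tuples into coverage rows."""
    rows = []
    for label, monitored, total in periods:
        if total <= 0 or monitored > total:
            raise ValueError(f"invalid counts for {label}")
        rows.append({
            "period": label,
            "monitored": monitored,
            "unmonitored": total - monitored,
            "coverage": monitored / total,
        })
    return rows


# Hypothetical counts: monitored assets rose by 30, but the
# environment grew by 100 connected assets over the same time.
periods = [("2023-Q4", 120, 200), ("2024-Q2", 150, 300)]

trend = coverage_trend(periods)
for row in trend:
    print(f"{row['period']}: {row['coverage']:.0%} covered, "
          f"{row['unmonitored']} unmonitored")
```

In this made-up case the absolute number of monitored assets went up, yet coverage fell and the unmonitored population nearly doubled. Reporting both the ratio and the raw counts keeps that visible.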

Evidence also has to be operationally credible, meaning it should reflect how the environment actually behaves, not how people wish it behaved. In O T, this is especially important because documentation can drift away from reality over time as systems evolve. A network diagram might look clean, but the real traffic flows might include emergency connections added during incidents. An access policy might say that shared accounts are banned, but operations might still use them because the vendor system requires it. A mature benchmarking approach does not hide these realities; it captures them and manages them. That might mean recording exceptions, documenting compensating controls, and tracking progress toward eliminating the exception when feasible. The program can still show improvement even when imperfections exist, as long as the imperfections are known, owned, and being addressed. Beginners should learn that evidence that holds up is evidence that survives contact with reality. If evidence collapses when you ask how the system is actually used, it is not strong enough to guide decisions.
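The idea that imperfections should be known, owned, and being addressed can be sketched as a check over an exception register. The register fields used here (`owner`, `compensating_control`, `review_due`) and the sample entries are assumptions for illustration, not a prescribed schema.

```python
from datetime import date


def weak_exceptions(register, today):
    """Map exception IDs to the reasons they would not hold up under review."""
    problems = {}
    for exc in register:
        issues = []
        if not exc.get("owner"):
            issues.append("no owner")
        if not exc.get("compensating_control"):
            issues.append("no compensating control")
        if exc["review_due"] < today:
            issues.append("review overdue")
        if issues:
            problems[exc["id"]] = issues
    return problems


# Hypothetical register entries.
register = [
    {
        "id": "EX-001",
        "summary": "vendor system requires a shared account",
        "owner": "operations lead",
        "compensating_control": "access only via recorded jump host sessions",
        "review_due": date(2024, 9, 1),
    },
    {
        "id": "EX-002",
        "summary": "emergency connection added during an incident",
        "owner": "",
        "compensating_control": None,
        "review_due": date(2024, 3, 1),
    },
]

flagged = weak_exceptions(register, today=date(2024, 6, 1))
print(flagged)
```

An exception like the first entry is still an imperfection, but it is one the program can defend. The second is the kind that tends to collapse when someone asks how the system is actually used.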

Targets also need to be layered, because O T security improvements often depend on foundations. For example, you cannot credibly target advanced anomaly detection coverage if you do not have a stable asset inventory and basic network visibility. You cannot target strict change control enforcement if you do not have a realistic way for operations to request and approve changes during maintenance windows. You cannot target rapid recovery if you do not have trusted backups and a plan for validating control integrity before returning to service. This layered approach prevents a program from jumping to advanced goals without the groundwork needed to sustain them. It also prevents frustration because teams can see why a target exists and how it connects to other work. Beginners should understand that a good benchmark roadmap builds capability in a sequence that matches operational reality, and the evidence you collect should show that sequence. When the sequence is respected, progress feels steady rather than chaotic, and that steadiness is what builds long-term trust.
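Layered targets can be thought of as a dependency graph, where each advanced target lists the foundations it relies on. The sketch below uses Python's standard `graphlib` module to produce one valid ordering; the target names are paraphrased from the examples above, and the mapping itself is an illustrative assumption, not a complete roadmap.

```python
from graphlib import TopologicalSorter

# Each target maps to the foundations it depends on (hypothetical labels).
prerequisites = {
    "anomaly detection coverage": {"asset inventory", "network visibility"},
    "change control enforcement": {"change request process"},
    "rapid recovery": {"trusted backups", "integrity validation plan"},
}

# static_order() yields every item after all of its prerequisites.
roadmap = list(TopologicalSorter(prerequisites).static_order())

for step, target in enumerate(roadmap, start=1):
    print(step, target)
```

Several orderings can satisfy the same dependencies, so the exact sequence printed is less important than the guarantee that no target appears before its foundations. If two targets ever depend on each other, `TopologicalSorter` raises a `CycleError`, which is a useful signal that the roadmap itself needs rethinking.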

A key part of making evidence hold up is consistency in how it is collected and reviewed. If one site measures asset coverage differently from another, comparisons become meaningless and leadership may lose confidence. If one team logs changes carefully while another makes changes informally, the program will have weak spots that are difficult to defend. Benchmarking should therefore include clear definitions for what counts as compliance with a practice, what counts as an exception, and what records are required. It should also include regular reviews where evidence is sampled and verified, not just stored. In O T, sampling can be a practical approach because reviewing everything constantly may not be realistic. Sampling still provides confidence if it is done consistently and if findings lead to improvement. Beginners should learn that evidence is not created once and then forgotten; it is maintained through routine checks and by responding to gaps when they appear. This is how evidence becomes durable rather than performative.
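Consistent sampling is easier to defend when the selection can be reproduced. A minimal sketch, assuming change records are identified by simple string IDs (the `CHG-` numbering and the seed label are hypothetical): sort the population first, then draw from a seeded random generator so a reviewer can rerun the same draw.

```python
import random


def sample_for_review(record_ids, k, seed):
    """Pick k records for evidence review in a reproducible way."""
    ordered = sorted(record_ids)  # fixed order so input ordering does not matter
    k = min(k, len(ordered))      # never ask for more than exists
    return random.Random(seed).sample(ordered, k)


# Hypothetical population of 40 change records for one review period.
change_records = [f"CHG-{n:04d}" for n in range(1, 41)]

picked = sample_for_review(change_records, k=5, seed="2024-Q2")
print(picked)
```

Recording the seed, the sample size, and the population definition alongside the findings lets a second reviewer confirm that the sample was not hand-picked. Changing the seed each review period keeps teams from predicting which records will be checked.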

There is also a human element in benchmarking that beginners should recognize, because metrics influence behavior. If you set targets that reward speed over safety, teams may rush changes and create incidents. If you set targets that reward perfect scores, teams may hide problems to avoid looking bad. If you set targets that reward activity counts, teams may generate busywork. The healthiest targets encourage transparency, steady improvement, and risk reduction. For example, a target that reduces the number of unknown assets encourages discovery and honesty. A target that requires documenting and reviewing exceptions encourages disciplined risk acceptance. A target that tracks recovery readiness encourages testing and coordination, not just tool deployment. This is why benchmarking should be designed with governance input from both operations and security, so that the measures support stable operations and credible security. Beginners should take away that a benchmark system is part of culture. It can build trust when it rewards truth and learning, or it can destroy trust when it rewards appearances.

As we wrap up, benchmarking O T security progress means establishing clear baselines, setting realistic targets tied to risk and operational constraints, and collecting evidence that demonstrates real, repeatable improvement. Baselines provide an honest snapshot of current reality that can be measured again later. Targets define what progress looks like in actionable terms, layered in a sequence that respects operational foundations. Evidence that holds up comes from operational artifacts and consistent practices, not from aspirational statements, and it must reflect how the environment actually works, including known exceptions and compensating controls. Good benchmarking avoids misleading metrics that create noise or busywork and instead focuses on capability and outcomes like reduced exposure, better visibility, stronger change control, and safer recovery. When you can explain how baselines, targets, and durable evidence work together to guide decisions and build trust with leadership and regulators, you will understand how O T security becomes a program that improves steadily rather than a collection of disconnected projects.
