Episode 42 — Determine Asset Criticality: What Fails First, What Hurts Most, and Why
In this episode, we’re going to take a problem that sounds like a spreadsheet exercise and treat it like what it really is in OT: a way of deciding what you must protect first because you cannot protect everything equally. When people hear the word asset, they often picture a single device, like a controller or a workstation, but in operational technology an asset can be a piece of equipment, a system, a network segment, a software service, or even a dependency like a time source or an engineering workstation that many systems rely on. Criticality is the idea of importance under pressure, meaning which assets matter most when something breaks, gets misused, or becomes unavailable. If you are new to cybersecurity, you might assume that the most critical assets are simply the most expensive ones or the ones with the newest technology, but OT criticality is tied to consequences in the real world. Some failures threaten safety, some threaten the environment, some stop production, and some quietly degrade quality until a huge cost shows up later. The point of determining asset criticality is to identify what fails first, what hurts most, and why, so your security choices are guided by consequence rather than guesswork.
Before we continue, a quick note: this audio course accompanies our two companion books. The first book is about the exam and explains in detail how best to pass it. The second book is a Kindle-only eBook that contains 1,000 flashcards you can use on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.
A strong starting point is to separate the idea of importance from the idea of exposure, because beginners often mix them. An asset can be extremely important while being relatively well isolated, and another asset can be less important but very exposed to outside connections. You need to understand both dimensions to manage risk. Criticality is primarily about impact if the asset is compromised, misconfigured, or unavailable, not about how likely it is to be attacked. Later, you combine criticality with likelihood factors to decide priorities, but first you need a clean view of impact. In OT, impact is multi-layered, because a single system might affect safety, reliability, production throughput, regulatory compliance, and public trust all at once. That is why criticality is not just a technical label; it is a shared understanding between operations, engineering, safety, and security about what the organization cannot afford to lose. When teams disagree about what is critical, it often means they are using different impact lenses, not that someone is wrong. Assessing criticality is partly a communication exercise that forces clarity.
To understand what fails first, it helps to think in terms of dependencies rather than isolated devices. Many OT assets are only important because of what they support, and a seemingly small component can become critical if it is a single point of failure. For example, an engineering workstation might not seem as important as a controller, but if it holds the only known-good configurations or the only way to safely program the system, losing it could stall recovery or force risky improvisation. Similarly, a network switch might look generic, but if it connects a critical zone and has no redundancy, its failure can disconnect control signals and stop operations. Criticality thinking therefore asks what depends on this asset, and what happens if it is wrong, not just if it is down. In OT, wrong can mean a sensor reading that drifts, a setpoint that changes, or a logic routine that behaves unexpectedly, which can be more dangerous than a clear outage. Beginners often focus on downtime, but incorrect operation can be the more subtle and severe problem. Determining criticality therefore requires mapping how control, visibility, and safety functions depend on one another.
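For listeners following along in text, the question "what depends on this" can be sketched as a small dependency map. This is only an illustration under stated assumptions: the asset names and the `DEPENDS_ON` map are hypothetical, and a real inventory would come from operations and engineering rather than a hand-typed dictionary.

```python
from collections import defaultdict

# Hypothetical dependency map: each asset lists what it directly depends on.
DEPENDS_ON = {
    "filling_line": ["plc_fill"],
    "plc_fill": ["switch_zone_a", "eng_workstation"],
    "hmi_fill": ["switch_zone_a", "plc_fill"],
    "historian": ["switch_zone_a"],
    "switch_zone_a": [],
    "eng_workstation": [],
}


def dependents_of(asset, depends_on):
    """Return every asset that directly or indirectly depends on `asset`."""
    # Invert the map so we can walk from a supporting asset to what it supports.
    supports = defaultdict(set)
    for item, deps in depends_on.items():
        for dep in deps:
            supports[dep].add(item)

    found = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        for child in supports[current]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


if __name__ == "__main__":
    # Rank assets by how many other assets rely on them.
    for name in sorted(DEPENDS_ON, key=lambda n: -len(dependents_of(n, DEPENDS_ON))):
        print(name, sorted(dependents_of(name, DEPENDS_ON)))
```

In this made-up example the generic-looking switch and the engineering workstation both sit underneath the filling line, which is the kind of hidden dependency the paragraph above describes. A count of dependents is only a prompt for discussion; it says nothing about what happens when an asset is wrong rather than down.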
The phrase what hurts most points you toward consequence categories that OT organizations care about. Safety is often at the top, because harm to people is unacceptable and because safety incidents can have long-term moral and legal consequences. Environmental impact is another category, because uncontrolled releases or unsafe conditions can cause lasting harm and major penalties. Production impact matters because downtime, reduced throughput, and damaged equipment can be extremely costly, and in some industries even short interruptions can create cascading supply problems. Quality impact matters because a system can keep running while producing defective output, which can trigger recalls, wasted materials, or reputational damage. Regulatory impact also matters because some systems are tied to compliance obligations, reporting requirements, and audits that can have financial and legal consequences. The key idea for beginners is that criticality is not one-dimensional, and an asset can be critical for different reasons depending on what it controls and what the process produces. When you label an asset critical, you should be able to say which consequence makes it critical and under what failure condition.
Because OT is built on processes, you also want to understand criticality in terms of process steps and bottlenecks. A production line may have many machines, but only a few steps determine the overall output, and those steps can become the critical path. If a particular controller governs a bottleneck step, its failure can stop everything upstream and downstream even if other assets remain operational. In other cases, the process might be resilient to the loss of one system because there is redundancy or alternate routing, making that asset less critical than it appears. This is why criticality cannot be determined from network diagrams alone, because the business and physical process context matters. Operators and engineers often know which parts of the process are brittle and which parts can be worked around, so criticality assessment should involve them directly. For a beginner, it helps to think of this like a transportation system, where a single bridge can be more critical than many roads because there is no alternate route. OT assets frequently have that bridge-like role, and you need to find them deliberately.
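The bridge idea can be sketched with a toy production line. This is a minimal illustration, assuming a strictly sequential line where output is limited by the slowest step; the step names, unit names, and capacities in `LINE` are invented for the example.

```python
# Hypothetical line: each step lists the units that can perform it and each
# unit's capacity in items per hour. All names and numbers are illustrative.
LINE = {
    "mixing": {"mixer_1": 60, "mixer_2": 60},
    "filling": {"filler_1": 100},
    "packaging": {"packer_1": 70, "packer_2": 70},
}


def throughput(line, failed=frozenset()):
    """Line output is limited by the slowest step, counting only working units."""
    return min(
        sum(cap for unit, cap in units.items() if unit not in failed)
        for units in line.values()
    )


def loss_if_failed(line):
    """For each unit, the line throughput lost if that unit alone fails."""
    baseline = throughput(line)
    return {
        unit: baseline - throughput(line, failed={unit})
        for units in line.values()
        for unit in units
    }


if __name__ == "__main__":
    for unit, loss in sorted(loss_if_failed(LINE).items(), key=lambda kv: -kv[1]):
        print(f"{unit}: loses {loss} items/hour")
```

Here the single filler is the bridge: losing it stops the whole line, while losing one mixer or one packer only reduces output. Real processes have buffers, rework loops, and manual workarounds that a sketch like this ignores, which is why operators and engineers need to be part of the assessment.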
Another important distinction is between assets that are critical to operate and assets that are critical to recover. Some assets might not be required for normal operation, but they become essential during an incident, a maintenance event, or a restart after an outage. Backups, configuration repositories, spare parts, and documentation systems often fall into this category, and they can be overlooked because they do not show up in day-to-day production metrics. In cybersecurity, recovery capability is a major part of resilience, and in OT it is tied to safety because restoring systems incorrectly can be dangerous. If you lose the ability to restore a controller to a known-good state, you might face a longer outage or a riskier restart, even if the controller itself is physically intact. That means recovery-related assets can be critical even if they do not directly touch the process. Determining criticality should therefore include asking, if something goes wrong, what do we need to bring systems back safely and confidently. Beginners sometimes assume recovery is automatic, but in OT it often depends on specialized knowledge and carefully maintained assets.
Criticality is also influenced by the uniqueness of the asset and the availability of alternatives. If an asset has a direct substitute, like a redundant controller or an alternate network path, its criticality may be lower, assuming the redundancy is real and regularly tested. If the asset is unique, custom, hard to replace, or supported by a vendor with long lead times, its criticality increases because failure has longer-lasting consequences. This is not just about cost, but about time to restore and the operational constraints during that time. A rare component with a twelve-week lead time can be more critical than a more expensive component that can be replaced overnight. It also matters whether the asset relies on specialized expertise, such as a configuration that only one person understands, because that creates a human single point of failure. When you determine criticality, you are really asking how fragile the organization becomes if this asset is lost or corrupted. Fragility is a practical way to think about criticality because it combines consequence and recoverability in a way that operations can relate to.
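One rough way to put a number on fragility is to multiply the daily cost of an outage by the time it takes to restore. The sketch below is an assumption for illustration, not a standard formula: the `Asset` fields, the example assets, and every figure are hypothetical, and redundancy only counts when it is marked as tested.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    purchase_cost: float         # replacement price (illustrative units)
    daily_outage_cost: float     # estimated cost per day the process is impaired
    lead_time_days: float        # time to obtain and commission a replacement
    tested_redundancy: bool = False  # a standby exists AND failover is regularly tested
    failover_days: float = 0.0   # time to switch over when redundancy is usable


def restore_days(asset: Asset) -> float:
    """Use failover time only when redundancy is real and tested."""
    if asset.tested_redundancy:
        return asset.failover_days
    return asset.lead_time_days


def fragility(asset: Asset) -> float:
    """Rough exposure: outage cost accumulated until the asset is restored.

    Purchase cost is deliberately not part of the calculation.
    """
    return asset.daily_outage_cost * restore_days(asset)


# Hypothetical examples: a cheap part with a twelve-week lead time, an
# expensive server replaceable overnight, and a controller with tested standby.
custom_drive = Asset("custom_drive", 8_000, 20_000, lead_time_days=84)
scada_server = Asset("scada_server", 50_000, 20_000, lead_time_days=1)
redundant_plc = Asset("redundant_plc", 15_000, 20_000, lead_time_days=30,
                      tested_redundancy=True, failover_days=0.25)

if __name__ == "__main__":
    for a in sorted([custom_drive, scada_server, redundant_plc],
                    key=fragility, reverse=True):
        print(f"{a.name}: purchase {a.purchase_cost:,.0f}, fragility {fragility(a):,.0f}")
```

With these made-up figures, the inexpensive drive with the twelve-week lead time ranks far above the more expensive server, which matches the point that time to restore matters more than price. Human single points of failure, such as a configuration only one person understands, do not fit neatly into a number and would need to be noted separately.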
Beginners also need to understand that criticality is not always tied to the most obvious control devices. Visibility assets, like sensors, historians, and monitoring systems, can be critical because they allow safe operation and safe troubleshooting. If you cannot see process state, operators may be forced to run blind, increasing the chance of unsafe conditions or poor quality. Time synchronization and sequence-of-events systems can be critical in environments where timing accuracy matters for protection systems or incident analysis. Even a domain service that supports authentication can become critical if its failure prevents legitimate access during a crisis, especially if workarounds create insecurity. The lesson is that supporting services can be critical if their loss creates confusion, delays, or unsafe improvisation. OT is a system of systems, and criticality follows system behavior, not job titles. That is why a narrow focus on controllers alone can miss hidden dependencies that truly drive operational risk.
Once you have a sense of impact and dependency, you need a consistent way to assign criticality levels so people can compare assets fairly. The goal is not mathematical perfection, but a shared method that avoids arbitrary labels. A common approach is to assign tiers, like high, medium, and low, based on consequence thresholds, but the tiers should be defined in terms of real outcomes, such as potential for injury, maximum tolerable downtime, or financial impact ranges. Another approach is to score assets across several impact dimensions and then use the score to decide where the asset lands. Whichever method you use, consistency matters more than complexity, because inconsistent criticality labels lead to inconsistent protection. In OT, it is also important to separate corporate importance from site importance, because an asset can be critical to a specific plant even if it is not visible at corporate level. The people closest to the process should have a voice in the ranking, and the method should allow their knowledge to be expressed in a structured way. When criticality is transparent, disagreements can be resolved by discussing consequences rather than arguing about labels.
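The scoring approach might look something like the following sketch. Everything specific here is an assumption chosen for illustration: the 0 to 3 rating scale, the `WEIGHTS`, the tier thresholds, and the rule that a maximum safety rating always lands in the high tier. An organization would define its own values with operations, engineering, and safety in the room.

```python
# Impact dimensions taken from the consequence categories discussed earlier.
DIMENSIONS = ("safety", "environment", "production", "quality", "regulatory")

# Hypothetical weights and thresholds; each rating is 0 (none) to 3 (severe).
WEIGHTS = {"safety": 3, "environment": 2, "production": 2, "quality": 1, "regulatory": 1}
HIGH_THRESHOLD = 14
MEDIUM_THRESHOLD = 6


def score(ratings):
    """Weighted sum of impact ratings; missing dimensions count as 0."""
    for dim, value in ratings.items():
        if dim not in WEIGHTS:
            raise ValueError(f"unknown dimension: {dim}")
        if not 0 <= value <= 3:
            raise ValueError(f"rating for {dim} must be between 0 and 3")
    return sum(WEIGHTS[d] * ratings.get(d, 0) for d in DIMENSIONS)


def tier(ratings):
    """Map ratings to a tier. Assumed rule: severe safety impact is always high."""
    if ratings.get("safety", 0) == 3:
        return "high"
    total = score(ratings)
    if total >= HIGH_THRESHOLD:
        return "high"
    if total >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"


if __name__ == "__main__":
    examples = {
        "sis_controller": {"safety": 3, "environment": 2, "production": 2},
        "batch_plc": {"production": 3, "quality": 2, "regulatory": 1},
        "label_printer": {"production": 1, "quality": 1},
    }
    for name, ratings in examples.items():
        print(name, score(ratings), tier(ratings))
```

The safety override is one possible design choice: it keeps a severe safety consequence from being averaged away by low ratings elsewhere. Whatever rules are chosen, writing them down like this makes the method transparent, so a disagreement becomes a conversation about a specific rating rather than about the label.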
Criticality assessment should also consider different failure modes, because the same asset can have different impacts depending on what goes wrong. If a system is simply unavailable, the consequence might be downtime, but if a system produces wrong values, the consequence might be unsafe operation or product defects. If an attacker changes logic, the consequence might be subtle manipulation that degrades quality over time, which can be harder to detect and more damaging than a visible outage. If access control fails, the consequence might be unauthorized changes that bypass safety checks. Thinking through failure modes helps you avoid assuming that the worst case is always a shutdown, because in many OT incidents the dangerous scenario is continued operation with degraded integrity. It also helps you plan controls that match the failure, such as emphasizing monitoring and change detection for integrity risks rather than focusing only on availability protections. For beginners, this is a key shift: cybersecurity is not only about keeping things running, it is also about keeping them running correctly and safely. Criticality is the compass that points you toward which integrity failures matter most.
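Thinking through failure modes can be captured as a simple table of asset by failure mode. The sketch below assumes a 0 to 3 impact rating and uses invented assets and ratings in `IMPACT`; the four failure modes mirror the ones described above.

```python
# Failure modes discussed above: outage, wrong values, changed logic,
# and failed access control.
FAILURE_MODES = ("unavailable", "wrong_values", "logic_changed", "access_control_failed")

# Hypothetical impact ratings, 0 (none) to 3 (severe), per asset and failure mode.
IMPACT = {
    "dosing_plc": {"unavailable": 2, "wrong_values": 3,
                   "logic_changed": 3, "access_control_failed": 2},
    "historian": {"unavailable": 1, "wrong_values": 2,
                  "logic_changed": 0, "access_control_failed": 1},
}


def worst_modes(asset, impact):
    """Return the highest rating for an asset and the failure modes that reach it."""
    ratings = impact[asset]
    top = max(ratings.get(mode, 0) for mode in FAILURE_MODES)
    return top, [mode for mode in FAILURE_MODES if ratings.get(mode, 0) == top]


def outage_is_not_worst_case(asset, impact):
    """True when some failure mode rates higher than a plain outage."""
    top, _ = worst_modes(asset, impact)
    return top > impact[asset].get("unavailable", 0)


if __name__ == "__main__":
    for name in IMPACT:
        print(name, worst_modes(name, IMPACT), outage_is_not_worst_case(name, IMPACT))
```

For both example assets the worst case is an integrity failure rather than a shutdown, which would point toward monitoring and change detection rather than availability protections alone.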
Finally, criticality is only useful if it drives action, and that means connecting it to protection choices and operational discipline. High-criticality assets should typically receive stronger access controls, tighter change management, better monitoring, and more robust recovery preparation than lower-criticality assets. They should also receive clearer ownership, because when something is critical, ambiguity about who approves changes or who responds to issues becomes a risk factor. Criticality can guide maintenance planning, such as prioritizing firmware tracking, configuration backups, and vendor support agreements for the most important systems. It can guide incident response planning, such as defining what to isolate first, what to preserve for recovery, and what needs immediate coordination with safety teams. Most importantly, it can guide communication with leadership by translating technical assets into business and safety consequences that decision-makers understand. When you can say, this set of assets is critical because a failure here could lead to unsafe conditions or days of downtime, you create the case for investment without relying on fear. Determining asset criticality is the discipline of aligning security effort with real-world consequence, and once you do it well, the rest of risk management becomes clearer and more honest.