Episode 43 — Produce OT Documentation That Works: Policies, Processes, Standards, and SOPs
In this episode, we’re going to talk about documentation in OT security in a way that respects what people actually do on the floor and in the control room. Documentation can sound boring, and beginners sometimes picture it as a binder that sits on a shelf to satisfy an audit, but in operational technology, good documentation is closer to a map and a set of shared rules that keep complex work coordinated. When systems are stable, documentation can feel optional, but when something changes, fails, or behaves strangely, documentation is often the only way people can respond quickly without guessing. The challenge is that OT environments are busy, and poorly written documents can create confusion, slowdowns, and workarounds that increase risk. So the goal is not more documents; it is better documents that people can follow under pressure. We will focus on four major kinds of documentation: policies, processes, standards, and Standard Operating Procedures (SOPs), and we will treat them as tools that help people do the right thing consistently without chaos.
Before we continue, a quick note: this audio course is a companion to our course books. The first book covers the exam and explains in detail how best to pass it. The second book is a Kindle-only eBook containing 1,000 flashcards that you can use on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.
A practical starting point is understanding why OT documentation fails so often, because if you know the failure patterns, you can avoid repeating them. One common problem is that documents are written in general language that could apply to any office network, which makes OT teams immediately distrust them because they do not reflect real constraints. Another problem is that documents are written for compliance rather than for use, so they are full of vague phrases like “ensure systems are secure” without explaining what secure means in this environment. Documentation also fails when it is too long, too hard to find, or too hard to interpret quickly, especially during incidents or maintenance windows. Sometimes documents fail because they are technically correct but operationally impossible, like demanding changes that would break vendor support agreements or require downtime that the business cannot tolerate. Another failure pattern is inconsistency, where different documents contradict each other, which trains people to ignore them. When documentation fails, people do what they can to get the job done, and those workarounds become invisible risk. Good OT documentation is written to be used, not admired.
To make documentation that works, you need a clear hierarchy, because not all documents serve the same purpose. A policy is the high-level statement of intent and rules, describing what the organization expects and why, such as requiring controlled remote access or requiring that critical assets have recoverable configurations. A process is the repeatable set of steps that turns the policy into an action flow, like how access requests are reviewed, approved, and revoked across teams. A standard is the specific rule set that defines how something must be implemented or configured within allowed boundaries, like minimum authentication requirements for remote access paths or logging expectations for critical zones. A Standard Operating Procedure (SOP) is the practical, step-focused guide for how a specific task is performed in a specific context, often by a specific role, like how to request a vendor remote session during a maintenance window or how to record a controller logic change in the change management system. The hierarchy matters because if you try to cram details into a policy, the policy becomes unreadable, and if you keep standards vague, you cannot enforce consistency. When the hierarchy is clear, people know where to look and what level of detail to expect.
Policies are the easiest to get wrong because people treat them like a wish list instead of a commitment. A working OT security policy should be short enough that leaders can read it, and clear enough that teams can tell what is required and what is not. It should define scope, like which OT environments it applies to, and it should define ownership, like who is accountable for approving exceptions. It should also connect to business and safety goals so it does not feel like an external demand, which helps with adoption. A good policy avoids overly technical jargon, because the policy is meant to be stable over time even as technology changes. It should also avoid absolute statements that are unrealistic, like “patch everything within seven days,” unless the organization can truly do that safely. Instead, it should describe the principle, such as “critical vulnerabilities must be triaged and mitigated within defined windows based on asset criticality,” and then push the timing details into standards or processes. When policy language is realistic and aligned with OT constraints, it becomes a foundation rather than a source of conflict.
Processes are where documentation becomes operational, and they need to be written with the reality of OT workflows in mind. A process should clearly state who does what, when, and what triggers the next step, because ambiguity creates delays and unsafe improvisation. For example, if the policy says remote access must be controlled, the process should define how a request is submitted, who approves it, how access is provisioned, how sessions are monitored, and how access is removed when the work ends. It should also define what happens when things do not go smoothly, such as when a request is urgent or when a vendor cannot meet a standard requirement. OT processes should include coordination points with safety and operations where appropriate, because a technically correct action can be unsafe if it is not coordinated. Another important process characteristic is that it should be auditable without being bureaucratic, meaning it leaves evidence of decisions while still being usable. When processes are clear and practical, people follow them because they help work happen predictably. When processes are vague, people treat them as suggestions, and security becomes inconsistent.
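To make the remote access example concrete, here is a minimal sketch of that request lifecycle written as a small state machine. Treat everything in it as an assumption for illustration: the state names, the role names such as operations_lead and ot_security, and the allowed transitions are hypothetical, and a real process would come from your own teams and tooling. What it shows is the idea that each step has one owner and one trigger, and that every decision leaves evidence behind.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"
    PROVISIONED = "provisioned"
    ACTIVE = "active"
    REVOKED = "revoked"


# Allowed transitions and the (hypothetical) role that owns each one.
# Anything not listed here is not a defined step in the process.
TRANSITIONS = {
    (State.SUBMITTED, State.APPROVED): "operations_lead",
    (State.SUBMITTED, State.REJECTED): "operations_lead",
    (State.APPROVED, State.PROVISIONED): "ot_security",
    (State.PROVISIONED, State.ACTIVE): "ot_security",
    (State.PROVISIONED, State.REVOKED): "ot_security",
    (State.ACTIVE, State.REVOKED): "ot_security",
}


@dataclass
class AccessRequest:
    requester: str
    purpose: str
    state: State = State.SUBMITTED
    history: list = field(default_factory=list)  # audit trail of decisions

    def advance(self, new_state: State, actor_role: str) -> None:
        """Move to new_state if the step exists and the actor owns it."""
        owner = TRANSITIONS.get((self.state, new_state))
        if owner is None:
            raise ValueError(
                f"{self.state.value} -> {new_state.value} is not a defined step"
            )
        if actor_role != owner:
            raise PermissionError(
                f"{actor_role} cannot perform this step; owner is {owner}"
            )
        self.history.append((self.state.value, new_state.value, actor_role))
        self.state = new_state
```

A request that tries to skip approval, or a step attempted by the wrong role, raises an error instead of quietly succeeding, and the history list is the lightweight evidence trail that keeps the process auditable without adding paperwork.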
Standards are the part of documentation that makes consistency possible across sites, systems, and teams. A standard should specify minimum requirements in a way that can be checked, such as requiring unique accounts for privileged access or requiring network segmentation boundaries for critical zones. Standards are where you define acceptable options, because OT often has multiple valid ways to meet a requirement depending on vendor constraints and system design. A good standard distinguishes between must and should, and it defines what requires an exception and how that exception is documented. It also defines how to handle legacy systems that cannot meet modern expectations, because in OT you often inherit equipment that will remain in service for years. This is where standards can reduce chaos, because instead of every engineer making their own decision, the standard gives a shared baseline. Standards should also be reviewed regularly, because what was reasonable five years ago might be inadequate today, and what was once impossible might become feasible as operations evolve. When standards are written well, they allow security to scale without becoming personal or argumentative.
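One way to picture a standard that can be checked is to write each requirement as data with a level and a test. The sketch below is only an illustration under stated assumptions: the requirement IDs, the asset fields such as mfa_enabled, and the specific rules are hypothetical and are not taken from any published standard. It shows how must and should can lead to different outcomes, and how a documented exception changes a failure into a tracked exception rather than hiding it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Requirement:
    req_id: str
    level: str  # "must" or "should"
    description: str
    check: Callable[[dict], bool]  # returns True when the asset complies


# Hypothetical remote access standard; missing data is treated as non-compliant.
REMOTE_ACCESS_STANDARD = [
    Requirement(
        "RA-1", "must", "Privileged accounts are unique per person",
        lambda a: not a.get("shared_privileged_account", True),
    ),
    Requirement(
        "RA-2", "must", "Remote access uses multi-factor authentication",
        lambda a: a.get("mfa_enabled", False),
    ),
    Requirement(
        "RA-3", "should", "Remote sessions are recorded",
        lambda a: a.get("session_recording", False),
    ),
]


def evaluate(asset: dict, standard: list, approved_exceptions=frozenset()) -> list:
    """Return (req_id, status) pairs for one asset against a standard."""
    findings = []
    for req in standard:
        if req.check(asset):
            status = "pass"
        elif req.req_id in approved_exceptions:
            status = "exception"  # documented and approved, still visible
        elif req.level == "must":
            status = "fail"
        else:
            status = "advisory"  # a missed "should" is noted, not blocking
        findings.append((req.req_id, status))
    return findings
```

For a legacy system that cannot support a requirement, passing its requirement ID in approved_exceptions keeps the gap on the record while separating it from an unmanaged failure.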
Standard Operating Procedures (SOPs) are where documentation becomes most directly usable by the people doing the work. An SOP should describe a task with enough clarity that a trained person can execute it consistently, even if they are under pressure or working at odd hours. In OT security, many of the most important tasks happen during maintenance windows, incident response, or vendor engagements, which are exactly the times when memory and improvisation fail. An SOP can reduce risk by making sure key steps are not skipped, like verifying approvals, capturing baseline configurations, confirming communication channels, and recording what changed. It should also define stop conditions, meaning when the person should pause and escalate instead of continuing, because that prevents unsafe actions when something unexpected occurs. SOPs are also where you can embed local knowledge, like which systems require special coordination or which steps must be performed in a certain order for safety. Beginners should understand that SOPs are not meant to replace training, but to support it, especially when fatigue, urgency, or complexity make mistakes more likely. When SOPs are respected, they reduce both cyber risk and operational stress.
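The idea of ordered steps with stop conditions can be sketched in a few lines. This is a hedged illustration, not a tool recommendation: the Step structure, the run_sop function, and the step names are all hypothetical, and in practice each check would be a person confirming something rather than a function returning a value. The point is simply that a failed check on a critical step halts the procedure and calls for escalation instead of letting the work continue.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    check: Callable[[], bool]      # True when the step was completed and verified
    stop_on_failure: bool = True   # stop condition: pause and escalate


def run_sop(steps: list) -> tuple:
    """Run steps in order; return (completed, skipped, stop_message)."""
    completed, skipped = [], []
    for step in steps:
        if step.check():
            completed.append(step.name)
        elif step.stop_on_failure:
            message = f"STOP and escalate: '{step.name}' did not pass"
            return completed, skipped, message
        else:
            skipped.append(step.name)  # recorded, but does not halt the work
    return completed, skipped, None


# Hypothetical steps for a controller change during a maintenance window.
steps = [
    Step("verify approval", lambda: True),
    Step("capture baseline configuration", lambda: False),
    Step("apply change", lambda: True),
]
completed, skipped, stop = run_sop(steps)
```

In this run the baseline capture fails, so the change is never applied and the result carries a stop message for escalation.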
One way to make all of these documents work better is to build them around real scenarios and real decisions, rather than around abstract requirements. For example, instead of a vague statement about least privilege, you can document how access is granted for a specific kind of vendor support session, how the session is time-limited, and how the account is removed afterward. Instead of a broad requirement for logging, you can document which events matter most in OT, like configuration changes on controllers or remote session starts, and how those events are reviewed. The goal is to connect documentation to moments that people recognize, because people remember what fits their experience. This does not mean you need to write story-like documents, but it does mean you should test your documents against realistic situations and ask, would a person know what to do next. When documentation is scenario-informed, it becomes a practical guide rather than a legal text. It also becomes easier to spot gaps, because you can ask what happens if this step fails or if the vendor cannot comply. That kind of testing turns documentation into an operational asset.
Another key to working documentation is clarity about roles, approvals, and exceptions, because OT environments often involve multiple teams and external partners. If a document says the security team must approve changes, but in practice the engineering team makes changes during a maintenance window, the document will be ignored. A better approach is to document actual responsibility lines, such as engineering owns technical changes, operations owns operational coordination, safety owns safety sign-off where required, and security owns control requirements and oversight. Exceptions are especially important in OT because legacy systems, vendor limitations, and production constraints often prevent ideal controls. If exceptions are not documented, they become hidden risk and a source of frustration, because the rules appear absolute while reality is not. Working documentation defines how exceptions are requested, who can approve them, how long they last, and what compensating controls are required. This makes the environment more honest because it acknowledges constraints while still managing risk. It also prevents chaos by keeping exceptions from becoming permanent by default.
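The four things a working exception process defines (how it is requested, who approved it, how long it lasts, and what compensating controls apply) map naturally onto a small record. The sketch below assumes hypothetical field names and example values, and a real register would live in whatever system your organization already uses. It shows how an expiry date keeps an exception from becoming permanent by default.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExceptionRecord:
    exception_id: str
    asset: str
    requirement: str
    approved_by: str
    expires_on: date
    compensating_controls: tuple

    def problems(self, today: date) -> list:
        """List reasons this exception needs attention as of today."""
        issues = []
        if not self.approved_by:
            issues.append("no approver recorded")
        if not self.compensating_controls:
            issues.append("no compensating controls listed")
        if today > self.expires_on:
            issues.append("expired; renew or remediate")
        return issues


# Hypothetical example: a legacy HMI that cannot meet an authentication rule.
record = ExceptionRecord(
    exception_id="EX-001",
    asset="legacy-hmi-03",
    requirement="RA-2",
    approved_by="plant_manager",
    expires_on=date(2024, 6, 30),
    compensating_controls=("jump host only", "session monitoring"),
)
```

Calling record.problems with a date after the expiry returns a prompt to renew or remediate, which is the review trigger that an undocumented exception never gets.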
Documentation also needs a maintenance plan, because outdated documents are worse than no documents in some cases. If a procedure describes steps that no longer match the system, people will waste time, make wrong assumptions, or create unsafe workarounds. In OT, where systems can stay in service for long periods, it is easy for documentation to drift as small changes accumulate. A working approach is to tie document updates to change management, meaning when a significant change happens, the relevant documents are reviewed and updated as part of completion criteria. Another approach is periodic review, but periodic review often fails if it is not owned and scheduled. Ownership matters here, because a document without an owner becomes a stale artifact that nobody feels responsible for. You also want version control and clear effective dates so people can tell what is current, especially across multiple sites. For beginners, the key lesson is that documentation is part of the system, and like any system, it needs upkeep. When documentation maintenance is built into normal workflows, it becomes sustainable instead of a panic-driven cleanup before an audit.
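Ownership, versioning, effective dates, and scheduled review can all be captured as a few fields per document, which makes staleness something you can check rather than something you discover before an audit. The following is a minimal sketch under stated assumptions: the field names and the 365-day default review interval are hypothetical choices for illustration, not a recommended cadence.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DocRecord:
    title: str
    version: str
    owner: str
    effective_date: date
    last_reviewed: date
    review_interval_days: int = 365  # assumed default, adjust per document


def needs_attention(doc: DocRecord, today: date) -> list:
    """Return the reasons a document should be looked at, if any."""
    reasons = []
    if not doc.owner:
        reasons.append("no owner")
    if today - doc.last_reviewed > timedelta(days=doc.review_interval_days):
        reasons.append("review overdue")
    if doc.effective_date > today:
        reasons.append("not yet in effect")
    return reasons


# Hypothetical document record.
sop = DocRecord(
    title="Vendor remote session SOP",
    version="1.3",
    owner="ot_security",
    effective_date=date(2023, 1, 15),
    last_reviewed=date(2023, 1, 1),
)
```

The same check could be run whenever a change ticket closes, which is one way to tie document review to change management instead of relying only on the calendar.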
A beginner might ask how to make documents usable without making them too detailed, and the answer is to separate the stable from the variable. Policies should describe stable expectations, like controlled access and documented changes, without tying them to a specific tool or interface that might change. Standards should describe stable minimum requirements and accepted options, and they should be written so that technical teams can interpret them consistently. SOPs can include more detail, but they should focus on the decision points and checks that prevent mistakes, not on every minor interface action that might change with an update. Processes should focus on coordination and accountability, because those are stable even when tools evolve. This separation keeps documents from becoming fragile, because when a tool changes, you update the relevant SOPs rather than rewriting the policy. It also reduces chaos because people are not forced to relearn the entire system of documentation when a single part changes. A well-structured document set is like a building with strong foundations and replaceable components. The foundation stays stable while the details adapt.
Finally, the test of OT documentation that works is whether it helps people act correctly when time is short and consequences are real. Good documents reduce uncertainty by defining expectations, making responsibilities clear, and guiding behavior through realistic constraints. They also reduce conflict by creating shared language between security, engineering, and operations, so teams can disagree about tradeoffs without arguing about basic rules. When documentation is aligned, you can trace a line from policy to process to standard to SOP, and each level supports the next without contradiction. When documentation is practical, it becomes part of everyday work, not an extra task that people resent. Over time, working documentation builds confidence because people know what good looks like, and they know how to proceed when something unexpected happens. Producing OT documentation that works is therefore not about writing more; it is about writing what people will actually use, maintaining it as systems evolve, and treating it as a safety and resilience tool as much as a compliance artifact. That is how you get competence and consistency without creating chaos through paperwork.