One element of President Biden’s executive order on cybersecurity establishes a board to investigate major incidents involving government computers, much as the National Transportation Safety Board investigates aviation disasters. The two of us, among many others, have been advocating for such a board for many years. The creation of the board is a good first step, possibly as much as can be done without legislative action. But we think additional action is needed, and that it will magnify the value the board offers.
Section 5 of the order establishes a Cyber Safety Review Board (CSRB) in the Department of Homeland Security. The board “shall review and assess, with respect to significant cyber incidents […] affecting Federal Civilian Executive Branch Information Systems or non-Federal systems, threat activity, vulnerabilities, mitigation activities, and agency responses.” The board will include both government staff and private sector representatives and is charged with protecting information that it collects. The board will convene at the discretion of the president or the secretary of Homeland Security or whenever an entity known as the Cyber Unified Coordination Group is triggered. That group was created during the Obama administration and serves “as the primary method for coordinating between and among Federal agencies in response to a significant cyber incident.”
Under the executive order, the CSRB can investigate incidents that affect civilian agencies and non-federal systems. The order apparently excludes incidents that affect defense and intelligence agencies. The reviews are similarly broad in scope. Interestingly, they are framed as reviews rather than investigations, but they are intended to result in recommendations for improving both practices and policy. Additionally, the order calls for the board, after generating its first report, to assess whether it has the right composition, mission, scope, authorities and sources of information.
We are glad to see this formally mandated reflection on the board’s own work. More generally, we are excited by the substantive step forward the board represents. We, along with Rob Knake, have been running a National Science Foundation-funded workshop on Adapting Aviation Safety Models to Cybersecurity. We hope to have the workshop report available in time to inform this assessment, and we can also offer some immediate perspective.
But while the board is a significant step forward, it is not all it could be.
The most important missing piece is a clear requirement for public reports on what happened. Right now, the board is only mandated to “provide recommendations to the Secretary of Homeland Security for improving cybersecurity and incident response practices,” in the context of particular incidents. This isn’t enough. One reason commercial air travel is so safe is that airlines and airplane manufacturers have learned, and continue to learn, from detailed accident reports, and have been able to adjust their procedures and technologies accordingly. If the new CSRB restricts itself to developing an incident-handling playbook, per the executive order, it will be much less valuable than if it publishes its reports regularly. The collected works of the CSRB would be an invaluable resource for system designers for years to come. (We and our co-authors explain this in more detail in another piece.)
The contents and tone of these reports are important. They should be as objective as possible, spelling out observed facts and the conclusions drawn from them. Their purpose is not reverse-engineering injected malware, though that might be of interest to the intelligence community. Nor is it developing evidence for use in a prosecution (that’s the FBI’s job), and it is not finger-pointing. The reports should be respectful of the people whose decisions didn’t pan out as hoped. Finger-pointing is not just wrong; it’s counterproductive. The NTSB gets this right: NTSB reports cannot be used in any disciplinary or civil court proceedings. This is vital for obtaining the cooperation of parties to the investigation.
It is also important that the board have both the charter and the time to do more than review reports by others. Doing an independent investigation, though, is difficult and requires skilled, full-time staff. A board that is convened only after an incident can’t possibly investigate it without a dedicated investigatory staff. Initially, it might be wise to outsource the immediate fact-finding work to existing private-sector companies, but ideally, personnel with this skill set and orientation would be nurtured within a professional board staff. Another aviation safety program, NASA’s Aviation Safety Reporting System, makes use of late-career aviation professionals with the experience and perspective to fully understand the near misses reported to them.
Today, the mandate of incident response firms usually centers on recovery: discovering the extent of an intrusion so the attackers can be kicked out. Law enforcement is focused on whom to charge. In many cases, those are valuable goals, but there are other valuable goals that are not being met. These include establishing the root causes and contributing factors that preceded a problem, and evaluating the efficacy of both technical and managerial controls and of the policies and standards that inform those controls. There are many lessons to draw from other industries’ safety practices, including how people contribute not only by making mistakes but also in positive ways, by noticing mistakes and by demonstrating flexibility and resilience in response.
Another helpful lesson from safety professionals is the value of a “five whys” style of analysis. A “five whys” analysis simply takes each answer to a question about an incident and asks why. (It’s a little like being around a three-year-old, but more productive.) For example, someone might ask, “Why did this event happen?” and the response might come back, “Because we didn’t patch our server.” “Why didn’t we patch our server?” “Because we were tracking patches to apply in a spreadsheet.” “Why were we tracking them like that?” “Because we didn’t get a budget for a more advanced system.” “Why didn’t we get a budget…” This approach can help an investigatory board avoid simplistic answers and get to deeper issues, or sometimes even to the heart of the matter.
Producing a really useful analysis requires an investigation that goes deeper than the obvious, immediate issues, and hence beyond the scope of ordinary incident response. This is where a CSRB can help. A good CSRB report, like a good NTSB report, will independently examine the myriad technical and organizational factors that contributed to what happened. The cause might have been unpatched holes, a supply chain attack, a lack of encryption, neglected alarms or any of dozens of other factors. It might have been an error by an intern, but such errors happen within systems that allowed them to have a substantial impact. It might be an easily guessed password, but if so, why did the system allow that password, or fail to require additional authentication? Was a budget request for stronger authentication denied because of cost? It is often said that complex systems fail in complex ways; the same is true of the kinds of incidents the CSRB should investigate.
It’s also important that the CSRB not be part of a regulatory agency. In that sense, the Department of Homeland Security is not the ideal home for it, just as the NTSB is not part of the Federal Aviation Administration. Sometimes the NTSB determines that FAA regulations were partially culpable in a crash; that would be awkward if the NTSB were part of the FAA. DHS has responsibility for dealing with critical infrastructure sectors, and as the CSRB grows, it will likely need to investigate incidents at companies in those sectors, too.
In addition to investigating individual incidents, the NTSB also makes recommendations for change based on its accumulated experience. The CSRB should do that as well. Perhaps in several incidents, internal warning systems did not function as desired. Were they deficient? Bypassed? Improperly configured because it was too hard to configure them properly? If there’s a pattern, people need to know.
All this and more is possible. We are comfortable with the CSRB’s initial focus on government systems; that will minimize, for the moment, difficult issues involving proprietary data and personal privacy. As long as the new board delivers useful reports that are respectful of the victims of crime, it will prove its value, and we can work through these complexities. As we do, we hope its scope will expand to include more major incidents. We will be safer sooner if we learn more from the many incidents to which everyone apparently remains vulnerable.