Developing an Objective, Repeatable Scoring System for a Vulnerability Equities Process
The public release of the Vulnerability Equities Process (VEP) charter by the White House in late 2017 went a long way toward satisfying the public’s curiosity about the secretive, high-profile and contentious process by which the U.S. government decides whether to temporarily withhold or publicly disclose zero-day software vulnerabilities—that is, vulnerabilities for which no patches exist. Just recently, the U.K. government publicly released information about its Equities Process as well.
The U.S. and U.K. charters are similar in the overall structure of the process and in the criteria they use for deciding whether to disclose or retain a vulnerability. In effect, they compare and balance offensive equities against defensive equities. Offensive equities are the benefits the intelligence community gains when a vulnerability is temporarily withheld and used for intelligence collection, while defensive equities are the benefits individuals and businesses gain from knowing about vulnerabilities and being able to protect their own computers.
The U.S. charter further states that to “the extent possible and practical, determinations to disclose or restrict will be based on repeatable techniques or methodologies that enable benefits and risks to be objectively evaluated by VEP participants.” This raises a question: If the U.S., the U.K. or any other government sought to create an objective framework for decision making, what might that look like? In particular, what questions should be included, how should they influence the outcome and how can one interpret the results?
Below, I describe a process that an organization could use for developing an objective scoring system. This process is based in part on my experience developing a quantitative vulnerability scoring system. Specifically, I’ll sketch a framework that would produce two scores, one reflecting defensive equities and one reflecting offensive equities.
Before I begin, it is worth noting a few important caveats. First, there may be many ways to create an objective and repeatable process, and this is just one of them. Moreover, I’m not suggesting this process should be the only factor used in a vulnerability equities decision: further discussion will be required. Also, complete and accurate information may not be available to help properly answer some questions. This will always be a limitation, but the act of identifying missing information can, itself, improve the process and inspire efforts to collect better data. Besides, perfect information may not be required. In my experience, a usable scoring system often requires only approximate answers to most questions, and so more fidelity is neither necessary nor helpful.
Now, on to the framework.
The First Step is to engage relevant stakeholders to identify the kinds of questions that would be most useful in assessing defensive and offensive equities. In the U.S. and U.K. charters, these stakeholders are federal government agencies, though other countries may consider involving private-sector representatives.
The act of identifying relevant questions is more of an art than a science, but one could start by looking at the questions listed in the U.S. and U.K. charters. Both charters include the following defensive criteria: the vulnerability severity (i.e., the impact to an information system should the vulnerability be exploited); the business and social impact to consumers, businesses and national infrastructure; and the probabilities that the vendor would produce a patch and that users would apply it. Both charters also include offensive criteria: the operational value of the vulnerability to support intelligence or law enforcement priorities, how reliant these groups are on this particular vulnerability, and the risk that disclosing the vulnerability could reveal intelligence sources or methods.
Beyond the specific list of questions, however, there are important guidelines for determining which questions to use for the defensive and offensive sides, and how to frame them. Questions should be as exhaustive and as mutually exclusive as possible. That is, they should cover all issues related to each equity position while not being redundant. For example, the U.S. charter asks, “Where is the product used? How widely is it used?” and “How broad is the range of products or versions affected?” These questions generally speak to the same point and can be collapsed.
Further, each question should relate to an equities decision. Questions like, “Was the vulnerability purchased or developed in house?” may be useful to know and record, but the answer would likely not influence the decision.
The questions should also relate to only one of the equities, not both. For example, there has been public discussion about questions such as, “Would disclosing this vulnerability also cause other vulnerabilities to be discovered?” This question has a useful defensive equity (more vulnerabilities would be revealed and patched), but it also has an offensive one (more vulnerabilities could be used for criminal investigations or national security). And so, while potentially nice to know the answer, this kind of question would be ambiguous for scoring purposes because of how it affects both sides of an equities decision.
The output of this step will be two groups of balanced and focused questions, one group reflecting offensive equity and the other reflecting defensive equity.
The Second Step considers how to weight and combine the questions. It will be up to the stakeholders to estimate the relative importance of each question to the outcome, where the estimates could be expressed as percentages (e.g., 20 percent, 50 percent). One way to start the process is to rank-order the questions from most to least important. There is no need to be exceptionally nuanced with the estimates; that would add unnecessary complexity to what is essentially a qualitative process. The only strict rule is that the sum of the estimates for all questions (for each of the defensive and offensive sides) should equal 100 percent (or 1.0). It may also be appropriate to group some questions and collectively weight that group. For example, consider these questions from the U.S. charter:
- How likely is an adversary to discover (and exploit) the vulnerability?
- How widely is the vulnerable product used?
- How severe is the vulnerability?
- Will users actually apply a patch, if it is released?
- Could exploits against the vulnerability be detected?
The first three questions reflect a common characterization of risk (often computed as likelihood * impact) and could therefore be reduced to two questions: “What is the likelihood that an adversary would discover and exploit the vulnerability?” and “What would the severity of the attack be?” Further, the relationship between these questions suggests that if the answer to either of them is zero, then the risk would also be zero. For example, if there were no harmful consequences from exploiting the vulnerability, then the risk would be zero, and the probability of attack would no longer matter. Such a situation may occur if, for example, the vulnerability existed in a software component never used in the U.S. or by allies, such as custom communication software written by a terrorist group.
And so, in the preceding example, the first three questions would combine and reduce to two questions representing technical risk, which stakeholders might estimate would make up 75 percent of the overall defensive equity. The last two questions would then combine to make up the remaining 25 percent.
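To make the arithmetic concrete, here is a minimal Python sketch of the grouped, weighted combination just described. The 75/25 group split follows the example in the text; the equal weights within the second group and all of the input values are hypothetical illustrations, not figures from either charter.

```python
# Minimal sketch of the grouped, weighted combination described above.
# The 75/25 group split follows the example in the text; the equal
# within-group weights and all input values are hypothetical.

def defensive_equity(likelihood: float, severity: float,
                     patch_uptake: float, detectability: float) -> float:
    """Combine question values (each in [0, 1]) into a raw defensive score.

    The technical-risk group multiplies likelihood and severity, so if
    either factor is zero, the whole risk term is zero.
    """
    risk = likelihood * severity                            # technical-risk group
    mitigation = 0.5 * patch_uptake + 0.5 * detectability   # hypothetical split
    return 0.75 * risk + 0.25 * mitigation                  # hypothetical 75/25 weights

# Example: high likelihood, high severity, medium patch uptake, low detectability.
score = defensive_equity(0.9, 0.7, 0.5, 0.3)
```

Note how the multiplicative risk term encodes the zero-out behavior: a severity of zero drives the entire 75-percent group to zero regardless of likelihood.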
The Third Step is to determine the range of values that each question will have and assign numbers to those values. Common examples are ordinal scales such as “high, medium, low” or “not likely, likely, very likely.” Again, precision is not necessary, nor encouraged, for this step. All that is required is to differentiate reasonable outcomes for each question. Because of the construction of the weights and scores, the range of values should be between zero and one. For example, the question, “Will users apply a patch, if it is released?” could be reframed in terms of a probability, with values such as very high, high, medium, low, and none, and numerical values of 0.9, 0.7, 0.5, 0.3, and 0.0, respectively.
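As an illustration, the ordinal scale just described could be encoded as a simple lookup table. The labels and numbers below mirror the example above; any real system would choose its own scale.

```python
# Hypothetical ordinal scale for the question:
# "Will users apply a patch, if it is released?"
PATCH_UPTAKE = {
    "very high": 0.9,
    "high": 0.7,
    "medium": 0.5,
    "low": 0.3,
    "none": 0.0,
}

# Stakeholders answer with a label; the score uses the numeric value.
value = PATCH_UPTAKE["medium"]
```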
There is an important usability issue to consider at this stage. While defining many granular values may seem to promise greater accuracy, it creates a trade-off with usability. The more options created, the more opportunity there will be for variation and confusion among stakeholders. Further, unless there are sufficient data on which to base such fine-grained judgments, the extra options would go unused and, therefore, be wasted.
By now, the user will have two equity equations (one defensive and one offensive), each written as a weighted combination of the distinct questions and their numerical values. The equations can then be normalized to produce a score between, for example, 0 and 10, though the particular range is arbitrary. The point is simply to produce a range of values that can be compared across equations.
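The normalization step can be sketched as a linear rescaling. The 0-to-10 target range and the sample raw score below are arbitrary choices for illustration.

```python
def normalize(raw: float, raw_min: float, raw_max: float,
              lo: float = 0.0, hi: float = 10.0) -> float:
    """Linearly rescale a raw equity score onto a common reporting range."""
    return lo + (raw - raw_min) / (raw_max - raw_min) * (hi - lo)

# With weights summing to 1.0 and per-question values in [0, 1], the raw
# score already lies in [0, 1], so raw_min=0 and raw_max=1.
scaled = normalize(0.57, 0.0, 1.0)  # hypothetical raw defensive score
```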
Each vulnerability can now be represented according to its defensive and offensive scores. Some testing and modifications may be needed to ensure the equations produce scores that make reasonable sense. The relative scores are what is important, not the absolute scores.
The Fourth Step is to score many vulnerabilities and compare their results. Scoring a sample of historic vulnerabilities on a two-dimensional graph (offensive equity on the x-axis and defensive equity on the y-axis) will be informative. Visually representing the relative positions may reveal patterns of past decision-making, for example, by showing how vulnerabilities of a particular characteristic or set of characteristics were systematically disclosed or restricted.
The exercise may provide a check on the scoring system itself, highlighting areas for revisions or improvements. Indeed, it may uncover a kind of disclosure boundary, below which most vulnerabilities were disclosed and above which most were restricted (or vice versa). It could also identify outliers—vulnerabilities that were scored similarly to others but whose equity decisions differed.
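One rough way to search for such a disclosure boundary is to test candidate thresholds on the difference between each vulnerability's offensive and defensive scores, and keep the threshold that best reproduces past decisions. The historical records below are entirely invented for illustration, and a real analysis would use a more careful statistical method.

```python
# Hypothetical historical records: (offensive score, defensive score, decision).
HISTORY = [
    (2.1, 8.4, "disclose"),
    (3.0, 7.2, "disclose"),
    (6.5, 3.1, "restrict"),
    (7.8, 2.5, "restrict"),
    (5.0, 5.1, "disclose"),
    (6.0, 4.0, "restrict"),
]

def best_boundary(history):
    """Find the offensive-minus-defensive threshold that best separates
    past 'disclose' decisions from past 'restrict' decisions."""
    diffs = sorted(o - d for o, d, _ in history)
    # Candidate thresholds lie midway between consecutive score differences.
    candidates = [(a + b) / 2 for a, b in zip(diffs, diffs[1:])]
    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = sum((o - d > t) == (dec == "restrict")
                      for o, d, dec in history)
        acc = correct / len(history)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

threshold, accuracy = best_boundary(HISTORY)

# Outliers: vulnerabilities whose decision falls on the "wrong" side
# of the fitted boundary.
outliers = [(o, d, dec) for o, d, dec in HISTORY
            if (o - d > threshold) != (dec == "restrict")]
```

In this toy data set the boundary separates the decisions cleanly; with real data, any records left in `outliers` would be exactly the anomalies worth revisiting.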
In addition, viewing these past decisions may reveal national security interests, sensitivities or biases in the process. For example, both the U.S. and U.K. equities charters state that they are biased toward disclosing vulnerabilities. While examining a scoring framework would not definitively confirm or refute such a claim, it could help inform the discussion.
Many additional details and intricacies are involved in creating and using this kind of scoring system in an equities decision, and its limitations should not be underestimated. At best, this is an imperfect decision aid, and it should not replace thoughtful deliberation involving domestic, international, economic and diplomatic considerations.
Despite these limitations, this process would create an objective and repeatable method for comparing offensive and defensive equity decisions. In addition, it would provide a clear audit trail documenting past decisions and their justifications. Finally, examining past vulnerabilities on a single chart can uncover potentially surprising patterns, scoring trends over time or other boundaries of decision making.