We recently published a paper on the rediscovery of software vulnerabilities. This was the final version of a paper that had been in the works since September, peer-reviewed by the WEIS community during the winter, and then circulated for additional revision in early March. Since publication, two mistakes have come to light. After evaluating these concerns, we’ve reviewed our data and will reach out to several commenters in order to release a revised version of the paper in the next two weeks. This blog post details our planned changes and the rationale for making them. The only revised data in the paper relates to Chrome. Our conclusions remain largely unchanged: rediscovery takes place more often than previously thought, between 10 and 15 percent of the time for all data sources combined and as high as 23 percent in a single year for Android.
Our first error is that we directly relied on how duplicate bugs were labeled in the Chromium database. Many of these duplicates were automatically generated and submitted to this database by Google’s software testing infrastructure, ClusterFuzz, or were multiple reports from the same person and thus not instances of rediscovery.
To correct this error, we have manually examined all the bugs that were marked as duplicates or given CVE numbers in the stable build of Chrome to revise this portion of our dataset. Using just the stable build of Chrome focuses on the most widely used version of the software and removes the majority of automatically generated duplicates. In our first pass, we have coded as a duplicate only those bugs that were submitted by two different parties, including instances where one of those was an internal Google discovery. This results in an aggregate rediscovery rate for just Chrome of between six and nine percent depending on the time window used. Debate will remain over the subjective evaluation of the severity of these bugs and their ability to be exploited, but we chose to use the severity rating originally assigned in the Chromium database as closest to an objective judgment.
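To make the coding rule concrete, here is a minimal sketch in Python of how one might count rediscovery under the rule described above. It is an illustration, not the procedure we actually ran, which was a manual review. The record layout (pairs of bug ID and reporter), the `AUTOMATED_REPORTERS` label, and the function name are all hypothetical. Dropping automated reports before counting distinct parties is an assumption about how such a filter could be written, and the sketch does not model the time window.

```python
from collections import defaultdict

# Hypothetical label for reports filed by automated testing infrastructure.
AUTOMATED_REPORTERS = {"clusterfuzz"}


def rediscovery_rate(reports):
    """Return the share of bugs reported by at least two distinct parties.

    `reports` is an iterable of (bug_id, reporter) pairs, where bug_id is
    the canonical bug that a report (or its duplicate) was merged into.
    """
    reporters_by_bug = defaultdict(set)
    for bug_id, reporter in reports:
        # Touch the key so every bug is counted in the denominator,
        # even if all of its reports were automated.
        parties = reporters_by_bug[bug_id]
        name = reporter.strip().lower()
        if name not in AUTOMATED_REPORTERS:
            parties.add(name)

    if not reporters_by_bug:
        return 0.0

    # A bug counts as rediscovered only if two different parties found it.
    # Repeat reports from the same person collapse into one set entry.
    rediscovered = sum(
        1 for parties in reporters_by_bug.values() if len(parties) >= 2
    )
    return rediscovered / len(reporters_by_bug)


sample = [
    ("A", "alice"), ("A", "alice"),            # same person twice
    ("B", "alice"), ("B", "google-internal"),  # two different parties
    ("C", "clusterfuzz"), ("C", "ClusterFuzz"),  # automated duplicates
    ("D", "bob"),
]
print(rediscovery_rate(sample))
```

In this toy sample only bug B counts as rediscovered, which is the distinction the raw duplicate labels failed to capture.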
This issue is limited to the Chrome portion of the dataset. The Firefox data is not automatically generated in the same fashion and the Mozilla security team manually assigns a severity rating to bugs and submits reports. We obtained the data on Android in collaboration with a member of Google’s Vulnerability Rewards Program who merged and manually reviewed duplicates between a publicly accessible bug database and Google’s internal issue tracker for Android. Initial review of both data sources has found no problems in their reliability.
Our second error was that we overstated a claim with respect to vulnerabilities that the intelligence community might use. Throughout the paper we were careful to state that our findings were applicable to the data we had access to and shouldn’t be taken to apply to all bugs across time. We made a mistake by claiming that the rediscovery rates we found translated to the NSA’s stock of vulnerabilities being responsible for a potential range of real world zero-day vulnerabilities.
This issue sits at the heart of several debates in information security and is similar to the discussion of what constitutes “nation-state” malware, a question which has been debated and studied. Ultimately, what drives a state’s behavior is a function of operational need, opportunity, and the state of play in politics and attribution. Depending on skill, time, and resources, many less severe bugs can be chained into one serious exploit. Environmental changes can turn bugs that are unusable now into critical vulnerabilities in the future. A highly capable actor like the US or Russia could deploy cutting-edge kit today and repurpose something found in the used ransomware bin tomorrow.
As a thought experiment, to demonstrate the significance of a difference of a few percentage points in rediscovery, we applied our numbers to an estimate of the total NSA stock of vulnerabilities in a given year generated by a group of researchers at Columbia University. Using a conservative estimate, assuming half of the bugs rediscovered would be unusable or found by friendly/neutral parties, we generated a percentage of the 50-250 bugs which would be rediscovered each year. The problem is that there is no systematic description in the public domain of what constitutes a vulnerability that would interest the intelligence community. Thus, any approximation is an educated guess at best. While many of the vulnerabilities contained in this dataset represent critical security issues, exactly what an intelligence agency might be interested in is not publicly available, and thus we can’t say for certain. Representing what was, at best, a thought experiment as anything else in the abstract of the paper was the second mistake we made, and we have corrected that.
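The arithmetic behind that thought experiment can be sketched in a few lines of Python. This is only an illustration of the calculation's shape. The function name is hypothetical, and pairing the 50-250 stock estimate with the 10 to 15 percent combined rediscovery rate is an assumption made here for the example, not necessarily the inputs used in the paper.

```python
def expected_rediscoveries(stock, rediscovery_rate, usable_fraction=0.5):
    """Estimate how many stockpiled bugs are rediscovered per year.

    stock            -- assumed number of vulnerabilities held in a year
    rediscovery_rate -- assumed annual rediscovery rate (0.0 to 1.0)
    usable_fraction  -- share of rediscoveries that matter; 0.5 reflects
                        the assumption that half are unusable or found
                        by friendly/neutral parties
    """
    return stock * rediscovery_rate * usable_fraction


# Assumed inputs: stock of 50-250 bugs, combined rate of 10-15 percent.
low = expected_rediscoveries(50, 0.10)
high = expected_rediscoveries(250, 0.15)
print(f"{low:.2f} to {high:.2f} bugs per year")
```

Under these assumed inputs the range works out to roughly 2.5 to 18.75 bugs per year. Every input is a guess, so the output shows how sensitive the result is to a few percentage points of rediscovery and is not a finding.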
We are encouraged to see the debate that we and others have stimulated. As a result of our mistakes with respect to the NSA thought experiment and our failure to understand the behavior of the Chromium database and its integration with Google’s testing infrastructure, we opened our analysis up to much broader speculation. The aim of our work is to inform the debate and drive the publication of new data and research, so it is our responsibility to get it right.