Publicly Reported Data Breaches: A Measure of Our Ignorance?

By Andrew J. Grotto, Christos Makridis
Wednesday, July 11, 2018, 9:13 AM

There is a mounting gap between what the headlines say about the costs of cyber insecurity to the U.S. economy and the results of data-driven research on this topic—with negative implications for cybersecurity. Congress should move to narrow the gap by passing a federal law that takes two steps to protect data. First, it should require companies that possess sensitive personal information to publicly disclose when significant breaches of this information occur. Second, the law should also establish across-the-board requirements for companies that own and operate critical infrastructure, such as power plants and water utilities, to notify the authorities when sensitive operational systems are under credible threat from malicious cyber actors. A uniform, comprehensive framework would aid national security and enable executives, investors and policymakers alike to make data-driven investment and policy decisions.

Incidents that make headlines—such as a 2013 breach involving Target, which cost the company $292 million and counting, or the multi-billion-dollar losses suffered in 2017 by victims of the NotPetya attacks perpetrated by the Russian military—convey a popular impression of calamitous costs of malicious cyber incidents.

These impressions are reinforced by statements from leading cyber professionals such as former National Security Agency director Keith Alexander, who has asserted that China’s cyber-enabled economic espionage against U.S. firms has resulted in “the greatest transfer of wealth in history.” His point, shared by many who work on cyber policy, is that this pilfering of American intellectual property and proprietary business information—and its presumed transfer to would-be competitors in China—presents a strategic threat to the competitiveness of U.S. companies and the U.S. economy writ large.

A growing body of empirical research on the cost of cyber incidents, however, paints a less uniform picture. When it comes to incidents involving breaches of personally identifiable data, for example, estimates are all over the map, with one leading study reporting an average loss of $7.35 million per incident, another study reporting average losses of less than $400,000, and a third study reporting a median loss closer to $200,000.
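Part of this spread is mechanical: losses from cyber incidents are heavily skewed, so the mean and median of even the same underlying data can differ by an order of magnitude. A toy illustration with entirely synthetic numbers (not drawn from any of the studies above):

```python
import statistics

# Hypothetical loss distribution: many small incidents, a few catastrophic ones.
# These figures are illustrative only.
losses = [0.05e6] * 90 + [2e6] * 8 + [150e6] * 2  # dollars, 100 incidents

mean_loss = statistics.mean(losses)      # pulled up by the two outliers
median_loss = statistics.median(losses)  # reflects the typical incident

print(f"mean:   ${mean_loss / 1e6:.2f}M")
print(f"median: ${median_loss / 1e6:.2f}M")
```

In this toy sample the mean is dozens of times the median, so two studies can both be "right" while reporting very different headline numbers, depending on which statistic they emphasize and which incidents land in their sample.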

Worse, academic studies of stock-price performance for companies that suffer malicious cyber incidents show similarly mixed results, including many studies that show no sustained dips at all (e.g., here and here). This is astonishing given the dire warnings about the existential costs of cyber-enabled economic espionage to the U.S. economy.

Of course, there are plenty of discrete examples of companies suffering sustained losses after an incident. One year after its infamous breach, Equifax’s share price is still down 10 percent. As case studies, this and other examples can inform our understanding of how a cyber incident can impact a company’s performance. But since case studies are about specific companies and incidents, analysts must always be cautious about deriving generalized findings. Ultimately, understanding the magnitude of harms caused by malicious cyber incidents will require more data that link observable information about a company with its cyber incidents over time.

Moreover, even the eye-popping costs behind some of the headline-grabbing incidents turn out to be more complicated to assess upon closer examination. Consider the $292 million running price tag borne by Target for its 2013 breach. On the one hand, that is clearly a sizable sum—no business wants to suffer losses of that magnitude.

On the other hand, insurance covered $90 million and the company would have been able to write off some portion of the remaining $202 million as losses for tax purposes. That means the actual loss for Target was substantially less than $292 million. In addition, these costs have been spread over several years—they did not hit the company all at once in a single year.

To put these costs in context with other losses and expenses that many companies face, retailers such as Target routinely lose upwards of 1 percent of their annual revenue to “shrinkage,” which is a euphemism for merchandise theft by customers and employees. For Target, which has more than $70 billion in annual revenue, a 1 percent shrinkage rate would mean a loss of around $700 million each year.
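The back-of-the-envelope arithmetic above can be made explicit. The breach figures come from the text; the tax rate is an illustrative assumption for the sketch, not a figure from the article:

```python
# Rough comparison of Target's breach cost with routine retail "shrinkage."
gross_breach_cost = 292e6    # reported running cost of the 2013 breach
insurance_recovery = 90e6    # portion covered by insurance
tax_rate = 0.35              # assumed corporate tax rate (illustrative only)

uninsured = gross_breach_cost - insurance_recovery   # $202M borne by Target
net_cost = uninsured * (1 - tax_rate)                # if fully deductible

annual_revenue = 70e9        # Target's annual revenue, per the text
shrinkage_rate = 0.01        # 1 percent shrinkage
annual_shrinkage = annual_revenue * shrinkage_rate   # recurring, every year

print(f"Net breach cost (one-time, spread over years): ${net_cost / 1e6:.0f}M")
print(f"Shrinkage loss at 1% (every year):             ${annual_shrinkage / 1e6:.0f}M")
```

Under these assumptions, the net one-time breach cost comes out well below a single year of shrinkage losses, which is the comparison the paragraphs above are driving at.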

Of course, retailers strive to keep that number as low as possible, but they also know that intrusive measures to combat shrinkage can damage employee morale and ruin the shopping experience for customers. In this sense, retailers regularly deal with these trade-offs, resulting in an “acceptable loss” of merchandise as a cost of doing business.

Managing cyber risk involves a similar set of calculations. A power utility, for example, could largely eliminate cyber risk to grid operations by abandoning the use of computerized industrial control systems (ICS) and reverting to manually controlled mechanical devices and processes. But that would mean giving up the substantial benefits that computerized systems bring to efficient, integrated grid operations. Effective risk management requires optimizing against these competing interests—and this, in turn, requires data.

Retailers (along with investors and shareholders) can benchmark their shrinkage rates against the reported sector average of 1.44 percent. While data about the costs of cyber incidents may never reach the two-decimal precision of retail shrinkage rates, the ongoing uncertainty about the cost of cyber incidents is itself a source of cyber risk for the nation. In addition to a mountain of empirical and theoretical evidence from economics linking uncertainty with lower investment, research into risk management for floods and earthquakes shows that people make especially costly decisions when the information available to them is confusing.

We believe that a substantial part of the confusion about the costs of cyber incidents can be attributed to significant statistical limitations in the data available to managers, investors and researchers. One common approach to measuring the economic effects of cyber breaches is to examine companies’ stock returns before and after an incident is publicly reported. These studies have helped inform groundbreaking work by the Council of Economic Advisers on the cost of malicious cyber activity to the U.S. economy.

While these event-study approaches have served an important purpose in the economics and finance literature, they rely on two key assumptions that often prove problematic in studies of cybersecurity. The first assumption is that the timing of an event can be reliably recorded. Pinning down this date in a precise, consistent way is essential to identifying the relevant periods of study before and after a cyber incident. Yet there is often a lag between when an incident occurs and when the victim publicly reports it.

Recall the famous breaches at Yahoo. A breach took place in late 2014, but Yahoo did not report it until September 2016. Then, in December 2016, Yahoo reported that another breach had occurred even earlier, in August 2013. Even though some breaches are easy to spot, many of them are not. Meanwhile, the market moves, and separating this movement from any movement caused by a cyber incident is statistically challenging.

The second key assumption is that investors know the distribution of new risks induced by the event. In other words, what new liabilities does the data breach create for a company, and is their actual scale known? The problem is that investors frequently do not know this distribution. In the case of Yahoo, the full scale of the breaches—that all 3 billion of its accounts were affected—was not made public until October 2017. Moreover, since different information may be more or less sensitive, investors might not be able to fully forecast the impact of a breach on customer goodwill and expectations.

Recent research by one of us (Makridis) and Benjamin Dean shows that these twin challenges make publicly reported cybersecurity breaches unreliable for conducting statistical inference, making it tough for executives, investors and policymakers to know how much cybersecurity investment is needed and when.
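The event-study logic, and its sensitivity to the assumed event date, can be sketched with a standard market-model calculation. The data below is synthetic and the function names are our own, not taken from any cited study; the point is only that mis-dating the event, as with Yahoo's delayed disclosures, changes which returns fall in the event window and hence the estimated impact:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily returns: market factor plus an idiosyncratic firm component.
n_days = 250
market = rng.normal(0.0003, 0.01, n_days)
firm = 0.9 * market + rng.normal(0.0, 0.008, n_days)
firm[200] -= 0.08  # a hypothetical one-day drop when the breach is disclosed


def abnormal_returns(firm_r, market_r, est_window, event_window):
    """Market-model event study: fit alpha and beta over the estimation
    window, then measure firm returns net of the model's prediction over
    the event window."""
    beta, alpha = np.polyfit(market_r[est_window], firm_r[est_window], 1)
    predicted = alpha + beta * market_r[event_window]
    return firm_r[event_window] - predicted


# Cumulative abnormal return (CAR) when the event date is right...
car_right = abnormal_returns(firm, market, slice(0, 180), slice(198, 203)).sum()
# ...and when the assumed date is off by two weeks, missing the drop.
car_wrong = abnormal_returns(firm, market, slice(0, 180), slice(208, 213)).sum()

print(f"CAR at true date: {car_right:.3f}; at mis-dated window: {car_wrong:.3f}")
```

With the correct window the CAR captures the simulated drop; with the mis-dated window it reflects only noise, which is the statistical problem the delayed-disclosure examples above create for real-world inference.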

Another set of limitations emerges from the use of surveys and interviews to collect data. The leading study referred to above bases its estimate of $7.35 million per incident on a survey that asked 63 U.S. companies to self-assess the cost of a data breach. While self-reported information can serve as a useful heuristic, taking it at face value is an unreliable way to conduct research because it assumes respondents have both an incentive and sufficient knowledge to report accurate and comparable estimates. These assumptions are problematic because the evidence points to the contrary: Respondents do not necessarily share the same incentives to accurately report their losses, and even if they did, they would not necessarily use the same methodologies and taxonomies to generate their estimates. Worse still, companies differ in a wide array of ways, so the cost of breaches is likely to depend heavily on context.

The data also suffers from significant gaps in what is being reported. Much of the available data is about breaches of personally identifiable information, such as Social Security numbers or financial data, because all 50 states have laws requiring disclosure. There are also a handful of federal laws covering specific categories of personally identifiable information, such as health-care data. Such breaches obviously hurt the enterprises that suffer the breach and the individuals whose information was compromised, but incidents of this sort are not what Keith Alexander had in mind when he made his assertion about China’s cyber-enabled industrial espionage of U.S. firms.

Remarkably, however, there is little publicly available data on the actual costs of cyber-enabled espionage. Such espionage typically involves theft of intellectual property, not theft of personally identifiable information, so incidents of this sort are not subject to disclosure laws governing data breaches. And while it is true that public companies are subject to guidance by the Securities and Exchange Commission that they disclose “material” cybersecurity risks and incidents to investors, public companies comprise less than 0.1 percent of U.S. businesses, representing a sliver of enterprises facing cyber risks.

(It is also an open question whether this guidance has any teeth. Yahoo, for example, suffered one of the largest data breaches in history and still concluded that the breach was not material. Yahoo’s successor, Altaba, ultimately paid a $35 million fine for its failure to disclose. The SEC updated the guidance earlier this year in significant part due to its experience with Yahoo.)

The discussion so far bears on reporting requirements for the bulk of the private sector, but what about critical infrastructure? The majority of critical infrastructure in the United States is owned and operated by private companies that increasingly rely on computers to run it. These systems are obvious targets for foreign adversaries seeking to hold them at risk in a conflict, yet here as well data on incidents and their costs is spotty at best.

In the case of the electric grid, for example, an operator must disclose an incident to its regulator, the Federal Energy Regulatory Commission (FERC), only if the incident actually interfered with grid reliability. Such incidents, fortunately, are rare. But any such attack would logically have involved significant preparatory intrusions against the infrastructure in advance, such as installation of backdoors to guarantee future access or even insertion of dormant logic bombs for possible activation later. These preparatory intrusions, according to various studies by security researchers (such as this one), appear to be far more common but not systematically reported. The result is an incomplete picture of the national security threat that China, Iran, Russia and other actors potentially pose to U.S. critical infrastructure. FERC has proposed rules to address this gap, but in the meantime the data as it exists on this strategically significant class of incidents is not nearly as strong as it could—and should—be.

Make no mistake: Cyber incidents sting. The point we are making here, however, is that data on the economic pain that victims suffer is limited and that the varied findings based on this data are often at odds with the anecdote-driven debate about costs in Washington. Measurement is the first step to improvement. Better data would help executives to more effectively manage the cyber risks facing their enterprises, guide investment decisions, enable the insurance industry to develop innovative insurance products, and inform U.S. government efforts to craft proportional and tough responses to cyber incidents perpetrated by foreign adversaries.

Congress can help by enacting a federal disclosure law on data breaches that establishes uniform reporting requirements for organizations that suffer significant breaches of personally identifiable information. There are many benefits to such legislation, which generally enjoys bipartisan support, including a step toward using higher-quality data to understand the consequences of cyber incidents.

Legislation focused solely on breaches of personally identifiable information, however, ignores the strategic significance of breaches of critical infrastructure. Congress should extend FERC’s proposal across all critical infrastructure and ensure that a capable, trustworthy entity within the U.S. government—such as the Department of Homeland Security’s NCCIC/ICS-CERT—receives this data, protects it from unauthorized disclosure, and uses it to inform U.S. offensive and defensive cyber strategies. The data could also be made available, in controlled settings, for research by vetted outside organizations such as insurance providers and researchers. Already, innovations in cryptography, such as secure multiparty computation (SMC), are allowing researchers (in genomics, for example) to conduct powerful data analysis while keeping more than 99 percent of private information secure. The constraints are not technological; what is needed is the political will.