
Ten Questions We Hope the Cyber Safety Review Board Answers—and Three It Should Ignore

By Steven M. Bellovin, Adam Shostack, Tarah Wheeler
Wednesday, February 9, 2022, 9:10 AM

Congratulations to the newly stood-up Cyber Safety Review Board (CSRB)! We’ve been fans of the idea for a very long time and are hopeful that the new board follows in the best tradition of independent investigation, giving everyone a new perspective on the what, how and why of incidents, and shining a light on a path forward. The CSRB was created in the Department of Homeland Security in May 2021 by Executive Order 14028, “Improving the Nation’s Cybersecurity.” It is the first independent review board for cybersecurity. The board has been tasked with investigating incidents affecting U.S. government systems. We also want to express our respect for the members of the board and thank them for their service.

There are questions that the public should always want answered. But the form of those answers may vary based on what’s being investigated, so here we outline some principle-driven guiding questions to help the newly created CSRB. Then, we show how the questions might apply to both log4j and SolarWinds. Log4j is a popular open source library, and in December 2021 a set of security flaws affecting many different versions of the library was identified. Because the library was incorporated into both commercial and locally developed software, and because more vulnerabilities kept being discovered, the issues were complex to manage. Log4j has been announced as the first subject for the board.

Guiding Questions for the CSRB

These questions are focused on the facts. The inquiries include assessing how the issue or attack was detected or discovered, how information was (or wasn’t) surfaced or shared, and the lessons to be learned.

1. What happened? 

The answers seem somewhat obvious today; after all, there are lots of news stories scrolling by. The first draft of history is captured in blog posts, tweets and even news stories behind paywalls. Collecting and organizing those primary sources, summarizing what they told industry, researchers and the public, and providing an authoritative history will enable future research. We’d like to see the published report include two timelines: one of what happened, and one of when various facts became public.
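One way to picture the two timelines is as two sort orders over the same set of sourced events. The sketch below is only an illustration under our own assumptions: the `Event` record, its field names and the sample entries are hypothetical placeholders, not data about any real incident.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One sourced fact about an incident (hypothetical record layout)."""
    occurred: date   # when the thing actually happened
    disclosed: date  # when it first became public
    summary: str
    source: str      # pointer to the primary source (blog post, advisory, ...)


def occurrence_timeline(events):
    """Timeline 1: events ordered by when they happened."""
    return sorted(events, key=lambda e: e.occurred)


def disclosure_timeline(events):
    """Timeline 2: the same events ordered by when they became public."""
    return sorted(events, key=lambda e: e.disclosed)


def disclosure_lag_days(event):
    """Days between an event happening and the public learning of it."""
    return (event.disclosed - event.occurred).days


# Placeholder entries only; dates and descriptions are invented for the sketch.
SAMPLE_EVENTS = [
    Event(date(2001, 1, 10), date(2001, 3, 1), "placeholder: flaw introduced", "source A"),
    Event(date(2001, 2, 5), date(2001, 2, 6), "placeholder: exploitation observed", "source B"),
    Event(date(2001, 2, 20), date(2001, 2, 20), "placeholder: fix released", "source C"),
]

if __name__ == "__main__":
    for label, timeline in (
        ("What happened", occurrence_timeline(SAMPLE_EVENTS)),
        ("What became public", disclosure_timeline(SAMPLE_EVENTS)),
    ):
        print(label)
        for e in timeline:
            print(f"  {e.occurred} / {e.disclosed}  {e.summary} [{e.source}]")
```

Keeping both dates on every record means the gap between the two timelines can be measured per event rather than argued about after the fact.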

2. Why was this missed? 

What sort of failures happened? Was each failure one of science (defenders don’t have tools that are capable of finding the things the attackers did), one of engineering (the tools didn’t work as expected), of organizational dynamics (the tools were too expensive or otherwise hard to deploy), of usability (there were too many alerts), or of response capability (no one followed up on or analyzed the right reports)? Did anyone notice a near miss that could have exposed the problem before it became so widespread? Were there controls that provided unexpected value? Were there failures of coordination (no one could see the puzzle pieces), whether within the government, in public-private partnerships, or elsewhere?

3. How were the attackers caught?

4. What did information sharing add?

Were the various information sharing and analysis centers/organizations (ISACs/ISAOs) and cyber emergency response teams (CERTs) useful, and by what measures? 

5. Did defenders have logs that went back far enough to help them?
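For a single organization, this question reduces to a comparison between the oldest log entry still retained and the earliest date on which attacker activity is suspected. A minimal sketch, assuming both timestamps are already known; the function names and the sample dates are hypothetical:

```python
from datetime import datetime, timedelta


def retention_gap(oldest_log_entry, earliest_suspected_activity):
    """Length of the period of interest that the logs do NOT cover.

    Returns a zero timedelta when the logs reach back at least as far as
    the earliest suspected activity.
    """
    gap = oldest_log_entry - earliest_suspected_activity
    return max(gap, timedelta(0))


def logs_cover_incident(oldest_log_entry, earliest_suspected_activity):
    """True when retained logs go back far enough to cover the whole incident."""
    return retention_gap(oldest_log_entry, earliest_suspected_activity) == timedelta(0)


if __name__ == "__main__":
    # Placeholder values: logs retained from 1 March, activity suspected from 1 January.
    oldest = datetime(2001, 3, 1)
    suspected = datetime(2001, 1, 1)
    print("covered:", logs_cover_incident(oldest, suspected))
    print("uncovered period:", retention_gap(oldest, suspected))
```

Aggregated across many victims, the size of that gap is the kind of figure that could inform guidance on retention periods.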

6. What were the more subtle indications of a penetration, like file access times? Were these intact, or was there evidence of alteration?
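As a rough illustration of what checking file times for alteration can look like, the sketch below reads a file's timestamps with Python's `os.stat` and flags two oddities. The heuristics, flag names and slack value are our own assumptions, not an established forensic standard. The second check assumes a POSIX system, where `st_ctime` is the inode change time; on Windows that field has a different meaning. Access times are also unreliable on filesystems mounted with options such as `noatime`.

```python
import os
import time


def timestamp_flags(atime, mtime, ctime, now, slack_seconds=2.0):
    """Return a list of oddities for one file's timestamps (seconds since epoch)."""
    flags = []
    # A timestamp later than the current time suggests it was set by hand
    # (or that a clock was wrong somewhere).
    if mtime > now + slack_seconds or atime > now + slack_seconds:
        flags.append("timestamp_in_future")
    # On POSIX, writing to a file updates mtime and ctime together, and
    # setting mtime explicitly bumps ctime to "now". An mtime later than
    # ctime therefore hints that mtime was set explicitly.
    if mtime > ctime + slack_seconds:
        flags.append("mtime_after_ctime")
    return flags


def inspect_path(path, now=None):
    """Apply timestamp_flags to a real file."""
    st = os.stat(path)
    if now is None:
        now = time.time()
    return timestamp_flags(st.st_atime, st.st_mtime, st.st_ctime, now)


if __name__ == "__main__":
    print(inspect_path(__file__))
```

An empty result does not mean the timestamps are intact, since a careful attacker can set plausible values. A real investigation would compare against independent records such as backups or filesystem journals.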

7. What actions by the defenders helped them regain control of their networks more quickly?

Of the ones who did so quickly, what commonalities existed? Of the ones who did so slowly, what did they have in common? Are there controls that had unexpected value?

8. What lessons were learned?

What lessons are there for software developers (commercial or open source), cloud platforms, systems operators (either commercial or government), regulators or policymakers? Did the victims in any specific industry or sector do better or worse in detection or response, and can that inform regulators or shape future regulation? 

9. What commonplace lessons are either not supported or challenged by these events?

10. What other attacks, both online and off, are similar to this particular attack—in terms of its vector and the people, countries, and actors involved—and could have been lessons for this incident? 

For example, the United States ignored reports of the Soviets penetrating the U.S. aviation system’s intellectual property protections in the run-up to World War II due to a larger threat from Germany. That episode was very similar in impact to SolarWinds and yet got yawns or furtive ignoring from Congress. Comparisons like this will help policymakers and businesspeople understand how to place this event in history and give it a relative weight. The alternative is to treat the incident report as if it were a fantasy written by J.R.R. Tolkien, not something with relevance to “real life.”

Bonus Question: What’s in the classified appendices, and why?

We expect that something will be classified, and won’t rehash the question of over-classification here, but the board should have a predisposition toward publication, and discuss the judgment it has made.

Questions to Ignore

There are questions that seem fascinating, but we hope that the CSRB does not take them on. All of the questions below may involve classified sources and methods. Some may involve “law enforcement sensitive” information, and as interested as we are, we think the board will do better to dig elsewhere so that it can publish its report in as close to full form as possible.

Who launched the attack? Determining who launched an attack requires a very different investigation using different skills and background knowledge. Saying “the attackers used a stolen password for initial entry, since there was no two-factor authentication in place” is not the same as “Ruritanian intelligence suborned that person’s partner to gain access to the password.” Attribution also involves knowledge of the history of each group and often many nontechnical sources of information.

Why? This inquiry is even more problematic than the “who” question. If it’s a criminal gang, the obvious answer is “money.” If, however, it was a nation-state, answering this question requires a full view of all available intelligence information about the apparent perpetrator. More important, the answer doesn’t help sites protect themselves, and that’s the primary goal of the CSRB.

What did it cost? The cost to an attacker of a penetration depends heavily on the ability to reuse existing exploits and tools, and on prior knowledge of the target environment. Ascertaining much of that will draw on previous history with the attacker, which in turn may draw on knowledge of non-public attacks.

Case Studies: Making the Questions More Specific

We now turn our attention to applying these questions to two specific cases: log4j and SolarWinds. As we do, we’ll note that there was a set of vulnerabilities associated with log4j, which came out over a few weeks. In the case of a pandemic, the World Health Organization has taken over the task of naming: The omicron variant is named “omicron” because the WHO assigned that name. We’re sure the fine folks at SolarWinds Inc. would prefer to have their name go back to only referring to a software company and that some other label be applied to the unfortunate events in which their software was a part.

The Log4j-Specific Questions

1. What happened in log4j?

How did the various issues get introduced? What review steps took place, both at the project and at the various companies that relied on it? There are claims that the initial hole was reported or exploited in Minecraft before it became widely known. Are those accurate? Did the fact that Minecraft, a popular game among children, was on the early list of affected entities increase the urgency or the FUD (fear, uncertainty, doubt) around the reporting of the incident, and did that affect how the investigation happened? In this case, the exploits were easy to write and modify. How many variants were there? There’s a belief that the attacks were widespread. Is that grounded? What does “widespread” mean, and how can it be put in context?

2. Why was this missed?

Many organizations talk about their SDLC (security development lifecycle) processes: Do those ignore or deliberately exclude imported code or inherited code from open source libraries? What about tools for static or dynamic analysis?

3. How were the attackers caught?

How many attackers were there? How did defenses hold up in the face of the many exploit variants? What’s happened in log4j may be ongoing, but some organizations may have made a call about spending energy and money tracking down attackers. How many have decided it’s not worth the effort? How many are trying hard? And how can we move from “trying hard” to objective criteria?

4. What did information sharing add?

Was there useful private sharing, or was almost everything public? Was there central work to organize the information—which was flowing freely in public—and to collect, evaluate, validate and share back the lessons learned, or did that work have to be replicated across many organizations? 

5. Did defenders have logs that went back far enough to help them?

In log4j, publicity may have solved this question, but what’s been done to see if there was quiet exploitation of any of the issues earlier?

6. What were the more subtle indications of a penetration?

Many of the publicly shared exploit techniques were added to detection tools. How frequently did investigations rely solely on those? What did those investigators who went further discover?

The SolarWinds-Specific Questions

1. What happened in SolarWinds?

What were the steps in the penetration? In lateral movement? What controls or defenses weren’t strong enough? Which controls seemed to work but without impacting the outcome in the way that might have been expected?

2. Why was this missed?

The attackers got access to some of the best defended networks in the world, and many of the operators have performed retrospectives. It’s also widely stated that the attackers used “advanced techniques.” Can the board explain what those were, and how they relate to detection tools and techniques? Did the defenders have a chance of success given the tactics used?

3. How were the attackers caught?

We know a part of the SolarWinds story—someone noticed something odd, and Mandiant swung into action. What did they do? What training did they have? Did any early responders face negative consequences for raising alarms?

4. What did information sharing add?

The first public discussion seems to have been a blog post—were the private information-sharing channels used? If so, how, and what measures of effectiveness were used?

5. Did defenders have logs that went back far enough to help them?

6. What were the more subtle indications of a penetration?

In conclusion, we’re glad that the board has been created. Now, the hard work begins.