Internet Metadata Collection

The NRC's Bulk Collection Report: a High-Level Overview

By Andy Wang
Tuesday, January 20, 2015, 3:00 PM

Last week, Wells noted the release of an important, 85-page report by the National Research Council. (Yesterday, Herb Lin added his thoughts about it.) Broadly, Bulk Collection of Signals Intelligence: Technical Options concludes that right now, there are no software-based techniques that could fully replace the bulk collection of data. Below, I offer a high-level, general overview of this and other main takeaways from the report.

Background: In January 2014, the White House released Presidential Policy Directive 28 (PPD-28). Among other things, the directive asked the Director of National Intelligence (DNI) to “assess the feasibility of creating software that would allow the [intelligence community] more easily to conduct targeted information acquisition [of signals intelligence] rather than bulk collection.” In turn, the DNI passed this task to the National Academies. Last week the National Research Council (NRC)---a unit of the National Academies---released the report.

It begins with nomenclature, providing context and definitions for technical terms and jargon. First is “bulk collection”, defined as collection in which a significant portion of the data collected is not associated with current targets; the report naturally then defines targeted collection as anything that is not bulk collection. Notably, in adopting this terminology, the NRC departs from PPD-28, which defined bulk collection as “the authorized collection of large quantities of signals intelligence data which, due to technical or operational considerations, is acquired without the use of discriminants.”

Next is a review of how NSA obtains and breaks down signals intelligence. This proceeds in essentially six steps, according to the NRC: First, the NSA takes in signals; second, the NSA extracts the data relevant to certain events; third, the data is filtered according to one or more discriminants (e.g., specific identifiers or terms); fourth, the NSA stores the filtered data; fifth, the NSA analyzes the data by querying it; and, sixth and finally, the NSA disseminates the material to other analysts and policymakers. The first four steps comprise the “collection” process.
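For readers who think in code, the six steps can be sketched as a toy pipeline. This is only an illustration under loud assumptions: the record fields ("from", "to", "time"), the function names, and the sample data are all hypothetical, and nothing here reflects how any actual NSA system works.

```python
def extract(signal):
    # Step 2: pull event metadata out of a raw signal (hypothetical fields).
    return {"sender": signal["from"], "recipient": signal["to"], "time": signal["time"]}


def matches(event, discriminants):
    # Step 3: keep the event only if either party matches a discriminant.
    return bool(discriminants & {event["sender"], event["recipient"]})


def collect(signals, discriminants):
    # Steps 1-4 together make up "collection": intake, extraction,
    # filtering, and storage.
    store = []
    for signal in signals:              # step 1: take in signals
        event = extract(signal)         # step 2: extract
        if matches(event, discriminants):   # step 3: filter
            store.append(event)         # step 4: store
    return store


def analyze(store, identifier):
    # Step 5: analysis by querying the stored data.
    return [e for e in store if identifier in (e["sender"], e["recipient"])]


def disseminate(results):
    # Step 6: hand results on (here, just formatted strings).
    return [f'{e["sender"]} -> {e["recipient"]} at t={e["time"]}' for e in results]


signals = [
    {"from": "alice", "to": "bob", "time": 1},
    {"from": "carol", "to": "dave", "time": 2},
    {"from": "bob", "to": "carol", "time": 3},
]
store = collect(signals, discriminants={"bob"})
report = disseminate(analyze(store, "bob"))
print(report)  # ['alice -> bob at t=1', 'bob -> carol at t=3']
```

The point of the sketch is simply that filtering happens before storage: whatever the discriminants exclude at step three is never available at step five.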

The study is limited in scope. Its principal conclusions, it says, derive from an examination of three categories of “use cases”---that is, real-life scenarios demonstrating how bulk data is analyzed. The first is contact chaining, which “traces the network of people associated with a target by following links of the form ‘A communicated with B’ starting at the target and traversing chains of one or more links.” The second is alternate identifier techniques that “seek to keep current the set of identifiers that a target person is known to be using.” The third is triage, which “starts with a list of identifiers of interest and categorizes the urgency of the threat to national security from the party associated with each one.”
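Contact chaining, as the report describes it, is essentially a graph traversal. A minimal sketch, assuming the data is a list of (A, B) pairs meaning "A communicated with B" and that links are followed in both directions; the function name, the hop limit parameter, and the sample records are my own illustrative choices, not anything taken from the report.

```python
from collections import deque


def contact_chain(records, target, max_hops):
    """Return {identifier: hop_count} for everyone within max_hops of target."""
    # Build an undirected adjacency map from (A, B) communication pairs.
    neighbors = {}
    for a, b in records:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search outward from the target, stopping at max_hops.
    hops = {target: 0}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for contact in neighbors.get(current, ()):
            if contact not in hops:
                hops[contact] = hops[current] + 1
                queue.append(contact)
    return hops


records = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
print(contact_chain(records, "A", 2))  # {'A': 0, 'B': 1, 'C': 2}
```

Note that the traversal can only follow links that are present in the stored records, which is why this use case is so sensitive to what was collected in the first place.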

Key terms defined, context supplied, and scope established, the NRC then turns to the main benefit of using---and the main drawback of not using---bulk collection. Here the NRC proceeds from a somewhat obvious but still valid premise. Should past events or facts become relevant due to changing circumstances (for example, if a historically nonnuclear state unexpectedly opts to pursue nuclear weapons), then those past events or facts will be available for analysis only if they have been collected beforehand. Once-innocuous data thus can start to matter, but one can never tell when it will or won’t. This seemingly unassailable fact brings us to the punch line: Restricting bulk collection will necessarily make intelligence less effective, as doing so will mean failing to capture at least some information that might prove important at a later date. And alas, the NRC notes, that is not a problem that technology or software can mitigate.
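The premise can be made concrete with a toy example. The sketch below is an assumption-laden simplification: events are hypothetical (time, sender, recipient) tuples, "bulk" is modeled as retaining everything, and "targeted" as retaining only events that touch a target known at collection time. The report's own definition of bulk collection is broader than "no filter at all", so treat this as an illustration of the timing problem only.

```python
def retain(events, known_targets=None):
    """Keep every event (bulk) or only events touching a known target."""
    if known_targets is None:
        return list(events)
    return [e for e in events if known_targets & {e[1], e[2]}]


def query_past(store, identifier):
    """Look back through stored events for a newly relevant identifier."""
    return [e for e in store if identifier in (e[1], e[2])]


# Hypothetical (time, sender, recipient) events.
events = [(1, "x", "y"), (2, "t", "u"), (3, "y", "z")]

bulk_store = retain(events)
targeted_store = retain(events, known_targets={"t"})

# Later on, "y" becomes a target. Only the bulk store can answer
# questions about what "y" did in the past.
print(len(query_past(bulk_store, "y")))      # 2
print(len(query_past(targeted_store, "y")))  # 0
```

No amount of cleverness in `query_past` can recover events that `retain` discarded, which is the report's point in miniature.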

With this difficulty in mind, the key question---to which the report then turns---is how the use of software might replace or reform bulk collection. Though the IC already uses a mixture of manual and automatic controls on the usage of collected information, a new regime of “automated controls and audits [would] require expressing, in software, the rules embodied in laws, policies, regulations and directives that constrain how intelligence is collected, analyzed, and disseminated.” That’s a tall order. These days the rules governing surveillance are complicated, to say the least, and maybe even contradictory in places. As such, a software-based system would have to do the work of scores of surveillance lawyers. And that is no easy task. Nonetheless, the report notes that software could at least help to mitigate some concerns associated with bulk data collection, and in at least three ways. First, automation can isolate bulk data, so that the information is cut off from the outside world. Second, software can automatically restrict queries when certain parameters are not met. Third, software can audit the usage of bulk data.
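To give a rough sense of what those three mitigations might look like, here is a minimal sketch. Everything in it is hypothetical: the `ControlledStore` class, the rule that a query needs an approved identifier and a non-empty justification, and the log format are my own stand-ins, not rules or designs drawn from the report. Real legal constraints would be far harder to express.

```python
from datetime import datetime, timezone


class ControlledStore:
    """Toy store that gates queries and logs every attempt."""

    def __init__(self, records, approved_identifiers):
        # Isolation (by convention only): records are reachable solely
        # through query(), never handed out wholesale.
        self._records = list(records)
        self._approved = set(approved_identifiers)
        self.audit_log = []

    def query(self, analyst, identifier, justification):
        # Restriction: a hypothetical rule requiring an approved
        # identifier and a non-empty justification.
        allowed = identifier in self._approved and bool(justification.strip())
        # Audit: record every attempt, whether or not it is allowed.
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "analyst": analyst,
            "identifier": identifier,
            "justification": justification,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"query on {identifier!r} not permitted")
        return [r for r in self._records if identifier in r]


store = ControlledStore([("A", "B"), ("C", "D")], approved_identifiers={"A"})
print(store.query("analyst1", "A", "approved case 1"))  # [('A', 'B')]
try:
    store.query("analyst1", "C", "curiosity")
except PermissionError as err:
    print("denied:", err)
print([entry["allowed"] for entry in store.audit_log])  # [True, False]
```

Even this trivial rule shows the shape of the problem: the hard part is not the enforcement code but deciding, unambiguously, what the rule should be.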

From here, the report moves to its three main conclusions.

First is the headline-grabber, noted above: Right now, no software technique will fully substitute for bulk collection where it is relied on to answer queries about the past after new targets become known. Again, this is no small shortcoming, as the greatest benefit of bulk collection is that it enables NSA to query past signals intelligence for use in subsequent investigations. The report isn’t categorical here, however; instead it notes that other sources of information---like that collected by communications service providers---might provide a partial substitute for bulk collection in some cases. (As readers well know, in the wake of Snowden’s revelations about the NSA’s call records program, for example, it was proposed that telecommunications companies might retain customers’ data, for collection on a targeted, ad hoc basis by the government.) Yet even that will still call for ongoing retention of potentially relevant information, for use down the road. Tweaks in targeting procedures might change that over time, according to the report: “It may be possible to improve targeted collection to the point where it provides a viable substitute for bulk collection in at least some cases[.]”

Second, the report concludes that even if software cannot fully substitute for bulk collection, it can at least place controls on the usage of bulk data to help enforce privacy protections. Though software might not be able to replace human legal review outright, it nevertheless might be designed so as to capture some of the privacy rules governing bulk data collection and access. Additionally, software can improve transparency and public faith in the integrity of the process. For instance, audits can be performed by software, producing consistent results that can be verified---an improvement over inconsistent and sometimes opaque manual auditing. The report adds that automation may be easier to deploy if it is technology-neutral---or designed so as to ensure adaptability, and not tied to rapidly changing information and communications technology.

Third, further research and development could be a big help. The report says, in particular, that developing analytics might assist in the process of excluding irrelevant information and identifying potentially relevant information---to the point that bulk data collection might even be rendered unnecessary. Additionally, more powerful software and automation could improve the precision, robustness, efficiency, and transparency of usage controls.