Bulk Metadata Collection

What Is the "Right" Number of Call Detail Records for 42 Targets under FISA's Business Records Authority?

By Robert Chesney
Wednesday, May 3, 2017, 5:45 PM

ODNI's transparency report contains loads of interesting information (see, e.g., Adam's post here on FBI queries of the fruits of 702 collection).  In this post, I'd like to draw attention to the statistics on use of the FISA Business Records authority, 50 USC 1861.  

1) Non-telephone communications metadata

The report first addresses the collection of information other than telephone call metadata.  What would that include?  The report gives the example of analogous information about email communications, and reports that there were 88 separate targets for collection under this heading, resulting in the production of 81,035 "unique identifiers."  Meaning what, exactly?  It seems to mean that each target was emailing or messaging etc. with an average of 900-1000 other addresses/handles during 2016.  That's just a one-hop slice, you will notice, and it's not a very large number compared to what an ordinary user of email for social or business purposes might produce.
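As a quick check on that per-target average, here is a minimal sketch in Python. It uses only the two figures quoted above; the variable names are mine, and the simple division assumes the identifiers can be spread evenly across targets, which the report does not actually say.

```python
# Figures quoted from the ODNI report for non-telephone collection in 2016.
targets = 88
unique_identifiers = 81_035

# Simple mean; assumes (hypothetically) an even spread across targets.
avg_per_target = unique_identifiers / targets
print(f"About {avg_per_target:.0f} unique identifiers per target")
```

The result lands a little above 900, which is why the 900-1000 range is a fair summary.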

2) Telephone communications metadata—What is the "right" number of Call Detail Records for 42 targets?

The more familiar historical use of the Business Records authority, of course, is to acquire telephone metadata (who called whom, when, and for how long).  Under the USA Freedom Act, the government no longer may acquire such data in bulk, but must instead make individual applications to the FISC (which then result in the production from service providers of two-hops' worth of metadata).  We are told here that the government obtained orders for production of this kind on 42 separate targets in 2016, and that this in turn resulted in production of 151,230,968 records. 

At first blush, that seems like a huge number.  But consider the following example: [Note: I've left my original hypothetical in place below, but please read past it to the further bracketed note, which amends the analysis to reflect some excellent feedback/corrections I received]

First, a given target may have multiple phone numbers.  I have four (cell, home, Law School office, Strauss Center office).  But let's say the average is just two.

Second, let's say that the average number of calls in and out from each phone each day is, oh, five.  So now we're up to ten total calls per day.  Some may involve repeat parties, but the data doesn't control for that so the number remains 10.

Third, let's stretch that across the full year.  Now we are up to 3650 calls.

Fourth, let's account for the fact that a single call will appear twice if the two parties have different service providers.  Let's take 3650, then, and make it 5000 calls.

Fifth, let's account for the fact that this is just the first hop, and that the USA Freedom Act allows for collection out to the second hop.  So, the same cycle and assumptions play out for all those 5000 contacts.  That's 5000 x 5000, or 25,000,000 calls.

Sixth, let's account for the fact that there were 42 targets, not just one.  So, we have 42 x 25,000,000, which yields 1,050,000,000.
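The six steps above can be sketched as a few lines of Python. Every input is one of the hypothetical assumptions from the steps themselves, not a reported figure, and the variable names are mine.

```python
# Steps one through three: calls per target per year (assumed values).
phones_per_target = 2
calls_per_phone_per_day = 5
days_per_year = 365
first_hop_calls = phones_per_target * calls_per_phone_per_day * days_per_year

# Step four: round up to allow for calls recorded by two providers.
first_hop_rounded = 5000

# Step five: treat each first-hop record as a contact generating the same volume.
per_target_calls = first_hop_rounded * first_hop_rounded

# Step six: scale to the 42 reported targets.
targets = 42
total_calls = targets * per_target_calls

print(f"First hop before rounding: {first_hop_calls:,}")
print(f"Per target after second hop: {per_target_calls:,}")
print(f"All 42 targets: {total_calls:,}")
```

Laying it out this way makes plain that the squaring at step five does nearly all the work, so the estimate is very sensitive to the first-hop figure.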

That analysis may of course reflect bad assumptions, and it may miss all sorts of nuance about how the data gets filtered.  But even if we make a dramatic adjustment, moving a whole decimal place, you end up with 105,000,000, which is not far off from the reported total of 151,230,968.  This suggests that the superficially-shocking number that ODNI reported may be right on track for what reasonably would be expected for a set of 42 targets.  [Well, so much for my unduly-rapid attempt to dash out this analysis.  Thanks to some helpful readers, there are three alterations (at least) needed to this analysis.  

First, the universe of records could go back for a considerable period for each target (and, at the second hop, for each person in touch with the target).  This would have an indeterminate-but-expansive impact on the number of CDRs returned per target.  My analysis undercounted in that respect.

On the other hand, I sloppily overcounted in two other respects: (a) at my "step four" I accounted for the double-counting of CDRs where two different service providers are in issue, but this should have been done only after reaching the second hop; (b) at my "step five," I treated the set of second hop contacts as if it must be the same number as the number of CDRs at the first hop, but actually it likely would be (much?) narrower since that CDR set would include a substantial number of calls between the same numbers.  

So, let's again be conservative and cut the second hop number by half, and let's just forget about rounding up any numbers to account for the double-counting when two different providers are involved.  This gives us a second-hop population of around 1800, each generating the same presumed set of 3650 CDRs, and that in turn works out to about 6,570,000 CDRs.  Now, multiply by 42, and you get 275,940,000 CDRs.  
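Here is the same revised arithmetic as a short Python sketch. The inputs are the assumptions stated in this note (half of 3650 is 1825, which is rounded to "around 1800"), and the variable names are mine.

```python
# Revised assumptions from the bracketed note (hypothetical, not reported data).
cdrs_per_number_per_year = 3650   # 2 phones x 5 calls/day x 365 days
second_hop_population = 1800      # roughly half of 3650, rounded down

# Each second-hop contact is presumed to generate the same yearly CDR set.
cdrs_per_target = second_hop_population * cdrs_per_number_per_year

# Scale to the 42 reported targets.
targets = 42
revised_total = targets * cdrs_per_target

print(f"Per target: {cdrs_per_target:,}")
print(f"All 42 targets: {revised_total:,}")
```

This sketch leaves out the look-back period mentioned in the first correction, since its effect is indeterminate.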

Earlier, I tried to add in an over-the-top, further conservative element by moving an entire decimal place, showing that you'd still end up with something similar in scope to the reported number.  That no longer works, obviously, but note that you can cut the overall CDR estimate I just gave by a full 1/3, and still end up close to the reported number.  
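To put a number on "close," this small Python sketch applies the one-third cut to the revised estimate and compares it with the reported total. Both inputs come from the text; the variable names are mine, and the ratio is only a rough gauge.

```python
# Revised estimate from the note above and the total reported by ODNI.
revised_total = 275_940_000
reported_total = 151_230_968

# Cut the estimate by a full one-third (keep two-thirds).
after_cut = revised_total * 2 // 3

# How the reduced estimate compares with what was reported.
ratio = after_cut / reported_total

print(f"Estimate after one-third cut: {after_cut:,}")
print(f"Ratio to reported total: {ratio:.2f}")
```

On these assumptions the reduced estimate comes out at roughly 1.2 times the reported figure.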

In short, I think my fundamental point still stands:  The reported number of CDRs for 42 targets looks much like what one should have expected.]