Last week (which seems almost an age ago), when the telephone call metadata portion of the NSA disclosures first broke, Ben wondered how an application could be written that would satisfy Section 215's requirements. As Ben noted, Section 215 allows a disclosure order only on the basis of "a statement of facts showing that there are reasonable grounds to believe that the tangible things sought are relevant to an authorized investigation." How, he asked, could anyone write such an application for what is, in essence, the metadata for every telephone call made inside the United States and every call made between the United States and another country? [And, as an aside, if I had to guess why foreign-to-foreign metadata is not covered, it is because the US believes that no FISA order is required for that data ... so it may be collected even without an order.]
Turns out that, at least in my view, writing such an application would be pretty easy. And when I say "easy" here I mean easy in the sense of technical description. I really don't know how a whole database might be construed as a "tangible thing," though I assume there is a legal opinion saying so. What I am talking about is just the question of big data manipulation.
And on that score the science of big data analytics is clear -- large databases are effective in establishing patterns only to the extent they are actually comprehensive. If your argument is that we need to do a social network analysis to find terrorist connections, then you need the entire network to provide the grist for the mill, so to speak. That, almost surely, is what DNI Clapper meant when he said: "The collection is broad in scope because more narrow collection would limit our ability to screen for and identify terrorism-related communications. Acquiring this information allows us to make connections related to terrorist activities over time."
And so that brings us to Paul Revere. Readers who want to see how social network analysis can be done from data sets will find most interesting (and amusing) this post by Kieran Healy (a sociology professor at Duke) -- "Using Metadata to find Paul Revere." Healy did a very simple form of matrix analysis using only two fields -- the name of a person and the names of the political clubs he belonged to -- and applied it to the colonial revolutionaries. The names were familiar -- Sam and John Adams -- as were the clubs (the North Caucus and the Long Room Club, for example). He used data collected from historical records by David Hackett Fischer that might well have been available to the British at the time of the revolution.
The results demonstrate the power of matrix analysis. And, notably, this is only analysis of metadata (who belonged to which clubs) and not at all related to any of the content of what happened inside those clubs.
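For readers who want to see the mechanics, here is a minimal sketch of that kind of matrix analysis. The membership records are invented placeholders (not Healy's data), the helper name `times_transpose` is my own, and the final ranking uses a simple count of connections rather than the more refined centrality measures a real analysis would likely use. The idea is just this: build a person-by-club table of ones and zeros, multiply it by its own transpose, and you get a person-by-person table showing how many clubs each pair shares.

```python
# Toy membership records (hypothetical placeholders, for illustration only).
memberships = {
    "Person A": {"Club 1", "Club 2", "Club 3"},
    "Person B": {"Club 1"},
    "Person C": {"Club 2"},
    "Person D": {"Club 3"},
    "Person E": {"Club 3"},
}

people = sorted(memberships)
clubs = sorted(set().union(*memberships.values()))

# Person-by-club incidence matrix: 1 if the person belongs to the club.
A = [[1 if club in memberships[p] else 0 for club in clubs] for p in people]


def times_transpose(m):
    """Return the matrix m multiplied by its own transpose."""
    return [[sum(x * y for x, y in zip(row_i, row_j)) for row_j in m]
            for row_i in m]


# Person-by-person matrix: entry [i][j] counts the clubs i and j share.
shared = times_transpose(A)

# Degree: how many other people each person shares at least one club with.
degree = {
    p: sum(1 for j, n in enumerate(shared[i]) if j != i and n > 0)
    for i, p in enumerate(people)
}

most_connected = max(degree, key=degree.get)
print(degree)
print(most_connected)
```

In this toy data "Person A" comes out on top because he alone bridges all three clubs. Nothing in the input says anything about what was discussed at any meeting; the ranking falls out of the membership lists alone.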
What he found is quite stunning for those who don't know big data. Perhaps it's a bit of a spoiler to say so (and I urge you, if you are interested, to read the whole post, which is quite entertaining) but it turns out that the data pop out one man as the linchpin for a large fraction of the organization of the clubs and the men in Boston -- Paul Revere. And while, in historical retrospect, he may not have been THE leader of the revolution, it is pretty clear that he was a significant operative in the revolutionary operations. And with just two fields of data British counter-intelligence of the era might have learned about his significance. [Note, of course, that more fields of data give even greater granularity and fidelity to the conclusions.]
And that, I think, is the answer to the relevance question. It is quite easy, in fact, to say that the large data set can, with appropriate manipulation, reveal the organizational details of social structures. Terrorist organizations are social structures of that sort. To my mind it is pretty clear that there are reasonable grounds to believe that the telephone call metadata database is relevant to the discovery of that structure and therefore relevant to an investigation of those terrorists. I'm not at all surprised that the FISA Court agreed.
Two final points: I'm being descriptive here, not normative. Just because it's effective and legal doesn't mean it is wise. And the technique is, of course, value neutral. It can be used to discover links for other types of groups and it can be used in other large data sets. The only limits are the ones we set by law -- the technology is not self-limiting.