FBI director James Comey has had a rough couple of weeks: First he was accused of rigging the election for Donald Trump when he revealed on October 28 that the FBI was investigating new emails from Hillary Clinton’s campaign, and now he’s accused of rigging it against Trump by revealing today that none of those new emails contained anything that would result in criminal charges. But to hear the Trump campaign tell it, those weeks sound even harder for Comey: They seem to imagine that the FBI director spent the intervening days poring over those hundreds of thousands of emails himself, one by one.
“You can’t review 650,000 emails in eight days,” Trump said Sunday in a campaign speech in Michigan, hours after Comey’s latest update to Congress came out. “You can’t do it, folks. Hillary Clinton is guilty.” Trump supporter General Michael Flynn did the math on Twitter:
There R 691,200 seconds in 8 days. DIR Comey has thoroughly reviewed 650,000 emails in 8 days? An email / second? IMPOSSIBLE RT
— General Flynn (@GenFlynn) November 6, 2016
But fortunately for Comey’s eyesight, and for Clinton’s presidential campaign, Trump is wrong: The FBI can review hundreds of thousands of emails in a week, using automated search and filtering tools rather than Flynn’s absurd notion of Comey reading the documents manually. “This is not rocket science,” says Jonathan Zdziarski, a forensics expert who’s consulted for law enforcement and worked as a systems administrator. “Eight days is more than enough time to pull this off in a responsible way.”
One former FBI forensics expert even tells WIRED he’s personally assessed far larger collections of data, far faster. “You can triage a dataset like this in a much shorter amount of time,” says the former agent, who asked to remain anonymous to avoid any political backlash. “We’d routinely collect terabytes of data in a search. I’d know what was important before I left the guy’s house.”
In this case, forensics experts say, investigators’ jobs might be particularly easy: Because the new collection of emails under investigation was taken from the laptop of Anthony Weiner, the husband of Clinton aide Huma Abedin, only a portion of those emails would be messages sent to or from Clinton or anyone else on the campaign, rather than to or from Weiner’s contacts. Simple filtering by “to:” or “from:” could cut out hundreds of thousands of messages.
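To make that filtering step concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not a description of the FBI's actual tooling: the addresses in `ADDRESSES_OF_INTEREST` and the function name `is_responsive` are hypothetical placeholders.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical addresses of interest; the real list is not public.
ADDRESSES_OF_INTEREST = {"person.a@example.com", "person.b@example.com"}


def is_responsive(raw_message, wanted=ADDRESSES_OF_INTEREST):
    """Return True if any From/To/Cc/Bcc address is in the wanted set."""
    msg = message_from_string(raw_message)
    header_values = []
    for name in ("From", "To", "Cc", "Bcc"):
        header_values.extend(msg.get_all(name, []))
    # getaddresses() splits "Name <addr>" lists into (name, addr) pairs.
    addresses = {addr.lower() for _, addr in getaddresses(header_values) if addr}
    return not addresses.isdisjoint(wanted)
```

Running every message through a check like this and discarding those that return `False` is the kind of one-pass filter that removes a laptop owner's unrelated mail before anyone reads anything.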
Next, the agents could filter out duplicate emails from those they’d already analyzed in their months-long investigation earlier this year. According to multiple media reports, the vast majority of emails the FBI examined over the last week were, in fact, duplicates. Those copies could be spotted by their message ID, a unique alphanumeric identifier for each email, Zdziarski points out. Or if any duplicate messages somehow had different message IDs (say, because they had been copied into replies or forwarded), the FBI agents could use a forensics tool like EnCase or AccessData’s Forensic Toolkit to make cryptographic “hashes” of full messages or chunks of them. That hashing process converts portions of text into shorter character strings that, for practical purposes, uniquely represent the text: Running a hash function on the same text will always produce the same short string of characters, but any tiny change in the text produces a different hash string. And that allows a program to quickly compare and match text samples.
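The deduplication described above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than how EnCase or Forensic Toolkit work internally: the function names are hypothetical, SHA-256 is simply one common hash choice, and bodies are compared after collapsing whitespace so that trivially reformatted copies still match.

```python
import hashlib
from email import message_from_string


def body_hash(raw_message):
    """SHA-256 of the message body text, with runs of whitespace collapsed."""
    msg = message_from_string(raw_message)
    # Collect the payload of every non-multipart part (works for simple mail too).
    parts = [p.get_payload() for p in msg.walk() if not p.is_multipart()]
    normalized = " ".join("\n".join(parts).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_new_messages(candidates, already_reviewed):
    """Return candidates whose Message-ID and body hash have not been seen."""
    seen_ids, seen_hashes = set(), set()
    for raw in already_reviewed:
        message_id = (message_from_string(raw)["Message-ID"] or "").strip()
        if message_id:
            seen_ids.add(message_id)
        seen_hashes.add(body_hash(raw))

    fresh = []
    for raw in candidates:
        message_id = (message_from_string(raw)["Message-ID"] or "").strip()
        if message_id and message_id in seen_ids:
            continue  # exact copy of a message already reviewed
        digest = body_hash(raw)
        if digest in seen_hashes:
            continue  # same body text under a different Message-ID
        seen_hashes.add(digest)
        if message_id:
            seen_ids.add(message_id)
        fresh.append(raw)
    return fresh
```

Because set lookups take roughly constant time, a comparison like this scales to hundreds of thousands of messages without much trouble on ordinary hardware.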
Agent 1: Let’s remove the dupes first
Agent 2: That leaves… wow, three emails.
Agent 1: How many seconds are in eight days of vacation?
— Jonathan Zdziarski (@JZdziarski) November 6, 2016
From there, Zdziarski says, the agents could also sort the emails by thread, allowing them to look at the messages in groups of replies and disregarding dozens of emails at a time if they weren’t about topics of interest. “I could look at it and say ‘this block of 100 messages is all about Podesta’s pot roast recipe, so we’ll ignore all of those,’” Zdziarski says.
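Grouping by thread is also straightforward to approximate. The sketch below, with hypothetical function names, keys each message on the first ID in its `References` header, falling back to `In-Reply-To` and then to its own `Message-ID`. That is a simplification: real mail clients use more robust threading, and messages with missing headers may land in the wrong group.

```python
from collections import defaultdict
from email import message_from_string


def thread_key(msg):
    """Best-effort thread identifier taken from standard reply headers."""
    references = (msg["References"] or "").split()
    if references:
        return references[0]  # by convention, the first entry is the thread root
    parent = (msg["In-Reply-To"] or "").strip()
    if parent:
        return parent
    return (msg["Message-ID"] or "").strip()


def group_by_thread(raw_messages):
    """Map each thread key to the list of subjects in that thread."""
    threads = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        threads[thread_key(msg)].append(msg["Subject"] or "")
    return dict(threads)
```

A reviewer could then skim one subject line per thread and set aside the whole group if it is off topic.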
In fact, according to the former agent who spoke with WIRED, the FBI has tools to quickly identify indicators of classified documents in a large corpus of data. Zdziarski compares those tools to the software that checks for plagiarism, but instead checks for matches or near-matches in text with a collection of classified material. And the FBI could also search for keywords to prioritize reading any new messages about subjects they’d already pursued in their previous investigation of Clinton’s emails.
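The plagiarism-checker comparison suggests one generic way such near-match detection can work: break each text into overlapping runs of words ("shingles") and measure how many runs from a reference document reappear in a candidate. The following sketch shows that general technique only. It makes no claim about the FBI's actual software, and the function names and the five-word window are assumptions chosen for illustration.

```python
import re


def shingles(text, k=5):
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(candidate, reference, k=5):
    """Fraction of the reference's shingles that also appear in the candidate."""
    reference_shingles = shingles(reference, k)
    if not reference_shingles:
        return 0.0
    shared = reference_shingles & shingles(candidate, k)
    return len(shared) / len(reference_shingles)
```

An email that quotes a reference passage verbatim scores near 1.0 even when extra text surrounds the quote, while unrelated mail scores at or near 0.0, so messages can be ranked for human review by score.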
The FBI declined WIRED’s request for more information about how it performed its week-long search. But the cybersecurity and forensics community described the task as almost trivial. Asked by City University of New York journalism professor Jeff Jarvis how the NSA would handle the email collection, NSA leaker Edward Snowden summed up his suggested method in a tweet:
@jeffjarvis Drop non-responsive To:/CC:/BCC:, hash both sets, then subtract those that match. Old laptops could do it in minutes-to-hours.
— Edward Snowden (@Snowden) November 7, 2016
The real question, wrote cybersecurity consultant Rob Graham in his blog, isn’t how the FBI managed to conclude its investigation in eight days. It’s how it managed to take so long. “Computer geeks have tools that make searching the emails extremely easy,” wrote Graham. “Given those emails, and a list of known email accounts from Hillary and associates, and a list of other search terms, it would take me only a few hours to reduce the workload from 650,000 emails to only a couple hundred, which a single person can read in less than a day.”
In other words, no, General Flynn, it’s not impossible to read an email in a second. That’s what computers are for.