Everything here is legal (as in NOT leaked by Edward Snowden), publicly available information that can be very useful for finding things online.
The “filetype:” operator is especially helpful for finding PDFs of older books that are in the public domain.
In case you weren’t already aware, the Internet is home to an unfathomably large amount of information. Seriously – there’s probably at least 100 terabytes taken up just by funny cat GIFs, let alone all the meaningful, educational, enlightening stuff. And with so much information online, even government agencies sometimes need help making sense of it all.
Back in 2007, to help its field agents use the web more effectively, the NSA commissioned a guide, and thanks to a recent FOIA request from MuckRock, the 643-page document has now been released to the public. The guide, entitled Untangling the Web: A Guide to Internet Research, is absolutely stuffed full of great information – or at least it was great information.
The book stands as a testament to how quickly the web changes. It was written just six years ago, but it has suffered a large degree of technological attrition – many of the websites, tools, and services recommended in it no longer exist. That being said, there are still a few things in the book worth reading. Taken from a broad standpoint, many of the methods outlined here are still useful – it’s just that the tools used to execute them have changed. The NSA had some pretty slick tricks up their sleeve back in 2007, so I’ve taken the liberty of reproducing them for all you aspiring international super spies out there. Here are the highlights:
Google hacking
One of the most dastardly chapters in the manual is the section on “Google Hacking,” which the authors describe as “using publicly available search engines to access publicly available information that almost certainly was not intended for public distribution.” Most of the manual is pretty out of date, but this section is just as relevant and useful as it’s ever been. Here’s what the book recommends:
The first part of a good Google hack is knowing how to use Google’s search operators. These are nifty little words and symbols you can append to your queries to get more specific results. Google lists a few of them on their support page, but there are plenty more that they don’t bother mentioning. It’s worth noting that you don’t need to memorize all of them, since the same results can be achieved by using Google’s advanced search options, but that’s like using training wheels to ride a motorcycle. Badass international secret agent hackers don’t use training wheels. In order to be a true James Bond ass motherfucker, you should memorize a couple of these.
The most useful one for devious spy-type activity is undoubtedly the filetype: operator, which restricts your results to a particular type of file. Here’s a quick briefing on some of the most common file types and what they’ll help you find:
- filetype:xls will return a list of spreadsheets. These often contain personnel data, computer records, and financial information.
- filetype:doc or filetype:docx is good for internal working documents, reports, etc.
- filetype:pdf is good for large documents of all types; the format is widely used in academia, government, and business.
- filetype:ppt is good for retrieving briefings, which often contain company or government plans for the future.
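To make the syntax concrete, here’s a minimal Python sketch that strings operators together into a single query. The function name build_query is our own invention for illustration; filetype: and site: are standard Google operators, and the helper only builds the text you’d type into the search box (it doesn’t run a search). The example goes after a public domain book in PDF form.

```python
from typing import Optional


def build_query(phrase: str, filetype: Optional[str] = None, site: Optional[str] = None) -> str:
    """Compose a search string from an exact phrase plus optional operators.

    build_query is a hypothetical helper; it only produces query text.
    """
    parts = []
    if filetype:
        # filetype: limits results to one file extension, e.g. pdf, xls, ppt
        parts.append(f"filetype:{filetype.lower().lstrip('.')}")
    if site:
        # site: limits results to a single domain
        parts.append(f"site:{site}")
    # Drop stray double quotes, then wrap the phrase to request an exact match
    clean = phrase.replace('"', "").strip()
    parts.append(f'"{clean}"')
    return " ".join(parts)


print(build_query("moby dick", filetype="pdf", site="archive.org"))
# filetype:pdf site:archive.org "moby dick"
```

Keeping each operator as its own space-separated token mirrors how you’d type the query by hand, so the output can be pasted straight into the search box.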
To maximize the effectiveness of these filetype searches and really start to dig up some dirt, the NSA recommends pairing them with boilerplate keywords. Try using terms like internal, budget, not for distribution, confidential, or company proprietary alongside your searches to pick up stuff that was unintentionally posted online. For example, if you’re looking for, say, classified NSA documents that might’ve been leaked on the web, try filetype:pdf site:nsa.gov “classified.”
Another operator that might come in handy during some good ol’ fashioned espionage is the domain: operator. If used in conjunction with the right top level domain, you can use this operator to restrict results to webpages and documents hosted in specific countries. Let’s say you’re looking for spreadsheets full of passwords to the Russian Ministry of Defense. To point Google in the right direction, try searching filetype:xls domain:ru “password.”
Truth be told, these kinds of hacks were much more effective back in 2007, and nowadays companies and government organizations are pretty good about keeping internal documents off the Web. However, if you apply them in clever ways, these methods can still dig up a few goodies you probably weren’t meant to find.
Finding people
Untangling the Web has a pretty lengthy section on finding people, and despite being written before the rise of social networking, it’s still got a good list of tips for finding information on people. That being said, some of the suggestions are more relevant than others, so here’s the abridged version:
- Start by searching by name, address, email address, phone number (any personally identifiable information you have, really) on search engines like Google and Yahoo. This is kind of a no-brainer, but it’s always a good place to start.
- If you know the person’s profession, you might find additional info on them in a database that contains stuff like licensing information. The US is really good about licensing people for all kinds of professions. Try other countries for similar information too.
- Property ownership and transactions are carefully recorded in the US and many such records are publicly available. This may also be true in other countries. Look for public databases of these records and transactions.
- If you know where the person works, that organization (be it government, academic, or corporate) might have a publicly accessible directory you can use to look them up.
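- Whois databases contain information about millions of people associated with the Internet. If the person you’re looking for has registered a domain for their website, there’s a good chance their info can be found with a Whois lookup. The Whois databases maintained by the regional Internet registries (ARIN, APNIC, AfriNIC, LACNIC, and RIPE NCC) are all searchable by name using their advanced search forms. Keep in mind that these registries cover IP address and network allocations; domain registration records are held by the relevant domain registry or registrar.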
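Under the hood, a Whois lookup is a very simple protocol (RFC 3912): open a TCP connection to port 43 on a Whois server, send one line ending in a carriage return and line feed, and read until the server hangs up. Here’s a rough Python sketch of that exchange. The function names are hypothetical, the server you point it at is up to you, and decoding the reply as UTF-8 is an assumption, since the RFC doesn’t specify a character set.

```python
import socket

WHOIS_PORT = 43  # RFC 3912: Whois runs over TCP port 43


def build_whois_query(term: str) -> bytes:
    """The request is a single line terminated by CRLF (hypothetical helper)."""
    return term.strip().encode("utf-8") + b"\r\n"


def whois(term: str, server: str, timeout: float = 10.0) -> str:
    """Send one query to a Whois server and return its raw text reply."""
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as sock:
        sock.sendall(build_whois_query(term))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                # The server closes the connection once the reply is complete
                break
            chunks.append(data)
    # UTF-8 is an assumption; undecodable bytes are replaced rather than raising
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For instance, whois("8.8.8.8", "whois.arin.net") would ask ARIN for the network registration covering that address. In practice, the command-line whois tool available on most Unix-like systems does the same job and usually picks the right server for you.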
The authors then go on to mention a boatload of people finder sites, but pay them no heed. They’re all phooey. That was 2007, and we’ve got way better tools at our disposal these days. If you’re looking for a specific person, try searching their name, email address, or phone number on a site like Pipl, 123people, or Spokeo. These sites act as meta search engines, and gather data from a number of public records databases, social media profiles, and deep web resources (more on those in a minute).
Geolocating IP addresses
Let’s say you’ve got a name and a few possible email addresses, but you can’t seem to track down the location of the high-profile narco-terrorist you’ve been assigned to take out. Not to worry – if you can manage to get your hands on his IP address, then hunting him down will be a piece of cake. Geolocating someone’s IP address is child’s play, and while it won’t give you their exact coordinates on a map, it’s a great tool for figuring out a person’s approximate location on the globe (usually no finer than a city or region, and sometimes only the location of their ISP). Back in 2007, IP geolocation tools were harder to find, but today they’re a dime a dozen. Just search Google for “IP geolocation” and click around until you find one that suits you. Personally, I prefer InfoSniper simply because it’s got a badass-sounding name and a nice visual interface.
Searching the Deep Web
Google hacking is one thing, but if you can’t seem to find what you’re looking for on the Surface Web, chances are you’ll need to delve into the Deep Web. Also known as the Invisible Web and similar variations (and often confused with the Darknet, which is really just a small, deliberately hidden corner of it), the Deep Web is basically anything that isn’t indexed by traditional web crawlers. To use Mike Bergman’s explanation, “searching on the Internet today can be compared to dragging a net across the surface of the ocean: a great deal may be caught in the net, but there is a wealth of information that is deep and therefore missed.”
The Deep Web is home to petabytes of information you can’t find on the surface, so you’re far more likely to dig up the dirt on somebody through the Deep Web – the hard part is just knowing where to look. The NSA lays out a short list of Deep Web resources to get you started – the only problem is that nearly all of them are no longer in operation. So, in their wake, we suggest using the following deep web resources:
- CompletePlanet. This site bills itself as “The front door of the Deep Web,” and since it indexes over 70,000 different deep web databases, it’s definitely one of the best tools you have at your disposal.
- DeepWebTech offers a set of specialized search engines and browser plugins that crawl deep web databases. The search engines cover science, medicine, and business.
- Scirus is a science-focused deep web portal that pulls information from a vast array of journals, periodicals, e-books, and other resources not traditionally indexed by search engines.
- Infomine, one of the few resources listed in Untangling the Web that’s still up and running, is a fantastic resource for finding scholarly/academic information online.
Covering your Tracks
If you’re plugged in and in the midst of some serious webspionage, the last thing you want is to inadvertently leave traces of your activity. The NSA recommends a few methods for keeping your information secure, but oddly enough doesn’t go into great detail on the subject. The authors suggest things like using anti-spyware software, encrypting communications, and using strong passwords – pretty basic stuff. Not to worry though. We’ve put together an excellent introduction to staying anonymous online, which includes a host of programs and services that’ll keep your information hidden from prying eyes.
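Since strong passwords are on the NSA’s list, here’s a small Python sketch of generating one. It uses the standard library’s secrets module, which is designed for security-sensitive randomness, rather than random. The 94-character alphabet, the 20-character default length, and the function names are our own illustrative choices, not recommendations from the guide.

```python
import math
import secrets
import string

# Letters, digits, and ASCII punctuation: 52 + 10 + 32 = 94 symbols.
# This alphabet is an illustrative choice; some sites reject certain symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password drawn uniformly from ALPHABET."""
    if length < 1:
        raise ValueError("length must be positive")
    # secrets.choice draws from the operating system's cryptographically
    # secure random source, unlike random.choice
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Entropy of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(generate_password())
    print(f"{entropy_bits(20):.1f} bits")
```

With the defaults, entropy_bits(20) works out to roughly 131 bits, which is why length matters more than any clever substitution trick.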