Sunday, December 30, 2012

Automated Open Source Intelligence (OSINT) Using APIs

Introduction

The first step in any successful security engagement is reconnaissance. How much information one is able to enumerate about given personnel (for social engineering engagements) or systems can often determine the effectiveness of the engagement. In this post, we will discuss what Open Source Intelligence (OSINT) is and why it takes so much time, as well as ways we can use various application programming interfaces (APIs) to automate much of this process for us. Hopefully this post will help shed light on the importance of proper privacy settings, as well as the threat of automated information gathering made possible by these APIs.

Table of Contents

Since this post covers quite a bit of information, I thought it might be handy to have a short outline/table of contents for those who may find it useful. For the sake of brevity, in this post I will only be covering APIs for finding information about individuals (as opposed to information about systems and networks).

  • What is Open Source Intelligence?
  • Why We Should Try to Automate the Process
  • Facebook Open Graph API and Batch Requests
  • Google Custom Search API (LinkedIn and Twitter profiles)
  • Twitter API
  • Google+ API
  • Other Automated Resources
  • Username Enumeration
  • Summary - What Now?

What is Open Source Intelligence?

The process of gathering information from publicly available sources is known as Open Source Intelligence (OSINT). Publicly available sources can be anything from websites to WHOIS information to published court documents. Likewise, the information we are looking for can be nearly anything we want: from the names and positions of company employees to subdomain information and the web server versions in use, it's all fair game.

Why We Should Try to Automate the Process

Since there are so many sources of information, it can often be overwhelming to try and manage the information gathered about a person or company. Also, this process can take a large amount of time if only manual techniques are used. Fortunately, many sites have APIs that make this process easier for us by returning the results in a very manageable JSON format. Let's take a look at a few social networking APIs now.

Facebook Open Graph API and Batch Requests

Facebook unveiled its Graph API in 2010 as a way to help streamline access to information. From an OSINT point of view, this API allows a social engineer to quickly and easily search for user profiles and public posts. This functionality is provided by the "search" feature. This feature allows us to search for public profiles, posts, events, groups, and more based on a given keyword. An example URL might look like the following:

 https://graph.facebook.com/search?q=mark&type=user&access_token=[access_token]  

Here, we can see that our keyword is "mark" and we are searching for "user" results, which will return a list of public profiles. In this case, the results look like the following:

 {  
   "data": [  
    {  
      "name": "Mark Zuckerberg",  
      "id": "4"  
    },  
    {  
      "name": "Mark Hoppus",  
      "id": "100000422852575"  
    },  
    {  
      "name": "Mark Cuban",  
      "id": "502351381"  
    },  
    {  
      "name": "Mark",  
      "id": "100004423320328"  
    },  
    {  
      "name": "Mark Milian",  
      "id": "5724374"  
    }
    <snip>
   ]  
 }  

While this format may look a bit unfamiliar to some, it's actually very convenient and easy to work with. Before continuing coverage of the API's features, let's look at how we can easily obtain and access this data using Python.

 >>> import requests  
 >>> import json  
 >>> # Let's access the API to get our data  
 >>> response = requests.get('https://graph.facebook.com/search?q=mark&type=user&access_token=[access_token]').text  
 >>>  
 >>> # We now have the data in our 'response' variable  
 >>> # We can load this as JSON data  
 >>> data = json.loads(response)  
 >>> # We can now access the data as a 'dictionary' object  
 >>> print data['data']  
 [{u'name': u'Mark Zuckerberg', u'id': u'4'}, {u'name': u'Mark Hoppus', u'id': u'100000422852575'},
 {u'name': u'Mark Cuban', u'id': u'502351381'}, {u'name': u'Mark', u'id': u'100004423320328'},
 {u'name': u'Mark Milian', u'id': u'5724374'}, {u'name': u'Mark Pacaco', u'id': u'100001468406166'},
 {u'name': u'Mark Jansen', u'id': u'100001085421252'}, {u'name': u'Mark Badidou', u'id': u'100000422896943'},
 {u'name': u'Mark Biem', u'id': u'1114864921'}, {u'name': u'Mark Kaganovich', u'id': u'28'},
 {u'name': u'Mark Glaser', u'id': u'500099119'}, {u'name': u'Mark Campos', u'id': u'100002666344195'},
 {u'name': u'Mark Grabban', u'id': u'501001999'}, {u'name': u'Mark Rodrigues', u'id': u'100002480643839'},
 {u'name': u'Mark Tremonti', u'id': u'100000004617180'}, {u'name': u'Mark Slee', u'id': u'204686'},
 {u'name': u'BJ Mark', u'id': u'100000203662997'}, {u'name': u'Mark Lee', u'id': u'1712631895'},
 {u'name': u'Mark Burn', u'id': u'100000617924417'}, {u'name': u'Mark Mendoza', u'id': u'100002626120094'},
 {u'name': u'Mark Holcomb', u'id': u'7404256'}, {u'name': u'Mark Maldonado', u'id': u'100001331564329'},
 {u'name': u'Mark Margate', u'id': u'100000287187839'}, {u'name': u'Mark Alejandro Perez', u'id': u'100000067248475'}]
 >>>  
 >>> # It is trivial to run through the list and print every person's name and Facebook URL
 >>> for person in data['data']:  
 ...   print "Name: " + person['name']  
 ...   print "Facebook URL: http://www.facebook.com/" + person['id']  
 ...    
 Name: Mark Zuckerberg  
 Facebook URL: http://www.facebook.com/4  
 Name: Mark Hoppus  
 Facebook URL: http://www.facebook.com/100000422852575  
 Name: Mark Cuban  
 Facebook URL: http://www.facebook.com/502351381    
 <snip>  

As we can see, it is very easy to programmatically obtain, access, and manipulate this data. This makes the process of gathering this data automatic, and very quick.

While in our previous example we used the search feature to find people by name, the query ("q") parameter also searches other profile fields for matches. For example, if we want to find people who have either studied at or work for Texas Tech University, we would use the following URL:

 https://graph.facebook.com/search?q=Texas%20Tech%20University&type=user&access_token=[access_token]  

This same technique can be extended to any company. Usually the results are very accurate; however, there will be some outliers - especially if we are searching for a big company like Google or Microsoft (since these terms can appear in quite a few fields on people's profiles).

But wait, there's more!

If we thought the search feature was neat already, it actually has even more functionality that we can use to our advantage. For example, by changing the "type" parameter to "post", we can find public posts that include the word we searched for. We can use this to find out what people are saying about our target company, and potentially turn that chatter to our advantage.

 https://graph.facebook.com/search?q=Texas%20Tech%20University&type=post&access_token=[access_token]  
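This data can be picked apart programmatically just like before. Here is a minimal sketch that prints who posted what - assuming, as was the case at the time of writing, that each post object carries a "from" attribution and an optional "message" field:

 >>> import requests
 >>> import json
 >>> url = 'https://graph.facebook.com/search?q=Texas%20Tech%20University&type=post&access_token=[access_token]'
 >>> posts = json.loads(requests.get(url).text)
 >>> for post in posts['data']:
 ...   # Not every post object carries a 'message' field, so fall back to a default
 ...   print post['from']['name'] + ': ' + post.get('message', '(no message)')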

In addition to this, a little-known feature of the API's search is that we can find profiles using a particular email address or phone number. If we supply the email address or phone number in the "q" parameter, we can see whether or not a Facebook profile uses it - provided the owner of the profile allows themselves to be searched for using these attributes (which I believe is enabled by default).

 https://graph.facebook.com/search?q=email@domain.com&type=user&access_token=[access_token]  

There are a ton of other features offered by the Graph API which we can use to our advantage as social engineers. I would highly recommend reading through the documentation to see other features that might suit whatever need you have. Facebook also offers the ability to make Batch Requests, which essentially allow developers to make multiple API requests in one call to Facebook. An example of when this can be handy would be checking multiple email addresses for matching Facebook profiles, as sketched below.
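Here is a minimal sketch of such a batch request - the email addresses are just placeholders, and the batch itself is sent as a JSON-encoded list of requests in a single POST:

 >>> import requests
 >>> import json
 >>> import urllib
 >>> # Placeholder addresses we want to check for associated profiles
 >>> emails = ['alice@domain.com', 'bob@domain.com', 'carol@domain.com']
 >>> batch = [{'method': 'GET', 'relative_url': 'search?q=' + urllib.quote(email) + '&type=user'} for email in emails]
 >>> response = requests.post('https://graph.facebook.com/', data={'access_token': '[access_token]', 'batch': json.dumps(batch)})
 >>> # Each entry in the batch response has its own status code and a JSON body (as a string)
 >>> for email, result in zip(emails, json.loads(response.text)):
 ...   matches = json.loads(result['body'])['data']
 ...   print email + ': ' + (matches[0]['id'] if matches else 'no profile found')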

As a side note, you may have noticed that these queries require an access token. To generate a fresh access token whenever you need one, it is pretty simple to create your own Facebook app and then use a user profile to generate an access token, which the app can then use to execute these queries.
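For example, once the app is created, one quick way to get a token is to request an app access token directly (a sketch assuming the app ID and secret from your app's settings page; note that some query types may still require a user-generated token):

 >>> import requests
 >>> # [app_id] and [app_secret] come from the Facebook app's settings page
 >>> url = 'https://graph.facebook.com/oauth/access_token?client_id=[app_id]&client_secret=[app_secret]&grant_type=client_credentials'
 >>> # The response body is a plain 'access_token=...' string
 >>> access_token = requests.get(url).text.split('=', 1)[1]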

Google Custom Search API

In 2010, Google deprecated its Web Search API, which had previously been the most efficient way for developers to programmatically access Google search results. Since then, Google has encouraged developers to migrate to the new Custom Search API. This new API allows developers to set up a Custom Search Engine (CSE) which searches for results from a specific set of domains, and then access these results programmatically in JSON or Atom format. While only being able to search a subset of domains may seem restricting, with a little bit of effort we can create a CSE that includes all sites - emulating the previous Web Search API.

After setting up this CSE, we can use our Google-fu to easily pull results for things like Twitter users, LinkedIn users, documents from the company's website, etc. Let's take a look at a few examples.

LinkedIn

Using the CSE we created, we can craft queries which will help us quickly find profile information for LinkedIn users at a particular company. While these will be the public profiles of users, it is very common for privacy settings to be lax, allowing us to see an individual's current position and company, prior work and educational experience, as well as any occupation- or education-related information they want potential employers to know about. This can amount to a wealth of information that is very useful to a social engineer.

Consider the following Google query:

 site:linkedin.com intitle:" | Linkedin" "at Texas Tech University" -intitle:profiles -inurl:groups -inurl:company -inurl:title  

This query searches LinkedIn for profiles of people who have had past or present occupational (or, in this case, educational) experience at Texas Tech University. It can be extended to fit any company we wish. Let's see what kind of results we get when performing this query on our CSE using the fantastic Python Requests module:

 >>> import requests  
 >>> import json  
 >>> import urllib  
 >>> url = 'https://www.googleapis.com/customsearch/v1?key=[access_token]&cx=[cse_id]&q=' + urllib.quote('site:linkedin.com intitle:" | Linkedin" "at Texas Tech University" -intitle:profiles -inurl:groups -inurl:company -inurl:title')  
 >>> response = requests.get(url)  
 >>> print json.dumps(response.json, indent=4)  
 {  
   "kind": "customsearch#search",   
   "url": {  
     "type": "application/json",   
     "template": "https://www.googleapis.com/customsearch/v1?q={searchTerms}&num={count?}&start={startIndex?}&lr={language?}&safe={safe?}&cx={cx?}&cref={cref?}&sort={sort?}&filter={filter?}&gl={gl?}&cr={cr?}&googlehost={googleHost?}&c2coff={disableCnTwTranslation?}&hq={hq?}&hl={hl?}&siteSearch={siteSearch?}&siteSearchFilter={siteSearchFilter?}&exactTerms={exactTerms?}&excludeTerms={excludeTerms?}&linkSite={linkSite?}&orTerms={orTerms?}&relatedSite={relatedSite?}&dateRestrict={dateRestrict?}&lowRange={lowRange?}&highRange={highRange?}&searchType={searchType}&fileType={fileType?}&rights={rights?}&imgSize={imgSize?}&imgType={imgType?}&imgColorType={imgColorType?}&imgDominantColor={imgDominantColor?}&alt=json"  
   },   
   "items": [  
     {  
       "kind": "customsearch#result",   
       "title": "Nick Acree | LinkedIn",   
       "displayLink": "www.linkedin.com",   
       "htmlTitle": "Nick Acree | <b>LinkedIn</b>",   
       "formattedUrl": "www.linkedin.com/in/nickacree",   
       "htmlFormattedUrl": "www.<b>linkedin</b>.com/in/nickacree",   
       "pagemap": {  
         "metatags": [  
           {  
             "pageimpressionid": "f137a0e0-0217-4f2e-840a-6a5f7c8ed7ec",   
             "analyticsurl": "/analytics/noauthtracker",   
             "pagekey": "nprofile-public-success"  
           }  
         ],   
         "person": [  
           {  
             "role": "Chief Accountant at Texas Tech University | Game Theorist | Business Analyst | Finance and Investment Professional",   
             "location": "Dallas/Fort Worth Area"  
           }  
         ],   
         "cse_thumbnail": [  
           {  
             "width": "80",   
             "src": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQxrZnydxNJ5eVGipiqNdK3CE49Nr-rCucJqSrzGFYM_me_bSgfkXLg",   
             "height": "80"  
           }  
         ],   
         "cse_image": [  
           {  
             "src": "http://m3.licdn.com/mpr/pub/image-00drw8kMEo15Znh5uEziqdZ_4lSfBNJ5p0debpkH4SNZBkGj00de3NSM4wpLBNV2sj8A/nick-acree.jpg"  
           }  
         ],   
         "hcard": [  
           {  
             "photo": "http://m3.licdn.com/mpr/pub/image-00drw8kMEo15Znh5uEziqdZ_4lSfBNJ5p0debpkH4SNZBkGj00de3NSM4wpLBNV2sj8A/nick-acree.jpg",   
             "fn": "Nick Acree",   
             "title": "Chief Accountant at Texas Tech University | Game Theorist | Business Analyst | Finance and Investment Professional"  
           },   
           {  
             "fn": "Texas Tech University - Rawls College of Business"  
           },   
           {  
             "fn": "Lawrence Dale Bell High School (L. D. Bell High School)"  
           },   
           {  
             "fn": "Advanced Business Analytics, Data Mining and Predictive Modeling"  
           },   
           {  
             "fn": "Banking and Finance Technologies"  
           },   
           {  
             "fn": "CFA Institute Candidates"  
           },   
           {  
             "fn": "Caribbean Community of Business Professionals (CCBP)"  
           },   
           {  
             "fn": "Caribbean Consultants & Professionals"  
           },   
           {  
             "fn": "Caribbean Jobs"  
           },   
           {  
             "fn": "Caribbean Professionals"  
           }  
         ]  
       },   
       "snippet": "Chief Accountant at Texas Tech University | Game Theorist | Business Analyst |  Finance and Investment Professional. Location: Dallas/Fort Worth Area; Industry ...",   
       "htmlSnippet": "Chief Accountant <b>at Texas Tech University</b> | Game Theorist | Business Analyst | <br> Finance and Investment Professional. Location: Dallas/Fort Worth Area; Industry <b>...</b>",   
       "link": "http://www.linkedin.com/in/nickacree",   
       "cacheId": "[cache_id]"  
     },      
     <snip>  

We can see that it's very straightforward to access and manipulate this data using the Requests module and our CSE. More importantly, we can see just how much data is provided about each LinkedIn profile. Let's take a look at the useful data.

We can see that the "person" attribute contains the "role" and "location" of the person. With regards to parsing, it is probably best to take just the "location" attribute from this key, since the "role" is also listed elsewhere. The "hcard" attribute is arguably the most valuable in terms of simple data. It contains the name, title (which is the same as the previous role attribute), and picture URL for the user. In addition, it contains the full names of all affiliations or associations with which the user identifies himself/herself. This could be extremely useful in social engineering if we wish to build rapport with the user ("Why yes, I'm a member of 'Caribbean Jobs', too!"), or for making phishing emails much more targeted and effective.

Also, if we ever want more data that may not be included in these results (such as specific job descriptions and projects worked on), the "formattedUrl" attribute provides us with a direct link to the person's public LinkedIn profile.

Let's see a quick example of how we can extract the useful information from this data. Let's aim to get the name, position, company, location, and other affiliations. We'll pick up right where we left off in the previous code example.

 >>> for item in response.json['items']:
      hcard = item['pagemap']['hcard']
      affiliations = []
      name = 'N/A'
      photo_url = 'N/A'
      position = 'N/A'
      company = 'N/A'
      location = item['pagemap']['person'][0]['location']
      profile_url = item['formattedUrl']
      for card in hcard:
           # The main contact info card is the only one with a 'title' field
           if 'title' in card:
                if 'fn' in card: name = card['fn']
                if 'photo' in card: photo_url = card['photo']
                # Split on the first ' at ' only, so titles without one don't raise an IndexError
                title_parts = card['title'].split(' at ', 1)
                position = title_parts[0]
                if len(title_parts) > 1: company = title_parts[1]
           affiliations.append(card['fn'])
      print 'Name: ' + name
      print 'Position: ' + position
      print 'Company: ' + company
      print 'Location: ' + location
      print 'Profile: ' + profile_url
      print 'Photo: ' + photo_url
      print 'Affiliations: ' + ','.join(affiliations) + '\n'
 
 Name: Nick Acree  
 Position: Chief Accountant  
 Company: Texas Tech University | Game Theorist | Business Analyst | Finance and Investment Professional  
 Location: Dallas/Fort Worth Area  
 Profile: www.linkedin.com/in/nickacree  
 Photo: http://m3.licdn.com/mpr/pub/image-00drw8kMEo15Znh5uEziqdZ_4lSfBNJ5p0debpkH4SNZBkGj00de3NSM4wpLBNV2sj8A/nick-acree.jpg  
 Affiliations: Nick Acree,Texas Tech University - Rawls College of Business,Lawrence Dale Bell High School (L. D. Bell High School),Advanced Business Analytics, Data Mining and Predictive Modeling,Banking and Finance Technologies,CFA Institute Candidates,Caribbean Community of Business Professionals (CCBP),Caribbean Consultants & Professionals,Caribbean Jobs,Caribbean Professionals  
 
 Name: Sanatan Rajagopalan  
 Position: Graduate student  
 Company: Texas Tech University  
 Location: Lubbock, Texas  
 Profile: www.linkedin.com/pub/sanatan-rajagopalan/33/239/120  
 Photo: http://m3.licdn.com/mpr/pub/image-MdKHwBMBBGplBvejy8rBq4Kx0KNqE0buKnxvqbRE0LKnZ8O5MdKv3XdB091sOUCRw0J-/sanatan-rajagopalan.jpg  
 Affiliations: Sanatan Rajagopalan,Texas Tech University,Visvesvaraya Technological University,ASIC, FPGA, SoC - Southern Cal and Southwest,Accenture - India (IDC),Atmel AVR Developers,Broadcom Corporation,Calculated Moves :: Embedded and Semiconductor,Cirrus Logic,Computer & Software Engineering Professionals  
 
 Name: Mukaddes Darwish  
 Position: associate prof.  
 Company: Texas Tech University  
 Location: Lubbock, Texas Area  
 Profile: www.linkedin.com/pub/mukaddes-darwish/9/361/589  
 Photo: N/A  
 Affiliations: Mukaddes Darwish,Texas Tech University,Construction Industry Ethical Professionals,Texas Tech Group  
 <snip>

It should be clear by now just how easy it is to manipulate this data. This is also very passive reconnaissance - notice that we never browse to LinkedIn directly to gather this information. It should be noted that LinkedIn does have its own API, but it comes with a very strict ToS, and I can't think of much information LinkedIn's API provides that is not already listed in the Custom Search API results.

This same automation with the Google Custom Search API can be extended to find files with a specific extension (such as .xls, .doc, etc.) on company websites, and much, much more (perhaps there will be more coverage in a future post).
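As a quick taste, a minimal sketch of such a document hunt might look like the following (with example.com standing in for the target's domain):

 >>> import requests
 >>> import urllib
 >>> # example.com is a placeholder for the target company's domain
 >>> query = 'site:example.com filetype:xls OR filetype:doc'
 >>> url = 'https://www.googleapis.com/customsearch/v1?key=[access_token]&cx=[cse_id]&q=' + urllib.quote(query)
 >>> for item in requests.get(url).json['items']:
 ...   print item['title'] + ' - ' + item['link']

For now, let's see how we can find Twitter profiles using this API, and then see what we can do with them.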

Twitter (finding profiles using Google Custom Search API)

Now let's take a look at how we can find Twitter profiles using the Google Custom Search API. Again, we will turn to our Google-fu skills to search for only profile pages. There isn't an easy way (that I know of) to find only profiles of people who work for a specific company; however, we can include the company name as another keyword, and Google will return profiles that are associated in some way with that company name, which proves fairly successful. Here's the query that we will use:

 site:twitter.com intitle:"on Twitter" "Texas Tech University"  

Let's see what results we get running this query against our CSE:

 >>> import requests  
 >>> import urllib  
 >>> import json  
 >>> url = 'https://www.googleapis.com/customsearch/v1?key=api_key&cx=cse_id&q=' + urllib.quote('site:twitter.com intitle:"on Twitter" "Texas Tech University"')  
 >>> response = requests.get(url)  
 >>> print json.dumps(response.json, indent=4)  
 {  
   <snip>  
   "items": [  
     {  
       "kind": "customsearch#result",   
       "title": "Texas Tech (TexasTech) on Twitter",   
       "displayLink": "twitter.com",   
       "htmlTitle": "Texas Tech (TexasTech) <b>on Twitter</b>",   
       "formattedUrl": "https://twitter.com/TexasTech",   
       "htmlFormattedUrl": "https://twitter.com/TexasTech",   
       "pagemap": {  
         "metatags": [  
           {  
             "swift-page-name": "profile",   
             "msapplication-tilecolor": "#00aced",   
             "msapplication-tileimage": "//si0.twimg.com/favicons/win8-tile-144.png"  
           }  
         ],   
         "cse_thumbnail": [  
           {  
             "width": "70",   
             "src": "https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcQ9KpcntYMuLP6DvUzHmTk42m6F8K_bEGGF3cUwTPz1EG4qqxZZKwDr",   
             "height": "70"  
           }  
         ],   
         "cse_image": [  
           {  
             "src": "https://twimg0-a.akamaihd.net/profile_images/1376063137/twitter-profile-pic_bigger.jpg"  
           }  
         ]  
       },   
       "snippet": "@TexasTech. Raider Power! Official Twitter account of Texas Tech University.  News, events and updates. Tweeting M-F. Join the #Raiderland conversation!",   
       "htmlSnippet": "@TexasTech. Raider Power! Official Twitter account of <b>Texas Tech University</b>. <br> News, events and updates. Tweeting M-F. Join the #Raiderland conversation!",   
       "link": "https://twitter.com/TexasTech",   
       "cacheId": "sEefr6w340UJ"  
     },   
     <snip>  
   ],   
 }  

As you can see, we are able to easily enumerate profiles related to Texas Tech University. Most importantly, this search provides us with the profile link (and also the Twitter handle). We can extract this information in the same way we extracted the LinkedIn information above.
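For example, here is a quick sketch that pulls the handles straight out of the result links, picking up from the session above:

 >>> for item in response.json['items']:
 ...   # The Twitter handle is the last path component of the profile URL
 ...   handle = item['link'].rstrip('/').split('/')[-1]
 ...   print item['title'] + ' -> @' + handle

Now that we have acquired the profile links and handles, what else can we obtain about these profiles using Twitter's own API?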

Twitter API

Twitter has recently made changes to its API that caused problems for quite a few third-party applications. However, we can still use this API to our advantage to find quite a bit of information about the profiles we enumerated using the Custom Search API.

As a quick note, Twitter recently "upgraded" their API to version 1.1. This version of the API no longer allows anonymous queries, so we will need to create an application to use with OAuth (much like we did with Facebook). In addition to this, new query limits have been placed on particular API calls.

Our main source of information will be found in the documentation regarding API calls for user information. Let's briefly take a look at the useful API functions that will allow us to gather the information we want.

users/lookup

This function allows us to retrieve the "extended information" for up to 100 users in one call. This information includes the following (and more):
  • Twitter handle
  • Name
  • Profile display information
  • Profile Description
  • Links to profile image, profile, etc.
  • Whether or not they have Geolocation enabled on Tweets
With the ability to specify a substantial number of users in one API call, we can quickly get the extended information for our enumerated user profiles. A typical API call would look like the following:

 https://api.twitter.com/1.1/users/lookup.json?screen_name=TexasTech  
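Since version 1.1 requires OAuth, here is a minimal sketch of calling users/lookup with the requests_oauthlib package - the credentials are placeholders from our registered Twitter application, and the second screen name is just for illustration:

 >>> import requests
 >>> from requests_oauthlib import OAuth1
 >>> # Placeholder credentials from our registered Twitter application
 >>> auth = OAuth1('[consumer_key]', '[consumer_secret]', '[access_token]', '[access_token_secret]')
 >>> # screen_name accepts a comma-separated list of up to 100 handles
 >>> url = 'https://api.twitter.com/1.1/users/lookup.json?screen_name=TexasTech,SomeOtherHandle'
 >>> for user in requests.get(url, auth=auth).json:
 ...   print user['screen_name'] + ': ' + user['name'] + ' (geo enabled: ' + str(user['geo_enabled']) + ')'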

We can also use the API to get another critical piece of information: the users following our enumerated profile, and the users the profile is following. These API calls return "user objects" (similar to the output of users/lookup) for each of the friends or followers. This information can be a critical asset when preparing for a social engineering engagement. Typical API calls to these functions will look like the following:

 https://api.twitter.com/1.1/followers/list.json?cursor=-1&screen_name=TexasTech&skip_status=true&include_user_entities=true  
 https://api.twitter.com/1.1/friends/list.json?cursor=-1&screen_name=TexasTech&skip_status=true&include_user_entities=true  
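These results come back one page at a time, so a minimal sketch for walking an entire follower list might look like the following - reusing the 'auth' object from the previous sketch, and keeping in mind that the new query limits will throttle us on large accounts:

 >>> followers = []
 >>> cursor = -1
 >>> while cursor != 0:
 ...   url = 'https://api.twitter.com/1.1/followers/list.json?cursor=' + str(cursor) + '&screen_name=TexasTech&skip_status=true'
 ...   page = requests.get(url, auth=auth).json
 ...   followers += page['users']
 ...   # 'next_cursor' drops to 0 once the last page has been fetched
 ...   cursor = page['next_cursor']
 ...
 >>> print str(len(followers)) + ' followers collected'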

Google+ API

As another resource, Google+ offers an API for developers which allows us to enumerate information about potential targets. As before, we can use the Google Custom Search API with the following query to find users working for a specific company:

 site:plus.google.com intext:"Works at Texas Tech University" -inurl:photos -inurl:about -inurl:posts -inurl:plusones  

After finding profiles for users, we can easily extract their user IDs, since the ID is part of the profile URL.
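For example (a quick sketch using the Texas Tech profile URL from the results below):

 >>> import re
 >>> profile_url = 'https://plus.google.com/108084201426317978902'
 >>> # The profile ID is the long run of digits in the URL path
 >>> print re.search(r'plus\.google\.com/(\d+)', profile_url).group(1)
 108084201426317978902

We can then use this ID in a GET request to obtain the "people resource" for the profile using the "People:get" API function.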

A sample call to this function would look like the following:

 >>> import requests  
 >>> import json  
 >>> url = 'https://www.googleapis.com/plus/v1/people/108084201426317978902?key=api_key'  
 >>> response = requests.get(url)  
 >>> print json.dumps(response.json, indent=4)  
 {  
   "kind": "plus#person",   
   "displayName": "Texas Tech University",   
   "isPlusUser": true,   
   "url": "https://plus.google.com/108084201426317978902",   
   "tagline": "Texas Tech has more than 31,000 students pursuing 150 degree programs through 11 colleges.",   
   "image": {  
     "url": "https://lh6.googleusercontent.com/-BX9Sl2nLIgI/AAAAAAAAAAI/AAAAAAAAABE/2Wrif0f7x2k/photo.jpg?sz=50"  
   },   
   "cover": {  
     "coverInfo": {  
       "leftImageOffset": 0,   
       "topImageOffset": -158  
     },   
     "layout": "banner",   
     "coverPhoto": {  
       "url": "https://lh6.googleusercontent.com/-rPtjUfyzKaY/UFoHms715YI/AAAAAAAAAVg/szHZxhj4AF8/w940-h348/fall-leaves-1280x1024.jpg",   
       "width": 940,   
       "height": 348  
     }  
   },   
   "etag": "\"sJt1VilOvxUfDlfPWeqwjvyqpgI/5pxsUNTtt3JLvNLuIaCsR7pXmdI\"",   
   "plusOneCount": 571,   
   "urls": [  
     {  
       "type": "other",   
       "value": "http://www.ttu.edu"  
     }  
   ],   
   "verified": true,   
   "circledByCount": 460,   
   "id": "108084201426317978902",   
   "objectType": "page"  
 }  

Granted, this is the result for the main Texas Tech page. If we were looking at a standard person's profile, we could also obtain education and work history, more descriptive information, and potentially email addresses.

Unfortunately, Google does not offer an official API call to retrieve the circles information for a particular profile. However, with a little bit of reverse engineering, it is fairly simple to create our own that works just fine. I may leave this for another post, since it is a bit of an involved process.

With this summary of some basic APIs concluded, let's briefly discuss some other automated tools and techniques for information enumeration.

Other Automated Resources

There are many other tools that can help us in our OSINT gathering process. Let's discuss a couple of them now:

Jigsaw.rb - The tool jigsaw.rb is included by default in BackTrack. It is a Ruby script which scrapes the contact website Jigsaw for contact details and generates email addresses on the fly. It's a very handy script, and I am planning on posting a quick how-to guide for it in the next couple of days (I'll update this post when it's published).

Maltego - One of the most useful and widely used tools in the industry is Maltego, the free community version of which is included by default in BackTrack. This tool provides automated OSINT gathering using "transforms". The data is then presented and manipulated through an intuitive graphical interface built around a force-directed graph.

Spokeo - With the tagline "Not your grandma's phone book", Spokeo is a search engine for social information. By just entering a name, email address, username, phone number, etc., one can find information across a variety of social networking platforms and other sources.


Username Enumeration

Once we have a username (such as a Twitter handle), how would we go about finding other sites this username is registered on? This kind of information is very useful in determining other interests or profiles for a given target. There are quite a few sites that do this for us, but here are my two favorites:

namechk.com - Quick and easy, namechk provides an easy interface that searches over 150 popular sites for occurrences of the given username.

checkusernames.com - Very similar to namechk, checkusernames.com provides an easy interface that checks a substantial number of sites (160) to see if a given username is registered.

But checking usernames manually is no fun. With a little reverse engineering, I've created a simple script which automatically queries the checkusernames.com interface for occurrences of a username. Here it is:

 import requests

 username = 'target_username'  # the username we want to check
 services = ['YouTube', 'Hypemachine', 'Yahoo', 'Linkagogo', 'Coolspotters', 'Wikipedia', 'Twitter', 'gdgt', 'BlogMarks', 'LinkedIn', 'Ebay', 'Tumblr', 'Pinterest',
                'yotify', 'Blogger', 'Flickr', 'FortyThreeMarks', 'Moof', 'HuffingtonPost', 'Wordpress', 'DailyMotion', 'LiveJournal', 'vimeo', 'DeviantArt', 'reddit',
                'StumbleUpon', 'Answers', 'Sourceforge', 'Wikia', 'ArmChairGM', 'Photobucket', 'MySpace', 'Etsy', 'SlideShare', 'Fiverr', 'scribd', 'Squidoo', 'ImageShack',
                'ThemeForest', 'soundcloud', 'Tagged', 'Hulu', 'Typepad', 'Hubpages', 'weebly', 'Zimbio', 'github', 'TMZ', 'WikiHow', 'Delicious', 'zillow', 'Jimdo', 'goodreads',
                'Segnalo', 'Netlog', 'Issuu', 'ForumNokia', 'UStream', 'Gamespot', 'MetaCafe', 'askfm', 'hi5', 'JustinTV', 'Blekko', 'Skyrock', 'Cracked', 'foursquare', 'LastFM',
                'posterous', 'steam', 'Opera', 'Dreamstime', 'Fixya', 'UltimateGuitar', 'docstoc', 'FanPop', 'Break', 'tinyurl', 'Kongregate', 'Disqus', 'Armorgames', 'Behance',
                'ChaCha', 'CafeMom', 'Liveleak', 'Topix', 'lonelyplanet', 'Stardoll', 'Instructables', 'Polyvore', 'Proboards', 'Weheartit', 'Diigo', 'Gawker', 'FriendFeed',
                'Videobash', 'Technorati', 'Gravatar', 'Dribbble', 'formspringme', 'myfitnesspal', '500px', 'Newgrounds', 'GrindTV', 'smugmug', 'ibibo', 'ReverbNation', 'Netvibes',
                'Slashdot', 'Fool', 'Plurk', 'zedge', 'Discogs', 'YardBarker', 'Ebaumsworld', 'sparkpeople', 'Sharethis', 'Xmarks', 'Crunchbase', 'FunnyOrDie', 'Suite101', 'OVGuide',
                'Veoh', 'Yuku', 'Experienceproject', 'Fotolog', 'Hotklix', 'Epinions', 'Hyves', 'Sodahead', 'Stylebistro', 'fark', 'AboutMe', 'Metacritic', 'Toluna', 'Mobypicture',
                'Gather', 'Datpiff', 'mouthshut', 'blogtalkradio', 'Dzone', 'APSense', 'Bigstockphoto', 'n4g', 'Newsvine', 'ColourLovers', 'Icanhazcheezburger', 'Xanga',
                'InsaneJournal', 'redbubble', 'Kaboodle', 'Folkd', 'Bebo', 'Getsatisfaction', 'WebShots', 'threadless', 'Active', 'GetGlue', 'Shockwave', 'Pbase']
 for service in services:
      try:
           print service + '\t',
           # If the response does not flag the name as unavailable, the username is free on that service
           if 'notavailable' not in requests.get('http://checkusernames.com/usercheckv2.php?target=' + service + '&username=' + username, headers={'X-Requested-With': 'XMLHttpRequest'}).text:
                print 'Available'
           else:
                print ''
      except Exception as e:
           print e

Summary - What Now?

It is important to note that there are countless other (more manual) resources that can provide information about personnel, and we haven't even started covering APIs for finding information about systems and network entities. However, just as a quick recap, let's review the information we gathered using the resources above:

  • Facebook profiles for company employees (also checking for email address associations)
  • LinkedIn profiles for company employees (including detailed profile information such as affiliations, education, work experience, etc.)
  • Twitter profiles for employees (including following/followers data)
  • Google+ profiles for employees (including detailed profile information)

We can now cross-reference this data and come up with a very detailed profile for a substantial number of users.

I hope this post was enlightening, and as always, leave comments below if you have any questions or comments!

- Jordan

18 comments:

  1. I often use archives.com, criminalsearches.com, and freecourtdockets.com alongside spokeo, pipl, jigsaw, linkedin, facebook, twitter, and google+. It used to be that pipl was good at finding facebook and linkedin profiles, but seemingly more often lately archives.com produces better results.

    Almost all intelligence analyst tools are not up to my standards. They will be soon. I want something that combines Palantir with esearchy that gets into all of these automated in order to create data sources. Maltego looks great, as always, but the price needs to come down, or it needs to be as good as Palantir.

  2. Whoah -- http://www.canariproject.com

    Replies
    1. Very cool! Thanks for the heads up on this - I'll be playing around with it. I thought about making something similar, but I think this will work great.

      Thanks again!

  3. Helpful reading. I'm not able to run the "CSE using the fantastic Python Requests module" examples:

    for item in response.json()['items']:
    TypeError: 'dict' object is not callable

    Any hint ?

    Replies
    1. Great catch! I have fixed it - should be "response.json", not "response.json()"

      Thanks!

  4. Awesome post!! Unfortunately I'm getting a 400 error. Did you enable anything else on your CSE?

    Replies
    1. I have tried it, and it's working fine - no 400 error. There must be some syntax you're ignoring or missing. Try the same; hopefully it will work.

    2. Thanks! Did you have to specify anything when creating the CSE?

  5. Error msg....
    >>> response = requests.get(url)
    >>> print json.dumps(response.json, indent=4)
    Traceback (most recent call last):
    File "", line 1, in
    File "/usr/lib/python2.7/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
    File "/usr/lib/python2.7/json/encoder.py", line 203, in encode
    chunks = list(chunks)
    File "/usr/lib/python2.7/json/encoder.py", line 436, in _iterencode
    o = _default(o)
    File "/usr/lib/python2.7/json/encoder.py", line 178, in default
    raise TypeError(repr(o) + " is not JSON serializable")
    TypeError: <bound method Response.json of <Response [200]>> is not JSON serializable
