Using Inspect / Javascript to scrape data from visualisations online

My last post talked about making over this visualisation from The Guardian:

2016-11-13_12-55-29

What I haven’t explained is how I found the data. That is what I intend to outline in this post. These skills are very useful if you need to find the data behind visualisations or tables found online so that you can re-visualise them.

The first step in trying to download the data behind any online visualisation is to check how it is made. It may simply be a static graphic, in which case extracting the data may be hard unless it is a chart you can unplot using WebPlotDigitizer. Interactive visualisations, however, are typically built with JavaScript, unless they use a bespoke product such as Tableau.

Assuming it is interactive, you can start to explore by right-clicking on the image and choosing Inspect (in Chrome; other browsers have similar developer tools).

2016-11-13_19-26-35

I was treated to this view:

2016-11-13_19-28-09.png

I don’t know much about coding, but this looks like the view is being built from a series of paths. How might it be doing this? We can find out by digging deeper; let’s visit the Sources tab:

2016-11-13_19-31-30

Our job on this tab is to look for anything unusual outside the typical JavaScript libraries (you learn to recognise these by being curious and looking at lots of sites). The first file, gay-rights-united-states, looks suspect but, as can be seen from the image above, it is empty.
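If the list of files is long, one rough way to narrow it down is to filter out the names you already recognise. This is only a sketch: the function name `unusualScripts`, the list of library names, and the example URLs are all my own assumptions, not anything taken from the Guardian page.

```javascript
// Assumed list of common library name fragments; extend it as you learn more.
var KNOWN_LIBRARIES = ['jquery', 'd3', 'underscore', 'modernizr', 'analytics'];

// Return the script URLs whose file name does not match a known library.
function unusualScripts(urls, knownLibraries) {
  return urls.filter(function (url) {
    var fileName = url.split('/').pop().toLowerCase();
    return !knownLibraries.some(function (name) {
      return fileName.indexOf(name) !== -1;
    });
  });
}

// Made-up example URLs, not the real ones from the page.
var exampleUrls = [
  'https://example.com/js/jquery.min.js',
  'https://example.com/js/d3.v3.min.js',
  'https://example.com/embed/main.js'
];
console.log(unusualScripts(exampleUrls, KNOWN_LIBRARIES));
```

In the browser Console, `document.scripts` gives the script elements of the current page, so their `src` values could be fed into a function like this. Scripts loaded inside an embedded frame belong to that frame's own document, so they would need to be checked separately.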

Scrolling down (see below), we find there is an embedded file / folder (flat.html), and in that is something new: all.js and main.js….

2016-11-13_19-34-05

Investigating all.js reveals nothing much, but main.js shows us something very interesting on line 8. JACKPOT! A Google Sheet containing the full dataset.
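I found the link by reading the file, but for a longer script you could paste its text into a small search instead. The sketch below is an assumption on my part: `findSheetUrls` is a made-up name, the pattern only covers links on docs.google.com, and the sample line is invented (SHEET_KEY is a placeholder, not the real key).

```javascript
// Find anything that looks like a Google Sheets link in a block of source code.
function findSheetUrls(sourceText) {
  // Matches both the /spreadsheet/ and /spreadsheets/ URL styles,
  // stopping at whitespace, quotes, angle brackets or a closing bracket.
  var pattern = /https?:\/\/docs\.google\.com\/spreadsheets?\/[^\s'"<>)]+/g;
  return sourceText.match(pattern) || [];
}

// Invented line of code for illustration only.
var sample = "var url = 'https://docs.google.com/spreadsheets/d/SHEET_KEY/pubhtml';";
console.log(findSheetUrls(sample));
```

A site may load its data from somewhere else entirely, so an empty result here only means this particular pattern did not match.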

2016-11-13_19-38-25

And we can start vizzing! (btw I transposed this for my visualisation to get a column per right).

Advanced Interrogation using Javascript

Part way through my visualisation I realised I needed to show the text items the Guardian had on their site, but these weren’t included in the dataset.

2016-11-13_19-41-27

I decided to check the JavaScript code to see where this text was created and whether I could decipher it. Looking through main.js, I found this snippet:

function populateHoverBox (type, position){

  var overviewObj = {
    'state' : stateData[position].state
  }
.....
  if(stateData[position]['marriage'] != ''){
    overviewObj.marriage = 'key-marriage'
    overviewObj.marriagetext = 'Allows same-sex marriage.'
  } else if(stateData[position]['union'] != '' && stateData[position]['marriageban'] != ''){
    overviewObj.marriage = 'key-marriage-ban'
    overviewObj.marriagetext = 'Allows civil unions; does not allow same-sex marriage.'
  } else if(stateData[position]['union'] != '' ){
    overviewObj.marriage = 'key-union'
    overviewObj.marriagetext = 'Allows civil unions.'
  } else if(stateData[position]['dpartnership'] != '' && stateData[position]['marriageban'] != ''){
    overviewObj.marriage = 'key-marriage-ban'
    overviewObj.marriagetext = 'Allows domestic partnerships; does not allow same-sex marriage.'
  } else if(stateData[position]['dpartnership'] != ''){
    overviewObj.marriage = 'key-union'
    overviewObj.marriagetext = 'Allows domestic partnerships.'
  } else if (stateData[position]['marriageban'] != ''){
    overviewObj.marriage = 'key-ban'
    overviewObj.marriagetext = 'Same-sex marriage is illegal or banned.'
  } else {
    overviewObj.marriagetext = 'No action taken.'
    overviewObj.marriage = 'key-none'
  }

…and it continued for another 100-odd lines of code. This wasn’t going to be as easy as I hoped. Any other options? Well, what if I could extract the contents of overviewObj? Could I write this out to a file?
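For comparison, re-implementing the logic by hand would have meant porting every branch along the following lines. This is a sketch of the marriage branch only: the name `marriageSummary` and the example row are my own, and it assumes each row is a plain object using the same column names as `stateData`. The remaining hundred or so lines are not reproduced.

```javascript
// Sketch only: the marriage branch of populateHoverBox as a standalone function.
// `row` is assumed to be one state's record, keyed by the same column names.
function marriageSummary(row) {
  // Same loose "not an empty string" test as the original snippet.
  function isSet(column) { return row[column] != ''; }
  function result(key, text) { return { marriage: key, marriagetext: text }; }

  if (isSet('marriage')) {
    return result('key-marriage', 'Allows same-sex marriage.');
  } else if (isSet('union') && isSet('marriageban')) {
    return result('key-marriage-ban', 'Allows civil unions; does not allow same-sex marriage.');
  } else if (isSet('union')) {
    return result('key-union', 'Allows civil unions.');
  } else if (isSet('dpartnership') && isSet('marriageban')) {
    return result('key-marriage-ban', 'Allows domestic partnerships; does not allow same-sex marriage.');
  } else if (isSet('dpartnership')) {
    return result('key-union', 'Allows domestic partnerships.');
  } else if (isSet('marriageban')) {
    return result('key-ban', 'Same-sex marriage is illegal or banned.');
  }
  return result('key-none', 'No action taken.');
}

// Made-up row for illustration; the real sheet's values are not shown in this post.
var exampleRow = { state: 'Example', marriage: '', union: 'x', marriageban: '', dpartnership: '' };
console.log(marriageSummary(exampleRow).marriagetext); // Allows civil unions.
```

Doing this for every right, and keeping it in step with the Guardian's wording, is exactly the work I wanted to avoid.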

I tried a “Watch” using the developer tools, but the variable went out of scope each time I hovered, so that wouldn’t be useful. I therefore decided to save flat.html locally and try writing the contents out to a file on my local drive….

As I say, I’m no coder (but perhaps more comfortable than some), so I googled (and googled) and eventually stumbled on this post:

http://stackoverflow.com/questions/16376161/javascript-set-file-in-download

I therefore added the function to my local main.js and added a line in the populateHoverBox function….okay so maybe I can code a tiny bit….

var str = JSON.stringify(overviewObj);
 
download(str, stateData[position].state + '.txt', 'text/plain');

In theory this should serialise overviewObj to a string (according to Google!) and then download the resulting data to a file called <State>.txt.
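The download function itself isn’t shown above, so here is a rough idea of what a helper with that shape could look like. It is a sketch under assumptions, not a copy of the Stack Overflow answer: `toDataUri` is a name I made up, the data-URI approach is just one of several ways to do this, and the example object is invented.

```javascript
// Hypothetical helper: turn text into a data URI that a link can point at.
function toDataUri(text, mimeType) {
  return 'data:' + mimeType + ';charset=utf-8,' + encodeURIComponent(text);
}

// Sketch of a download(text, filename, mimeType) helper. Browser only.
function download(text, filename, mimeType) {
  var link = document.createElement('a');
  link.href = toDataUri(text, mimeType);
  link.download = filename;        // suggests the name to save the file as
  document.body.appendChild(link); // some browsers need the link in the page
  link.click();                    // triggers the download
  document.body.removeChild(link);
}

// Invented object standing in for overviewObj.
var overviewExample = { state: 'Example', marriagetext: 'Allows same-sex marriage.' };
var exampleStr = JSON.stringify(overviewExample);
console.log(exampleStr);

// Only attempt the download when running in a browser page.
if (typeof document !== 'undefined') {
  download(exampleStr, overviewExample.state + '.txt', 'text/plain');
}
```

Because the call sits inside the hover handler, every hover triggers another download, which is where the duplicate files come from.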

Now for the test…..

downloadingfiles

BOOM, BOOM and BOOM again!

Each file is a JSON file:

2016-11-13_20-07-21

Now to copy the files out from the downloads folder, remove any duplicates, and combine using Alteryx.

2016-11-13_20-04-59

As you can see, using a wildcard input for the resulting JSON files followed by a transpose made this simple.

2016-11-13_20-08-31
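I did this step in Alteryx, but if you don’t have it, the same de-duplicate-and-combine idea can be sketched in JavaScript. This is an alternative, not what I actually ran: `combineStateRecords` is a made-up name, the file contents are invented, and it assumes each file holds one JSON object with a `state` property. Reading the files from disk is left out.

```javascript
// Parse each file's text, keep one record per state, return one row per state.
function combineStateRecords(fileContents) {
  var byState = {};
  fileContents.forEach(function (text) {
    var record = JSON.parse(text);
    byState[record.state] = record; // a later duplicate overwrites an earlier one
  });
  return Object.keys(byState).sort().map(function (state) {
    return byState[state];
  });
}

// Invented file contents, including one duplicate download.
var exampleFiles = [
  '{"state":"B","marriagetext":"No action taken."}',
  '{"state":"A","marriagetext":"Allows civil unions."}',
  '{"state":"B","marriagetext":"No action taken."}'
];
console.log(combineStateRecords(exampleFiles));
```

Each key in the JSON becomes a column and each state a row, which is the same shape the transpose gives.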

Finally, to combine with the Google Sheet (called “Extract” below) and the hexmap data (Sheet 1) in Tableau…..

2016-11-13_20-09-41

Not the most straightforward data extract I’ve done, but I thought it was worth blogging about so others could see that extracting data from visualisations online is possible.

You can see the resulting visualisation in my previous post.

Conclusion

No one taught me this method, and I have never been taught how to code. The techniques described here are simply the result of continuous curiosity and exploration of how interactive tables and visualisations are built.

I have used similar techniques in other places to extract data from visualisations, but no two methods are the same, nor can a generic tutorial be written. Simply have curiosity and patience and explore everything.

 


The Art(isan) of Data Analysis

Firstly, an announcement: I’m moving jobs. From the start of January, I’m very pleased to say, I’ll be working at The Information Lab, one of the longest-standing Tableau Partners in the UK and Tableau’s EMEA Partner of the Year; they also very recently became Alteryx partners. I approached Tom, Craig and the team because they have clearly demonstrated a passion for Tableau that mirrors my own passion for Alteryx and, having got to know the ethos of the company and their values, I’m very excited about what the future holds for me, my new colleagues, and also for Tableau and Alteryx.

All this has got me thinking about our role and how we describe what we do. For their part, Alteryx coined the term Data Artisan to describe the people using their software: often people without “analyst” in their title who find themselves needing to solve problems without the need for coding or IT departments. To be honest I never really got it, but with my new role I started considering the name again, along with my own situation with Alteryx and Tableau, and it started to make sense.

For starters, let’s look at what those words mean and their origin:

Data, “facts and statistics collected together for reference or analysis”, is the nominative plural of datum, originally a Latin noun meaning “that is given”.

Artisan (according to www.oxforddictionaries.com/) is a worker in a skilled trade, especially one that involves making things by hand. It has its origins in the mid 16th century “from French, from Italian artigiano, based on Latin artitus, past participle of artire ‘instruct in the arts’, from ars, art- ‘art’”.

Okay, so technically yes, being in a skilled trade working on facts and statistics for analysis or reference I can call myself a Data Artisan. More specifically my new role will involve instructing others in “the arts” and so this will also ring true.

File:Mendel I 053 v.jpg

An artisan from the 15th century

So, I’m a Data Artisan technically – what about practically? Well let’s consider the tools of my trade:

Data – the raw materials / elements I work with

Alteryx – the tool of choice for data munging / data reshaping / data blending

Tableau – the tool of choice for data visualisation

The Dashboard –  a representation of how the analysis looks that helps people understand the overall story

What about an Artisan’s tools of choice? Let’s consider a painter:

Paint – the raw materials / elements (s)he works with

Palette – the tool of choice for paint blending

Canvas/Brushes – the tool of choice for paint visualisation

The Painting – a representation of how the scene looked that helps people understand the overall story

…and, as with an artist, a “Data Artisan’s” skill in telling the story means the result becomes greater than the sum of its parts, and they can represent analysis in very different ways by skewing their visualisation towards their own view or political bias.

So looking at it this way then I’m left to think perhaps I am a Data Artisan after all…

As a final, perhaps fatal, push on the metaphor I’d like to ask…would an artist mix his paints directly on the canvas? Would an artist paint his picture on his palette? If you’re a Tableau or Alteryx user then there’s no need to compromise on the end result – make sure you’re being true to your art because Alteryx and Tableau used together are the only way to true masterpieces. [okay I got a tiny bit cheesy there but you get the idea!]

Having said all that, I don’t think I’ll be calling myself a Data Artisan too often. I think Paul Banoub (The VizNinja!) said it best when he said:

“… call yourself whatever you want. Call yourself a Ninja, or a Jedi or a Yeti or a data rockstar. I don’t care. Just keep on pushing the boundaries and discovering. You should be proud of yourself for trying.” – Paul Banoub

In future my blogging efforts will be mainly on The Information Lab Blog but I will continue to add things to this blog on a less frequent basis, and will be reviewing the best of the Alteryx and Tableau community in regular posts here.

Thanks for reading.

Addendum

As a tease, here’s the kind of thing you can create in Tableau if you mix your data in Alteryx first. Check in with the Info Lab in the New Year to find out how.

Embedded image permalink