Using Inspect / JavaScript to scrape data from visualisations online

My last post talked about making over this visualisation from The Guardian:

2016-11-13_12-55-29

What I haven’t explained is how I found the data, and that is what this post outlines. These skills are very useful if you need to find the data behind visualisations or tables found online so that you can re-visualise them.

The first step in trying to download the data behind any online visualisation is to check how it is made. It may simply be a static graphic, in which case extraction may be hard unless it is a chart you can “unplot” using WebPlotDigitizer. Interactive visualisations, however, are typically built with JavaScript unless they use a dedicated product such as Tableau.

Assuming it is interactive, you can start to explore by right-clicking on the visualisation and choosing Inspect (in Chrome; other browsers have similar developer tools).

2016-11-13_19-26-35

I was greeted with this view:

2016-11-13_19-28-09.png

I don’t know much about coding, but this looks like the view is being built from a series of paths. How might it be doing this? We can find out by digging deeper, so let’s visit the Sources tab:

2016-11-13_19-31-30

Our job on this tab is to look for anything unusual outside the typical JavaScript libraries (you learn to recognise these by being curious and looking at lots of sites). The first file, gay-rights-united-states, looks suspect, but as can be seen from the image above it is empty.

Scrolling down (see below), we find an embedded file / folder (flat.html), and in that is something new: all.js and main.js.

2016-11-13_19-34-05
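If you prefer the Console tab to clicking through the Sources tree, the same hunt for script files can be roughly sketched in code. The helper name listScriptUrls is made up for illustration, and it only sees scripts in the document you pass in, so for an embedded page such as flat.html you would first need to switch the console to that frame's context.

```javascript
// Sketch: list the external script files loaded by a page (or frame).
// `doc` is assumed to be a DOM Document, e.g. `document` in the browser console.
function listScriptUrls(doc) {
  return Array.from(doc.scripts)     // every <script> element in the document
    .map((script) => script.src)     // its URL ('' for inline scripts)
    .filter((src) => src !== '');    // keep only external files
}

// In the browser console you might run:
// console.log(listScriptUrls(document));
```

Anything in the resulting list that is not a well-known library is a candidate for a closer look.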

Investigating all.js reveals nothing much, but main.js shows us something very interesting on line 8. JACKPOT! A Google Sheet containing the full dataset.

2016-11-13_19-38-25
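Spotting a link like this by eye works for a short file, but for longer scripts a quick text search can help. The sketch below is only an illustration: the function name findSheetUrls is hypothetical, and the pattern assumes the link lives under docs.google.com/spreadsheets, which may not match the exact URL format in the Guardian's code.

```javascript
// Sketch: pull anything that looks like a Google Sheets link out of script source text.
// ASSUMPTION: links start with docs.google.com/spreadsheets; other hosts are not matched.
function findSheetUrls(sourceText) {
  const pattern = /https?:\/\/docs\.google\.com\/spreadsheets\/[^\s'"`)]+/g;
  return sourceText.match(pattern) || [];   // match() returns null when nothing is found
}

// Example with a made-up line of code (EXAMPLE_ID is a placeholder, not a real sheet):
const sampleSource = "var dataUrl = 'https://docs.google.com/spreadsheets/d/EXAMPLE_ID/pubhtml';";
console.log(findSheetUrls(sampleSource));
```

You could paste the contents of a script file into a string and run it through the function, or simply use the search box in the Sources tab for "spreadsheets".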

And we can start vizzing! (btw I transposed this for my visualisation to get a column per right).

Advanced Interrogation using Javascript

Part way through my visualisation, I realised I needed to show the text items the Guardian had on their site, but these weren’t included in the dataset.

2016-11-13_19-41-27

I decided to check the JavaScript code to find where this text was created and see if I could decipher it. Looking through main.js I found this snippet:

function populateHoverBox (type, position){

  var overviewObj = {
    'state' : stateData[position].state
  }
.....
  if(stateData[position]['marriage'] != ''){
    overviewObj.marriage = 'key-marriage'
    overviewObj.marriagetext = 'Allows same-sex marriage.'
  } else if(stateData[position]['union'] != '' && stateData[position]['marriageban'] != ''){
    overviewObj.marriage = 'key-marriage-ban'
    overviewObj.marriagetext = 'Allows civil unions; does not allow same-sex marriage.'
  } else if(stateData[position]['union'] != '' ){
    overviewObj.marriage = 'key-union'
    overviewObj.marriagetext = 'Allows civil unions.'
  } else if(stateData[position]['dpartnership'] != '' && stateData[position]['marriageban'] != ''){
    overviewObj.marriage = 'key-marriage-ban'
    overviewObj.marriagetext = 'Allows domestic partnerships; does not allow same-sex marriage.'
  } else if(stateData[position]['dpartnership'] != ''){
    overviewObj.marriage = 'key-union'
    overviewObj.marriagetext = 'Allows domestic partnerships.'
  } else if (stateData[position]['marriageban'] != ''){
    overviewObj.marriage = 'key-ban'
    overviewObj.marriagetext = 'Same-sex marriage is illegal or banned.'
  } else {
    overviewObj.marriagetext = 'No action taken.'
    overviewObj.marriage = 'key-none'
  }

…and it continued for another 100-odd lines of code. This wasn’t going to be as easy as I had hoped. Any other options? Well, what if I could extract the contents of overviewObj? Could I write this out to a file?

I tried a “Watch” using the developer tools, but the variable went out of scope each time I hovered, so that wouldn’t be useful. Instead I decided to save flat.html locally and try writing the contents out to a file on my local drive….

As I say, I’m no coder (though perhaps more comfortable than some), so I googled (and googled) and eventually stumbled on this post:

http://stackoverflow.com/questions/16376161/javascript-set-file-in-download

I therefore added the function from that post to my local copy of main.js and added a couple of lines to the populateHoverBox function….okay, so maybe I can code a tiny bit….

var str = JSON.stringify(overviewObj);
 
download(str, stateData[position].state + '.txt', 'text/plain');

In theory this should serialise overviewObj to a string (according to Google!) and then download the resulting data to a file called <State>.txt.
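For readers who don't want to click through to Stack Overflow, a download helper of this kind usually looks something like the sketch below. This is not necessarily the exact code from that answer; it is a minimal version that takes the same three arguments as the call above (text, file name, MIME type), and the parameter names are my own.

```javascript
// Sketch of a browser download helper; argument order matches download(str, name, type).
// ASSUMPTION: runs in a browser, where `document`, `Blob` and `URL.createObjectURL` exist.
function download(text, filename, mimeType) {
  const blob = new Blob([text], { type: mimeType });  // wrap the string as file-like data
  const link = document.createElement('a');           // temporary link, never shown on the page
  link.href = URL.createObjectURL(blob);              // URL that points at the in-memory data
  link.download = filename;                           // suggested name for the saved file
  link.click();                                       // simulate a click to trigger the save
}

// Example with a made-up object standing in for overviewObj:
// download(JSON.stringify({ state: 'Example' }), 'Example.txt', 'text/plain');
```

Some older browsers may only honour the click if the link has first been added to the page, so treat this as a starting point rather than a guaranteed recipe.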

Now for the test…..

downloadingfiles

BOOM, BOOM and BOOM again!

Each file contains the JSON for one state:

2016-11-13_20-07-21

Now to copy the files out from the downloads folder, remove any duplicates, and combine using Alteryx.

2016-11-13_20-04-59

As you can see, reading the resulting JSON files with a wildcard input and then transposing them was simple.

2016-11-13_20-08-31
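I did this step in Alteryx, but for anyone without it, the same "combine and remove duplicates" logic can be sketched in a few lines of JavaScript. The function name combineStates is made up, and it assumes you have already read each downloaded file into a string; hovering over a state twice produces a repeat download, which is why duplicates need handling.

```javascript
// Sketch: combine per-state JSON strings into one record per state, dropping repeats.
// ASSUMPTION: each string is the contents of one downloaded <State>.txt file.
function combineStates(jsonTexts) {
  const byState = new Map();
  for (const text of jsonTexts) {
    const record = JSON.parse(text);       // turn the file contents back into an object
    byState.set(record.state, record);     // a later duplicate overwrites an earlier one
  }
  return Array.from(byState.values());     // one record per state, in first-seen order
}

// Made-up sample contents, using field names from the snippet in main.js:
const sampleFiles = [
  '{"state":"State A","marriage":"key-marriage","marriagetext":"Allows same-sex marriage."}',
  '{"state":"State B","marriage":"key-none","marriagetext":"No action taken."}',
  '{"state":"State A","marriage":"key-marriage","marriagetext":"Allows same-sex marriage."}'
];
console.log(combineStates(sampleFiles).length);
```

Keying on the state name keeps the sketch short; it relies on each state's file having the same contents every time it is downloaded.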

Finally, to combine with the Google Sheet (called “Extract” below) and the hexmap data (Sheet 1) in Tableau…..

2016-11-13_20-09-41

Not the most straightforward data extract I’ve done, but I thought it was worth blogging about so others could see that extracting data from online visualisations is possible.

You can see the resulting visualisation in my previous post.

Conclusion

No one taught me this method, and I have never been taught how to code. The techniques described here are simply the result of continuous curiosity and exploration of how interactive tables and visualisations are built.

I have used similar techniques elsewhere to extract data from visualisations, but no two cases are the same, so a generic tutorial can’t really be written. Simply have curiosity and patience, and explore everything.

 

Combining Multiple Hexmaps using Segments

After my #Data16 talk Chad Skelton challenged me to do a simple remake of the Guardian sunburst-type visualisation that I critiqued in my Sealed with a KISS talk (which you can now watch live at this link).

The original visualisation is shown below:

2016-11-13_12-55-29.png

While initially engaging, I find this view complex to read and extracting any useful information involves several round trips to the legend. The circular format makes the visualisation appealing while sacrificing simple comprehension. Could I do better though?

Chad suggested small multiple maps and I agreed this might be the simplest approach but I was not happy with the resulting maps:

2016-11-13_18-22-51

 

Alaska and Hawaii, why do you ruin my maps? The Data Duo have several solutions, and my favourite is the tile map.

Thankfully Zen Master Matt Chambers has made Tile Maps very easy in this post and so I followed the instructions, joining the Excel file he provided onto my data and giving a much more visually appealing and informative result. The resulting visualisation is below (click for an interactive version):

cxjnfitxgae5mkn

However, I still wasn’t satisfied with this visualisation. It has several problems:

  • it separates out the variables per state, meaning the viewer still has a lot of work to do to compare each state’s full set of rights
  • it still requires the use of the legend to fully understand
  • the hover action reveals extra info, meaning the user has to move the mouse around to reveal the story
  • the legend is squashed due to lack of space

How to solve these issues? I spent a while pondering it and eventually found a possible answer: I could use a single map but split each hexagon into segments (ignoring marriage as it is allowed in all states – another solution would have been to cut out a dot in the middle for the seventh segment).

To do this I’d need to split each hexagon into segments, so I took out my drawing package and created six shapes:

These six shapes have transparent backgrounds and, importantly, when combined create a single hexagon.
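I drew my shapes by hand, but the geometry is simple enough to sketch in code if you would rather generate them. The example below assumes each segment is a triangle running from the centre of the hexagon to two neighbouring corners, which is one way of cutting a hexagon into six and not necessarily how my shapes were drawn. The function name hexagonSegments is made up.

```javascript
// Sketch: split a regular hexagon into six triangular segments.
// ASSUMPTION: each segment is a triangle [centre, cornerA, cornerB] of [x, y] points.
function hexagonSegments(cx, cy, radius) {
  const corner = (i) => {
    const angle = (Math.PI / 3) * i;   // corners sit 60 degrees apart, starting at 0 degrees
    return [cx + radius * Math.cos(angle), cy + radius * Math.sin(angle)];
  };
  const segments = [];
  for (let i = 0; i < 6; i++) {
    // neighbouring segments share a corner, so the six pieces tile the hexagon exactly
    segments.push([[cx, cy], corner(i), corner((i + 1) % 6)]);
  }
  return segments;
}

// Six triangles for a hexagon of radius 10 centred on the origin:
console.log(hexagonSegments(0, 0, 10));
```

Each triangle could then be exported as its own image on a transparent canvas the size of the full hexagon, so that the pieces line up when layered.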

Now with these shapes I can put a dimension (such as Group below) on shape, and then use colour so that each hexagon is made up of differently coloured segments on the map (using Matt’s method and data for hex positions).

2016-11-13_18-41-23.png

Using this technique I therefore created the visualisation below (click for interactive version):

2016-11-13_15-58-05

Using this method it would be possible to combine 3, 6, 9 or 12 (or possibly more) dimensions on a single map by segmenting the hexagons. Similarly using a circle in the middle would allow 4 or 7 dimensions.

I’m not sure how applicable this type of method is to other visualisations but please let me know if you use it as I’d love to see some more examples.

Best of Alteryx on the Web – November 2014

Another busy month in the Alteryx blogosphere and so here are some links to some of the best content you may have missed.

Tips and Tricks

3danim8’s Blog – How and Why Alteryx and Tableau allow me to innovate – Part 1 and Part 2

The Information Lab – 7 Alteryx Tips you need to start using today

Inspiring Ingenuity – Alteryx – Optimising modules for Speed

The Information Lab – Bite-sized Tips, Tricks and Tutorial Videos for Alteryx

Commentary

Alteryx.com – Data Blending for Dummies – Special Edition

Schiolistic Ramblings – The Business User and BI: Analytics, Visualisation and Testing

Alteryx.com – 5 Myths of Data Blending

Antivia – From raw data to interactive dashboard in minutes

Tool Guides and Macros

The Information Lab – What Time is it Alteryx – Part 1

Human Data Associates – Visualize all Dutch Cities and neighbourhoods in Tableau (nothing good happens without Alteryx)

Alteryx Gallery – X-Ray Browse Macro

Think I’ve missed anything, or have something worthy of next month’s roundup? Please reach out on Twitter (@ChrisLuv) or in the comments below.