The Making of “Fallen Leaves…”

My recent data visualisation The Fallen Leaves of Mrs May’s Magic Money Tree is the first visualisation I’ve been genuinely proud of in quite a long time; perhaps only my U-Boat visualisation managed that last year (in 2018).

The Fallen Leaves of Mrs May’s Magic Money Tree

Why Create this Data Visualisation?

I was inspired by Dan Wainwright of the BBC and his recent article “How spending cuts changed council spending, in seven charts”. Using data from the Ministry of Housing, Communities and Local Government, Dan compared council spending cuts in several areas.

Dan’s article was great, but I was left wondering what else could be done with the data. I experimented with beeswarm plots to show outliers and several scatter plots to try to pick apart the story, but nothing really helped bring out the details for me. The charts I was left with weren’t engaging enough to tell an important story about the squeeze on local government, so I was left trying to think of new ways and chart types to engage the audience and show the story.

How I Developed the Idea

Several weeks, and Christmas, went by and I’d all but forgotten about it until yesterday evening, when I was thinking about displaying data as leaves on a tree. I had no real dataset in mind, but I thought the tree itself, and the leaves, might be a good metaphor with the right data.

My mind then returned to the MHCLG data: could I use this? Prime Minister May’s famous Magic Money Tree might make a great metaphor. So I got to work…

My original “leaf” looked like this:

I was imagining a tree covered with small versions of this leaf. I experimented with creating the leaf out of the data itself, trying to join the points, perhaps based on the spend on different services, all to try to create the shape of a leaf. However, that was hard, and didn’t look great (or much like a leaf).

I also experimented with leaves like this:

Perhaps each smaller leaf node could represent a different service. However, I quickly gave up: due to the makeup of local government in the UK, different council types are responsible for different services, and finding a balance was hard (not to mention technically sizing the leaves individually in Tableau).

Finally I settled on an idea (note the above iteration and experimentation took 30-45 mins). I realised a tree might be hard, but a forest floor in autumn could show the leaves that had fallen from May’s tree, and the colour (giving it an autumnal feel) would show the size of the cuts. The overall effect would be a big view of orange / autumn, presenting a great metaphor for how cuts are hitting councils.

How would the forest floor be represented? I wanted all my design choices to matter – that concept in data visualisation is very important to me.

I wanted the position of the leaf to matter – so I opted to show the rough location in the UK.

I wanted the size of the leaf to matter – so I opted to show the total spend on services in 2017/2018.

I wanted the colour of the leaf to matter – so I showed the change from 2012/13 to 2017/18.

I wanted the type of leaf to matter – so I used it to show the type of council.

I wanted the rotation of the leaf to matter – but without creating an extra unneeded variable I couldn’t make it work, so I made it random.

Once that was decided the build was relatively easy.

What was difficult?

Setting the X, Y locations based on locations in the UK was complicated but not difficult. I opted to use Processing to generate a spatial treemap (I used the methodology Rob Radburn talked about with me here: https://www.youtube.com/watch?v=w8WpyenYggo). I then used the centroid of each square to locate my leaf.

However, the resulting grid wasn’t appealing to the eye, so in Tableau I added a random() number to each lat / lon, scaling it so that the result looked good but minimised overlap.
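The jitter step can be sketched outside Tableau. Below is a minimal Python illustration, assuming each council’s spatial treemap cell is available as (x, y, width, height); the cell values, the `spread` parameter and the function names are hypothetical and not taken from the original build. It takes each cell’s centroid and nudges it by a random offset that stays inside the cell, which is one way to break up the grid while limiting overlap.

```python
import random

def centroid(cell):
    """Centre point of a treemap cell given as (x, y, width, height)."""
    x, y, w, h = cell
    return (x + w / 2, y + h / 2)

def jittered_position(cell, spread=0.3, rng=random):
    """Centroid plus a uniform random offset.

    `spread` (hypothetical) is the maximum offset as a fraction of the
    cell's width/height. Keeping it below 0.5 keeps the leaf inside its
    own cell, which limits overlap with neighbouring leaves.
    """
    cx, cy = centroid(cell)
    _, _, w, h = cell
    dx = rng.uniform(-spread, spread) * w
    dy = rng.uniform(-spread, spread) * h
    return (cx + dx, cy + dy)

# Hypothetical cells from a spatial treemap, keyed by council name.
cells = {
    "Council A": (0.0, 0.0, 10.0, 8.0),
    "Council B": (10.0, 0.0, 6.0, 8.0),
}

rng = random.Random(42)  # seeded so the layout is reproducible
for name, cell in cells.items():
    print(name, jittered_position(cell, rng=rng))
```

Seeding the generator is a small design choice worth copying: every refresh produces the same "random" layout, so leaves don't jump around between versions.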

The aesthetics were also hard to get right: without stems the leaves looked flat and boring.

To add the stem I duplicated the dataset and then added “Table Name” to the Shape mark: the first copy would be encoded to the shape of the leaf, and the second (at the same co-ordinates) to the stem (which I cut from the original pictures using Photoshop). If the stem was on top and coloured correctly it would work perfectly, and without the limitation of a dual axis in Tableau, which makes the stems appear on top of all the leaves.
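One way to picture the duplicated dataset is as the data stacked on top of itself, with a tag saying which copy each row belongs to. This Python sketch is an illustration only, not the Tableau setup itself; it assumes rows already carry their X/Y positions, and the tag values "Leaf" and "Stem" and the council rows are hypothetical.

```python
def duplicate_for_stems(rows):
    """Stack two copies of the data, tagged so each point can be drawn twice.

    Both copies keep the same X/Y, so a leaf and its stem sit on exactly
    the same spot; only the "Table Name" tag differs.
    """
    leaves = [{**row, "Table Name": "Leaf"} for row in rows]
    stems = [{**row, "Table Name": "Stem"} for row in rows]
    return leaves + stems

councils = [  # hypothetical rows with X/Y already computed
    {"Council": "Council A", "X": 5.0, "Y": 4.0},
    {"Council": "Council B", "X": 13.0, "Y": 4.0},
]
marks = duplicate_for_stems(councils)
print(len(marks))
```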

Finally, I haven’t yet talked about achieving the rotation. To do this I needed to brute force it: I used SnagIt to batch rotate the leaves and stems in 6 arbitrary directions.

The shapes could then be encoded using a randomly generated number from 1 to 6 in my original data, [Number]. Using that, I could create a field combining [Class] (for leaf type), [Table Name] (for stem / no stem) and [Number] to generate a single field for the shape encoding.

After that I simply needed to add this field to Shape and assign the shapes from the folder above, in order.
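The rotation and shape encoding can be sketched the same way. This is a hypothetical Python illustration, not the original Tableau calculation: it assigns each council a random rotation from 1 to 6, then builds one combined key per mark from leaf type, leaf/stem layer and rotation. The field names follow those mentioned above; the council type values and the key format are made up.

```python
import random

ROTATIONS = 6  # six pre-rotated copies of each leaf and stem image

def add_rotation(rows, rng):
    """Give each council a random rotation index from 1 to ROTATIONS."""
    return [{**row, "Number": rng.randint(1, ROTATIONS)} for row in rows]

def shape_key(row):
    """One field combining leaf type, leaf/stem layer and rotation."""
    return f"{row['Class']}|{row['Table Name']}|{row['Number']}"

def all_shape_keys(classes):
    """Every key that needs an image, in a stable sorted order for assignment."""
    return sorted(
        f"{c}|{layer}|{n}"
        for c in classes
        for layer in ("Leaf", "Stem")
        for n in range(1, ROTATIONS + 1)
    )

rng = random.Random(1)
councils = [{"Council": "Council A", "Class": "District"}]  # hypothetical
rotated = add_rotation(councils, rng)
leaf = {**rotated[0], "Table Name": "Leaf"}
stem = {**rotated[0], "Table Name": "Stem"}
print(shape_key(leaf), shape_key(stem))
```

Assigning the rotation on the original data, before duplicating the rows, means a leaf and its stem always share the same rotation, so the stem image lines up with its leaf.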

I used discrete colour bins so I could colour the stem separately, again using [Table Name] to pick out the stem.
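The discrete colour bins can be illustrated in a few lines. The bin edges below are hypothetical (the post doesn’t say which were used), and this is a Python sketch of the logic rather than the Tableau calculation itself: leaves are bucketed by their percentage change in spend, while every stem falls into its own bucket so it can be given a separate colour.

```python
def colour_bin(pct_change, table_name, edges=(-40, -30, -20, -10, 0)):
    """Return a discrete colour bucket label for one mark.

    Stems always get their own bucket, regardless of the data.
    `edges` are hypothetical bin boundaries in percent change.
    """
    if table_name == "Stem":
        return "Stem"
    for edge in edges:
        if pct_change < edge:
            return f"< {edge}%"
    return f">= {edges[-1]}%"

print(colour_bin(-15, "Leaf"), colour_bin(-15, "Stem"))
```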

Feedback

Once it was released I realised a few things, chiefly, thanks to feedback from one user, that the autumnal theme wasn’t suited to colour-blind users. Given the theme I was tempted to ignore this, but in the end I used a parameter to add a blue colour option to the theme.

In Conclusion

This was a fun build, completed, thanks to Tableau, in around 3 hours. I love the way the metaphor works in the visualisation. I think if you’re representing council cuts as something abstract, as we are here, you need a strong reason; you can’t just pick any object. I think the Magic Money Tree analogy works well for that reason, as does the title.

tl;dr: I had fun doing this. Which is the point.


Sankey Diagrams: The New Pie Chart?

Producing a Sankey chart is almost a rite of passage within the Tableau community, an accomplishment that perhaps means you’ve finally mastered this tool and are on a par with the experts.

But are the Sankey charts we see in the community really worthy of being put on that pedestal? Or are we biased towards this particular chart type because it’s the first complex chart many of us learn to create in Tableau?

In this post I want to review our love affair with the Sankey, critique some of the use cases of Sankey charts and suggest alternatives.

Sankey Charts – An Introduction

“Sankey diagrams are a specific type of flow diagram, in which the width of the arrows is shown proportionally to the flow quantity. They are typically used to visualize energy or material transfers between processes.”

(source: Wikipedia, article ‘Sankey diagram’)

Sankey diagrams are named after Captain Matthew Sankey, who used the diagram below in 1898 to show the energy efficiency of a steam engine.


Perhaps the most famous Sankey-type diagram is this one by Charles Minard – which should need no introduction.

Minard

Sankeys in the Tableau Community can be credited back to Jeff Shaffer (http://www.dataplusscience.com/RecreationinTableau2.html). His original post isn’t a true Sankey diagram (to all intents and purposes it’s a curved slope chart), but it led to further posts where he built on his method, and others soon followed suit, namely Oliver Catherine, myself and others (too many to mention).

A Sensible Place to Start

If I’m going to start anywhere in my critique of Sankey Charts in the Tableau Community then I should start here, March 2015:

https://www.theinformationlab.co.uk/2015/03/04/sankey-charts-in-tableau/ 

My blog post on how to use data densification was illustrated with the example below (click to see the interactive version):

Sankey

What does this example show? Is it a useful Sankey diagram?

The diagram as it stands is clearly showing several things at once and, because it was an example for a “how to” blog, I didn’t truly think about the question it was answering.

What understanding of the data can a viewer tease out of this visualisation?

  • Technology is the largest category
  • South is the smallest region – the others are similar sizes.
  • The split from Category to region is roughly in proportion to the size of the categories

Have a look yourself – can you pick anything more out? Give yourself two minutes.

Hardly groundbreaking stuff, is it? The majority of the insight comes from the bars at the sides of the visualisation, not the actual “flows” (although this chart isn’t showing any kind of flow – again, is this misleading?).

Consider the charts below:

Bars

How about now? Can you spot that Technology in the East is underperforming against Furniture, compared with the other regions?

What if we swap the dimensions to aid the comparison across regions?

Bars 2

We can now see the South is struggling, and the West is particularly poor in Office Supplies.

Both these bar charts show additional insights we couldn’t get from the Sankey. In fact the only benefit the Sankey provides is in the two stacked bars at its sides.

I could have picked a better example for my original blog post, but I didn’t. So hands up – my example was a poor one to illustrate a piece on Sankey Charts. My plea to anyone writing “How To” tutorials – please please include a “Why To” / “What To” as well, showing good examples helps educate people as to why your chart type might be useful – and why it might not be.

(further reading on Sankey usefulness along similar lines: https://www.datarevelations.com/circles-labels-colors-legends-and-sankey-diagrams-ask-these-three-questions.html)

Sankey Charts are hard to create in Tableau

I had the pleasure of sitting in on a portion of Kevin Taylor’s talk as he practised it at the Tableau conference in New Orleans. I had sneaked in at the back to use the room later and Kevin didn’t recognise me, so I had the pleasure of watching Kevin present “my” method of producing Sankey Charts without him being aware of my presence. You can see the portion of his talk here: https://youtu.be/PcJjIAq6bxA?t=3074

What struck me was how simple it was to present – Kevin is done in 5 minutes. This could have been a 50 minute talk *just* on Sankeys, covering Data Densification, Nested Table Calculations, etc but the truth is you don’t need to understand any of that to build a Sankey.
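For the curious, here is roughly what the densification is doing, as a hedged Python sketch rather than the Tableau method itself. One common formulation for Sankey curves densifies each flow into a fixed number of points (49 here, an assumption) along t from -6 to 6, and uses the sigmoid function 1 / (1 + e^-t) to blend smoothly from the start position to the end position. The sigmoid is rescaled here so the curve lands exactly on its endpoints; the function names and sample positions are hypothetical.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def sankey_edge(y_start, y_end, n_points=49, t_min=-6.0, t_max=6.0):
    """Densified points along one edge of a Sankey curve.

    t plays the role of the x axis. The sigmoid is rescaled to run from
    0 at t_min to 1 at t_max, so the curve starts exactly at y_start and
    finishes exactly at y_end.
    """
    s_min, s_max = sigmoid(t_min), sigmoid(t_max)
    points = []
    for i in range(n_points):
        t = t_min + (t_max - t_min) * i / (n_points - 1)
        s = (sigmoid(t) - s_min) / (s_max - s_min)
        points.append((t, y_start + (y_end - y_start) * s))
    return points

# A band is drawn between two edges, e.g. a flow occupying 0-20 on the
# left and 50-70 on the right (hypothetical positions).
top = sankey_edge(20, 70)
bottom = sankey_edge(0, 50)
```

The width of the band stays constant (20 here) all the way along, which is what makes the curve read as a single flow of fixed size.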

So if you can follow instructions (i.e. build an Ikea cupboard), can you build a Sankey in Tableau? No, there’s still a lot to take in – abstracting the method to your data and getting the calculations right can be very hard, even for programmers, as the tweet below testifies.

So if it’s hard why do so many people want to build them? I think many people confuse being “good” at Tableau with building complex charts. This isn’t the case – in fact some of the people I consider the “best” users of Tableau have never created a Sankey in their lives.

Being good at Tableau and data visualisation is all about taking complex data sets and breaking them down into simple visualisations that aid the viewer’s understanding.

Taking simple datasets and making them hard to understand is not data visualisation!

Reading Sankeys takes Effort and Time


Kevin did do better than me in his use case – the example he built (above) shows the flow of humans between continents. We can see that Europe and the Americas are destinations for people from Asia and there are countless other stories hidden in this data which can be seen from the Sankey.

However reading a Sankey like this takes time, it takes effort. It takes a user who understands the chart type. However, even with that understanding and time I’m still left with the feeling – “so what”? What’s the story the data visualisation is trying to tell me? There are so many – but what’s important? Also, again, patterns are hard to distinguish as in the last example…

As we’ll see later so many of the Sankeys produced take no account of the data literacy of the user, nor do they attempt to signpost the stories or explain what the user should be looking for. In essence they do more to confuse than to explain.

What if a Sankey was *really* simple?

InfoTopics have recently released their Extension “Show Me More” which makes Sankey diagrams really simple. It has some downsides in that the Sankey is hard to format and control but generally it turns producing a Sankey into a simple process (assuming you’re happy to enable extensions).

In a world where Sankeys were really simple (either through extensions or via, say, Show Me), I wonder whether we would see the same usage of them.

Are people conflating “Hard” and “complex” with “Good”? If we remove the “hard” piece from the equation then would we see the same reaction to Sankey charts? Or would the reaction to them be much the same as pie charts?

In my mind a pie chart and a Sankey chart are just as easy to misuse as each other. One might argue pie charts, being so universal, don’t require the same level of data literacy and so are harder to misuse. Pie charts are much more maligned, though (use them at your peril!), but for me we potentially see more misuse of Sankeys right now without the same visceral reaction.

Okay I can already see people reaching for Twitter to express outrage that I’ve compared Sankeys to the Pie Chart….so let’s move on…

There are many ways I don’t like to see people use Sankey charts

Okay so here I risk being controversial – if you disagree please feel free to call me out in the comments below. Also if I use your Sankey as an example then please don’t take it to heart – I’ve tried to use examples from people who I know can take the criticism in the way it is intended and who I know will feel comfortable to disagree with me if they feel I’m wrong. In short if I’ve used your visualisation then it’s because I respect your work. Remember too, this is only my opinion – I haven’t seen much visualisation research done on the understanding of Sankeys and so it’s hard to go beyond an opinion.

Also remember that when people do Sankey charts they may have different motivations than purely showing the data in a way that amplifies the understanding of the data for the greatest number of people. People may be simply trying to practise a new data visualisation type, or simply engage the audience in a different way. So while I may critique the output, the creator might not have been going for the “best” chart.

I’ve chosen to leave out the creators’ names in the examples below; I don’t want to make this about individuals as I think the issues I’m highlighting could come from numerous examples – I am just highlighting examples as I find them.

#1 Sankeys with just two dimensions on one axis

In the above example, the Sankey aims to show “which driver contributed most to the team’s success” – but the widths of the curves for each year are so similar that it is difficult to discern the story.

UCAS 2

In the example above we’re looking at the choices made by students – the Sankey diagrams are really only there to show the male / female breakdown as well as the split between subjects. The breakdown isn’t too bad for subjects with a high percentage overall, but with smaller subjects the size of the curves for male / female is impossible to ascertain, and so the analysis is difficult. The creator’s choice of the Sankey makes the data hard to analyse – a simple stacked bar chart showing the breakdown per subject might be better, or combining both measures into a treemap might work:

Alternative

Would three treemaps look as good? Perhaps not – but they remain better at telling the overall story.

#2 Sankeys that can be replaced by colour legends

If your Sankey chart has dimensions that have a one-to-one correspondence then what value are the curves providing, beyond bling? In both of the above charts a single list could be used, coloured by the appropriate value. It would be vastly simpler to understand. The first example could simply be two lists of teams, for example.
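Whether the curves are redundant is easy to check before building anything. A small Python sketch with hypothetical team / league pairs: if every value on the left flows to exactly one value on the right, the function returns a lookup you could use to colour a simple list; if any value splits across more than one destination, it returns None and the curves are at least carrying some information.

```python
def colour_lookup(pairs):
    """Return {left: right} if each left value flows to exactly one right
    value, else None (the flows genuinely split)."""
    mapping = {}
    for left, right in pairs:
        # setdefault stores the first destination seen for `left`;
        # a different destination later means the flow splits.
        if mapping.setdefault(left, right) != right:
            return None
    return mapping

teams = [("Team A", "League 1"), ("Team B", "League 2"), ("Team A", "League 1")]
print(colour_lookup(teams))
```

Strictly, this checks that each left value has a single destination, which is all a colour legend needs; a true one-to-one correspondence is the special case where no two left values share a destination either.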

#3 Sankey Charts that add no additional understanding

Hold onto your hat I think this one will be controversial…

Sorry but I really struggle with these examples (and there are so many similar examples I could have picked). What is the Sankey telling me? It’s helping show how the values break down into other dimensions…that’s about it; it’s not giving me any additional information beyond what I can find elsewhere.

So in this case maybe my problem is the fact I’m even considering these in the Sankey category – maybe I need to realise that they’re simply visual references / cues that don’t aim to increase understanding of the data – only of the flow of the data (bottom to top and top to bottom respectively).

For me they take up a lot of “ink”.

“Clutter and confusion are not attributes of data – they are shortcomings of design.”

– Edward Tufte

Not everyone will consider Tufte to be correct, but consider the subjects: perhaps a visualisation on Jimi Hendrix can get away with being light-hearted and fun, but is migration a subject that needs extra elements added?

We also have to consider “is the juice worth the squeeze” (a favourite Joe Mako quote of mine). Considering the effort to produce these curves in Tableau is it worth it?

Sometimes it is! I love this example:

My reaction to each of these design choices is very subjective – the authors highlighted above haven’t necessarily made the wrong decision (after all it’s just my opinion) but the Sankey loses its effectiveness as a great design choice when it’s used so ubiquitously in the community, so I would encourage people to think carefully before using them, especially for serious subjects.

#4 The confusing Sankey

Again this category could have included any number of visualisations

The above visualisation has lots going on, and I wouldn’t call it an engaging visualisation for that reason. It looks complicated. I need to work hard to find any stories in the data – are there any stories that are worth the effort? In an era when 95% of views of a visualisation will come on social media, and the user won’t interact with the view, is this the right choice, given that many may be put off exploring further by the confusing lines?

Recruiting Pipeline

This second example is still confusing, there’s a lot going on but the data is more interesting and it’s certainly easier to pick out some interesting facets. The stories the author highlights in the accompanying blog piece are below:

Florida, Texas, and California combined produced 44% of the Rivals 100 from 2010 and 2011.  Alabama landed 15 of the recruits, but 13 of them were undrafted.  Out of the 200 total recruits, 142 of them were undrafted as well.  Even with these recruits being the best of the best from high school, only 29% of them made it to the NFL.  Check out the visualization and see if you can find other insights.

The takeaway for me is that the main stories – two out of the three anyway – were from the stacked bar charts at the side of the visualisation. I’d love to explore more, but the interaction is difficult and doesn’t allow me to see the full flow from end to end (e.g. how many originating in Florida went to Buffalo). Perhaps Set actions might allow us to build better interactions so that stories can be picked out more clearly? I’d love to be able to click on the left and see the full path, for example, or see the results filter accordingly (with appropriate percentages showing up as labels).

Are there any Sankeys you like?

For me a good Sankey should have a clear story, minimal confusion and a purpose that extends beyond chart bling. The example below is well designed and works well to that purpose.

Conclusions

What should you take away from this post? Firstly, perhaps it’s obvious I have a dislike of Sankeys that borders on the pathological, so I’m clearly biased – it takes a lot for me to like a Sankey chart. Perhaps that comes from years of being thanked for my tutorial as I’m tagged in ugly Sankeys on Twitter.

Sankeys are not a bad choice of chart; there’s no such thing as a bad chart. The choice of a chart should come down to several things:

Story: In an explanatory visualisation, does it successfully convey the story you want to tell? If the visualisation is exploratory, spend a few minutes looking at it – is the “juice worth the squeeze” for the viewer?

Medium: How will the visualisation be consumed? If it’s likely to be seen as a static image on Twitter, does the visualisation still work?

Data Literacy: Does your audience have the required knowledge to interpret the chart?

Alternatives: Are there simpler chart types that would tell your story better?

Cool factor: If you’re simply going for a “cool!” reaction, does the subject warrant it? Is your visualisation making it clear that it’s a “cool” viz, or does it still expect to provide some serious takeaways? Are these two aims conflicting?

The most serious takeaway, though, is that you shouldn’t stop doing Sankey Charts just because a Tableau Zen Master has written a piece criticising them. The aim of this blog post is to try to make people stop and think about their choice of visualisation with regard to Sankey Charts; they need to justify it to themselves, not me. If you’re having fun and challenging yourself creating them then great, why the hell shouldn’t you?

Also there will be plenty of experts, far more qualified than me, who disagree – I’d love to hear your comments below and on Twitter (@chrisluv).

On Conference Etiquette and Poor Talks

We’re starting conference season in my small corner of the data world, with the Tableau and Alteryx conferences happening simultaneously in London and Vegas respectively. Sadly I’m missing out on my first Alteryx Inspire in a number of years – I hope my friends in Vegas have an amazing time.

As these conferences draw near we’re always treated to an array of advice from seasoned attendees on how to get the most out of your experience, so I wanted to add my opinion to this growing pile of tips and tricks. In doing so I want to challenge what seems to be accepted wisdom at the conferences I attend, among the many bloggers and Twitter users I follow. The advice goes something like this:

“If you’re not enjoying a talk then walk out and find something else – your time at conference is valuable”

Personally I think this is the worst advice you could be given. Not only is it rude, it also makes a bad situation worse. So let me show you how to rescue those poor talks and turn them into a positive experience.

1. Choose your talks wisely

Take time to use the conference apps and schedules well in advance of the conference, take the time to research the speakers and topics. If you wish to learn something in particular, or already have some knowledge on the subject, then seek out opinions from peers or the speakers themselves, if you can reach them, on whether your attendance is worthwhile.

Sometimes it’s worth attending a talk not for the content itself but in order to connect with the person afterwards, particularly if they share a common interest or specialism or the same industry.

Whatever the reasons for attending the talk, make sure you are clear on them before you walk through the door. Ask yourself (if there are multiple sessions you wish to see) whether there are ways to get the same outcome without attending, e.g. arranging to meet the speaker for a 1-to-1 (most speakers are only too flattered to be asked for a coffee to chat through their subject in detail) or watching it again online. If the sessions are recorded, try to attend the one where you’d most like to ask questions.

In summary I’d perhaps go as far as to say there’s no such thing as a bad session, only poorly chosen ones. You owe yourself, and the speaker, the duty to choose your session carefully.

2. Walking out won’t help

So, if you followed the above advice, you chose, quite deliberately, to come along to this talk. You know why you came and you know what you want to get out of it. You consider the speaker to have something interesting to say, otherwise you wouldn’t be here.

But now the talk isn’t going well. Perhaps the speaker has a voice that belongs on the shipping forecast more than a conference, or perhaps they’re having all sorts of technical problems – those perfect dashboards just won’t render on the conference screens – or maybe they’re nervous and can’t get their words out quite as they intended. Maybe they just didn’t have time to prepare. Maybe they’re reading out their slides to the audience! Whatever the reason, walking out is likely only to make a bad situation worse.

Why? Firstly you now have to run across a large conference venue and, if you’re lucky, join your well researched second choice talk rather late. More probably you didn’t have a second choice and so you just run into the nearest room, or your second choice is full and you can’t get in. You might even be forced to just grab a coffee and play pinball. Whatever happens you won’t have the clear outcomes you wanted from your primary choice, and so you’re likely not to find it as valuable (not least because you missed some of it).

More importantly though what happened to all those reasons for attending the first talk? Did they go away? Of course not, so you’re giving up on a massive opportunity to rescue your original mission.


3. Just be Polite

As a speaker I have to say there’s nothing more off-putting than seeing people leave. At a conference in a large venue it is, really, to be expected, but many speakers at our data conferences aren’t professional speakers and they’re in relatively small rooms. They’ve given up their time to prepare a talk (which takes more effort than 99% of attendees have put in). The least you can do, having decided to attend, is make a small commitment of all 40-50 minutes of your time.

So make sure you’ve been to the bathroom, listen and engage with the speaker, try to avoid WhatsApp conversations moaning about the speaker to your friends in other sessions, and avoid Facebook for an hour, because you can use the opportunity to turn what could be a wasted 40-50 minutes back into a great learning opportunity.

If you do think you’re prone to voting with your feet, then please sit by the door and try to make minimal fuss as you leave. Also remember that doors can slam in conference halls, so close them gently behind you.

4. Rescuing the Situation

Yes, poor talks happen, as we’ve said, for a variety of reasons, but assuming you’ve decided to stick around then you can rescue the situation and still achieve your original objective for attending the talk.

How do you rescue the situation?

  • Be patient – speakers, particularly customer speakers, are often nervous and so they’ll take a while to loosen up.
  • Think of questions – focus on what the speaker isn’t saying, as that’s often the more interesting stuff. How does that tie in with what you wanted to get out of the session? Write down a set of questions as the speaker goes through.
  • Ask questions at the end – new speakers will more often than not under-run, leaving plenty of time for questions. This is your chance to really get what you need to know. Tie questions back to what the speaker was saying to show you were listening and ask them to expand on areas of interest to you. Often getting a speaker ad-libbing about something they feel passionate about is where you’ll really start to learn something.
  • Approach the speaker at the end of the talk – as the room empties make sure to say hi. You could even offer to buy them a coffee if you still haven’t got what you wanted from the talk. Remember you chose this person as an expert in a field you were interested in; one bad presentation doesn’t mean they don’t have something interesting to say.

Prepare well and remember your objectives

In conclusion you owe yourself the duty of preparing well for the talks you want to attend, that preparation will help you focus on what you want to achieve and help you through any sessions that don’t live up to your expectations.

Walking out and leaving poor survey feedback isn’t your only choice; in fact it is likely to be the worst choice you can make. Make the most of the experts the conferences lay on for you and enjoy yourself.

 

 

Data Visualisation: Lonely Hearts Club

My data visualisation life outside work is missing something. I’m lonely. The hours I spend hunched over the PC visualising data remain unfulfilling. When I’m not “vizzing” the rest of my time is spent on social media networks with other single vizzers. We all pretend we’re happy being single, but deep down I know many of us aren’t. I think it’s important to talk about the loneliness.

You see I’ve spent years now without an audience. At first it was fun: I had the freedom to do what I wanted when I wanted; I didn’t have to worry about pleasing the other half. I spent so many weekends on the equivalent of a boys’ night out, visualising random datasets, where I splurged out having fun and not really caring about the consequences. Usually I was in the company of lots of other singles and we had a blast. I even had a few meaningless relationships out of those nights; I hope they prepared me for what it’s like to be in a real relationship, but I worry they taught me bad habits. After all, those nights were all about impressing my mates, not my prospective partner, and so while the results were impressive I’m not sure either of us got any long term value out of the fling.


Having an audience, so we’re told, is the norm. Articles everywhere tell us how to keep our audience when we’ve found her, but there’s never any clue in them about how to find one in the first place. “Know your audience” everyone says, and every time I hear that a little piece of me dies because I know so many people who don’t have one.

A life in Data Visualisation without an audience is hard. I try my best but I end up vomiting data points and facts onto a page in an attempt to make something meaningful. I make them engaging, I add pictures and I try to piece a story together, but if I’m honest it’s nothing more than a bit of data porn: something I know my fellow singles will find entertaining, briefly, but that will be quickly binned as they click on looking for something a little bit more hardcore.

Recently I’ve been attending a few singles nights with the aim of finding a long-term partner / audience. Last weekend I was at #OpenDataCamp, where I made an appeal for an audience, a user, someone, anyone who I could work with to help solve real issues with visualisation. Yes, I know they’d give me problems and challenges, but I want to do something meaningful; I think I’m ready for some commitment. Maybe I came across as desperate because no one was interested. It was fun, though: I met plenty of people looking for the same thing as me from a slightly different angle; they had the data but also no audience… some even suggested that if I found someone then they could join us in a threesome. I liked the sound of that, but perhaps having three in the relationship will only complicate things more…


Ultimately I guess everyone wants to settle down like me, but many of my older friends have settled into the single life as permanent bachelors. Some of them I never hear from; it’s really sad to see people disappear because they couldn’t find an audience. I wonder where they go? Maybe they found one and never told me… Others are happy telling others how to have productive relationships without having one themselves. Still others have taken themselves off the market and thrown themselves into work where they can have real relationships; again, we don’t see them much anymore. Yes, some of the old timers still join us on boys’ nights out, but if I’m honest it’s a bit sad seeing them on nights out with the young crowd. I don’t want to be one of them; I want to have a meaningful relationship with someone I can commit to. Even if it’s just short term I want it to be meaningful. I hope there’s still time; I think I have a lot to offer if I meet the right partner.

If you know anyone who can be my audience let me know, I’d love to meet one and try and work together to create something special.

 

Using Inspect / Javascript to scrape data from visualisations online

My last post talked about making over this visualisation from The Guardian:

What I haven’t explained is how I found the data, which is what I intend to outline in this post. These skills are very useful if you need to find the data behind visualisations / tables you want to re-visualise.

The first step in trying to download the data behind any online visualisation is to check how it is made. It may simply be a static graphic (in which case extraction is hard, unless it is a chart you can unplot using WebPlotDigitizer), but interactive visualisations are typically built with JavaScript unless they use a bespoke product such as Tableau.

Assuming it is interactive, you can start to explore by right-clicking on the image and choosing Inspect (in Chrome; other browsers have similar developer tools).

I was treated to this view:

I don’t know much about coding, but it looks like the view is being built from a series of paths. I wonder how it’s doing this? We can find out by digging deeper; let’s visit the Sources tab:

Our job on this tab is to look for anything unusual outside the typical javascript libraries (you learn these by being curious and looking at lots of sites). The first file gay-rights-united-states looks suspect but as can be seen from the image above it is empty.

Scrolling down (see below), we find an embedded file / folder (flat.html) containing something new: all.js and main.js….

Investigating all.js reveals nothing much, but main.js shows us something very interesting on line 8. JACKPOT! A Google Sheet containing the full dataset.

And we can start vizzing! (btw I transposed this for my visualisation to get a column per right).
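The transpose itself is a small operation. Below is a minimal JavaScript sketch, assuming the sheet has been loaded as an array of rows with one row per right and one column per state; the sheet's exact layout isn't shown in this post, so that shape, the `transpose` name and the placeholder values are all hypothetical.

```javascript
// Transpose a rectangular table held as an array of rows:
// cell [r][c] in the input becomes cell [c][r] in the output.
// Assumes every row has the same length as the first row.
function transpose(rows) {
  if (rows.length === 0) return [];
  return rows[0].map((_, col) => rows.map((row) => row[col]));
}

// Hypothetical example: one row per right, one column per state.
const wide = [
  ["right", "State A", "State B"],
  ["marriage", "x", "x"],
  ["union", "", "x"],
];
const perState = transpose(wide);
// perState[0] is ["right", "marriage", "union"] (now one column per right)
// perState[1] is ["State A", "x", ""]
```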

Advanced Interrogation using Javascript

Part way through my visualisation, I realised I needed to show the text items the Guardian had on their site, but these weren’t included in the dataset.

I decided to check the JavaScript to see where this text was created and whether I could decipher it. Looking through main.js, I found this snippet:

// Excerpt from the Guardian's main.js (elided code marked with // ...)
function populateHoverBox (type, position){

  var overviewObj = {
    'state' : stateData[position].state
  }
  // ...
  if(stateData[position]['marriage'] != ''){
    overviewObj.marriage = 'key-marriage'
    overviewObj.marriagetext = 'Allows same-sex marriage.'
  } else if(stateData[position]['union'] != '' && stateData[position]['marriageban'] != ''){
    overviewObj.marriage = 'key-marriage-ban'
    overviewObj.marriagetext = 'Allows civil unions; does not allow same-sex marriage.'
  } else if(stateData[position]['union'] != '' ){
    overviewObj.marriage = 'key-union'
    overviewObj.marriagetext = 'Allows civil unions.'
  } else if(stateData[position]['dpartnership'] != '' && stateData[position]['marriageban'] != ''){
    overviewObj.marriage = 'key-marriage-ban'
    overviewObj.marriagetext = 'Allows domestic partnerships; does not allow same-sex marriage.'
  } else if(stateData[position]['dpartnership'] != ''){
    overviewObj.marriage = 'key-union'
    overviewObj.marriagetext = 'Allows domestic partnerships.'
  } else if (stateData[position]['marriageban'] != ''){
    overviewObj.marriage = 'key-ban'
    overviewObj.marriagetext = 'Same-sex marriage is illegal or banned.'
  } else {
    overviewObj.marriagetext = 'No action taken.'
    overviewObj.marriage = 'key-none'
  }
  // ...
}

…and it continued for another 100-odd lines of code. This wasn’t going to be as easy as I’d hoped. Any other options? Well, what if I could extract the contents of overviewObj? Could I write them out to a file?

I tried a “Watch” using the developer tools, but the variable went out of scope each time I hovered, so that wouldn’t be useful. Instead I decided to save flat.html locally and try writing the contents out to a file on my local drive….

As I say, I’m no coder (though perhaps more comfortable than some), so I googled (and googled) and eventually stumbled on this post:

http://stackoverflow.com/questions/16376161/javascript-set-file-in-download

I therefore added the function to my local main.js and added a couple of lines to the populateHoverBox function… okay, so maybe I can code a tiny bit….

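// Serialise the hover-box object to a JSON string
var str = JSON.stringify(overviewObj);

// Save it as <State>.txt using the download() helper from the Stack Overflow post
download(str, stateData[position].state + '.txt', 'text/plain');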

In theory this should serialise overviewObj to a string (according to Google!) and then download the resulting data to a file called <State>.txt.
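For anyone curious what a download helper like this does, here is an illustrative sketch (not the code from the linked Stack Overflow answer) built on standard browser APIs: `Blob`, `URL.createObjectURL` and the `download` attribute on a link. It runs in a browser page, and its name and argument order simply mirror the `download(str, filename, mimeType)` call above.

```javascript
// Illustrative browser helper: save a string as a file via a temporary link.
// Not the code from the Stack Overflow answer; same call shape as used above.
function download(content, filename, mimeType) {
  // Wrap the text in a Blob so the browser can treat it as a file
  const blob = new Blob([content], { type: mimeType });
  // Create a temporary URL that points at the Blob
  const url = URL.createObjectURL(blob);
  // A link with a download attribute asks the browser to save, not navigate
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
  // Release the object URL shortly afterwards, once the download has started
  setTimeout(() => URL.revokeObjectURL(url), 1000);
}
```

Because every hover triggers a new download, Chrome may ask for permission before letting the page save multiple files.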

Now for the test…..

BOOM, BOOM and BOOM again!

Each file is a JSON file

Now to copy the files out from the downloads folder, remove any duplicates, and combine using Alteryx.

As you can see, combining the resulting JSON files with a wildcard input and a transpose was simple.
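The same combine step could also be done in code rather than Alteryx. The Node.js sketch below is an alternative, not the workflow used in this post. It assumes each downloaded file holds one JSON object with a `state` key (as `overviewObj` does) and that repeats saved by the browser (such as "Texas (1).txt") have identical content, so keeping the first record per state removes the duplicates. The function names are hypothetical.

```javascript
// Alternative to the Alteryx workflow: combine the per-state JSON files,
// drop duplicate states and transpose into long rows (state, name, value).
const fs = require("fs");
const path = require("path");

function combineRecords(files) {
  const byState = new Map();
  for (const { text } of files) {
    const record = JSON.parse(text);
    // Keep only the first record seen for each state
    if (!byState.has(record.state)) byState.set(record.state, record);
  }
  const rows = [];
  for (const [state, record] of byState) {
    for (const [name, value] of Object.entries(record)) {
      if (name !== "state") rows.push({ state, name, value });
    }
  }
  return rows;
}

// Wildcard-style input: read every .txt file in a folder
function readFolder(dir) {
  return fs
    .readdirSync(dir)
    .filter((file) => file.endsWith(".txt"))
    .sort()
    .map((file) => ({ name: file, text: fs.readFileSync(path.join(dir, file), "utf8") }));
}
```

Usage would be something like `combineRecords(readFolder("./downloads"))`, where the folder path is a placeholder for wherever the files were copied.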

Finally, I combined this with the Google Sheet (called “Extract” below) and the hexmap data (Sheet 1) in Tableau…..
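Tableau handled this join, but the logic is simple enough to sketch in code. Assuming three hypothetical arrays of row objects that share a `state` field spelled identically in every source, an inner join (Tableau's default join type) keeps only the states found in all three:

```javascript
// Inner-join three row arrays on their shared "state" field.
// hover: long rows from the JSON files; extract: Google Sheet rows;
// hexPositions: hex map rows. All three inputs are hypothetical here.
function joinByState(hover, extract, hexPositions) {
  const sheetByState = new Map(extract.map((row) => [row.state, row]));
  const hexByState = new Map(hexPositions.map((row) => [row.state, row]));
  return hover
    // Inner join: drop any hover row whose state is missing from either lookup
    .filter((row) => sheetByState.has(row.state) && hexByState.has(row.state))
    // Merge the matching rows into one flat record
    .map((row) => ({ ...sheetByState.get(row.state), ...hexByState.get(row.state), ...row }));
}
```

Because an inner join silently drops unmatched rows, comparing the number of states before and after the join is a quick way to catch names spelled differently in different sources.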

Not the most straightforward data extraction I’ve done, but I thought it was worth blogging about so others could see that extracting data from online visualisations is possible.

You can see the resulting visualisation in my previous post.

Conclusion

No one taught me this method, and I have never been taught how to code. The techniques described here are simply the result of continuous curiosity and exploration of how interactive tables and visualisations are built.

I have used similar techniques elsewhere to extract data from visualisations, but no two methods are the same, nor can a generic tutorial be written. Simply have curiosity and patience, and explore everything.

 

Combining Multiple Hexmaps using Segments

After my #Data16 talk, Chad Skelton challenged me to do a simple remake of the Guardian sunburst-type visualisation that I critiqued in my Sealed with a KISS talk (which you can now watch live at this link).

The original visualisation is shown below:

While initially engaging, I find this view complex to read and extracting any useful information involves several round trips to the legend. The circular format makes the visualisation appealing while sacrificing simple comprehension. Could I do better though?

Chad suggested small multiple maps, and I agreed this might be the simplest approach, but I was not happy with the resulting maps:


Alaska and Hawaii, why do you ruin my maps? The Data Duo have several solutions, and my favourite is the tile map.
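Tile maps work by giving every state a fixed slot on a grid. As a rough sketch of the geometry for hexagons (not necessarily how Matt's file stores its positions), one common layout for pointy-topped hexagons shifts every other row half a tile sideways and spaces rows √3/2 of a tile apart, so neighbouring tiles sit exactly one tile width apart. The function name and the odd-row-shift convention are assumptions.

```javascript
// Convert a (row, col) slot in a hex grid into x/y plotting coordinates.
// Assumes pointy-topped hexagons, rows and columns counted from 0, and odd
// rows shifted half a tile to the right; size is the tile width.
function hexTileToXY(row, col, size = 1) {
  const x = size * (col + (row % 2) * 0.5);
  // Rows of pointy-topped hexagons sit sqrt(3)/2 of a tile width apart
  const y = size * row * (Math.sqrt(3) / 2);
  return { x, y };
}
```

With row 0 at the top of the map, the y axis would need reversing when plotted, just as you would reverse any axis in Tableau.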

Thankfully, Zen Master Matt Chambers has made tile maps very easy in this post, so I followed his instructions, joining the Excel file he provided onto my data to give a much more visually appealing and informative result. The resulting visualisation is below (click for an interactive version):

However, I still wasn’t satisfied with this visualisation; it has several problems:

  • it separates out the variables per state, meaning the viewer still has a lot of work to do to compare each state’s full rights
  • it still requires the use of the legend to fully understand
  • the hover action reveals extra info, meaning the user has to drag around to reveal the story
  • the legend is squashed due to space

How to solve these issues? I spent a while pondering it and eventually found a possible answer: I could use a single map but split each hexagon into segments (ignoring marriage, as it is allowed in all states; another solution would have been to cut out a dot in the middle for the seventh segment).

To do this I’d need to split each hexagon into segments, so I took out my drawing package and created six shapes:

These six shapes have transparent backgrounds and, importantly, when combined create a single hexagon.
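The shapes here were drawn by hand, but they can also be generated. A minimal sketch, assuming pointy-topped hexagons on a 100 × 100 canvas: each segment is the triangle from the centre to two neighbouring corners, and each SVG has no background element, so the six overlay into one hexagon. The function names and the black fill are arbitrary choices. As far as I know, Tableau's custom shapes folder expects raster images such as PNG, so the SVG output would still need converting.

```javascript
// Build the six triangular segments of a hexagon centred on (cx, cy).
// rotation = 30 degrees gives a pointy-topped hexagon (SVG y points down).
function hexSegments(cx, cy, r, rotation = Math.PI / 6) {
  const vertices = [];
  for (let i = 0; i < 6; i++) {
    const angle = rotation + (i * Math.PI) / 3;
    vertices.push([cx + r * Math.cos(angle), cy + r * Math.sin(angle)]);
  }
  // Segment i runs from the centre to vertex i and on to vertex i + 1
  return vertices.map((v, i) => [[cx, cy], v, vertices[(i + 1) % 6]]);
}

// Render one segment as a standalone SVG with nothing drawn outside it.
function segmentSvg(points, size = 100) {
  const d = points
    .map(([x, y], i) => `${i === 0 ? "M" : "L"}${x.toFixed(2)},${y.toFixed(2)}`)
    .join(" ");
  // No background element, so everything outside the triangle stays transparent
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${size}" height="${size}" viewBox="0 0 ${size} ${size}"><path d="${d} Z" fill="black"/></svg>`;
}

// One SVG string per segment for a hexagon filling the 100 x 100 canvas
const svgs = hexSegments(50, 50, 50).map((seg) => segmentSvg(seg));
```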

Now, with these shapes, I can put a dimension (such as Group below) on Shape, and then use colour to give each hexagon different segment colours on the map (using Matt’s method and data for hex positions).

Using this technique I therefore created the visualisation below (click for interactive version):

Using this method it would be possible to combine 3, 6, 9 or 12 (or possibly more) dimensions on a single map by segmenting the hexagons. Similarly using a circle in the middle would allow 4 or 7 dimensions.

I’m not sure how applicable this type of method is to other visualisations but please let me know if you use it as I’d love to see some more examples.

MM Week 44: Scottish Index of Multiple Deprivation

This week’s Makeover Monday (week 44) focuses on the Scottish Index of Multiple Deprivation.

Barcode charts like this can be useful for seeing small patterns in data, but the visualisation has some issues.

What works well

  • It shows all the data in a single view with no clicking / interaction
  • Density of lines shows where most areas lie, e.g. Glasgow and North Lanarkshire can quickly be seen as having lots of areas among the most deprived
  • It is simple and eye catching

What doesn’t work as well

  • No indication of population in each area
  • Areas tend to blur together
  • It may be overly simple for the audience

In my first attempt to solve these problems, I addressed the second one (areas blurring together) by adding jitter with the random() function.
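The jitter idea is simple: keep each mark's position along the rank axis and nudge it a random amount within its council area's row. Here is a plain JavaScript sketch rather than the Tableau calculation; the field names `band` (the row index for the mark's area) and `y` are hypothetical, and the random source is passed in so it can be swapped for a seeded generator or fixed for testing.

```javascript
// Add vertical jitter: each point keeps its own band (row) but gets a random
// offset of up to +/- width/2, so overlapping marks in a row spread out.
// `rng` must return numbers in [0, 1), like Math.random.
function addJitter(points, width = 0.8, rng = Math.random) {
  return points.map((point) => ({ ...point, y: point.band + (rng() - 0.5) * width }));
}

// Hypothetical usage: two data zones in the same council area row
const jittered = addJitter([
  { rank: 12, band: 0 },
  { rank: 15, band: 0 },
]);
```

Keeping `width` below 1 (when rows are one unit apart) stops marks from one area drifting into the next area's row.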

However, it still didn’t address the population issue, and given that the vast majority of points had a similar population with a few outliers (see below), I wondered whether to even address the issue.

Then I realised I could perhaps go back to the original and simply expand on it with a box plot (adding a sort for clarity):

Voila, a simple makeover that improves the original and adds meaning and understanding while staying true to the aims of the original. Time for dinner.
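For anyone reproducing the box plot outside Tableau, this sketch computes the numbers behind each box. It uses linear interpolation between sorted values for the quartiles (the method of Excel's QUARTILE.INC), which may differ slightly from Tableau's own calculation, and whiskers reaching the furthest points within 1.5 × the interquartile range, which should match Tableau's default whisker setting. The grouped input in the usage lines is hypothetical.

```javascript
// Quantile by linear interpolation on an ascending-sorted array (0 <= p <= 1).
function quantile(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// Box plot summary: quartiles plus whiskers at the furthest values
// lying within 1.5 x IQR of the box.
function boxPlotStats(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const median = quantile(sorted, 0.5);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const inside = sorted.filter((v) => v >= q1 - 1.5 * iqr && v <= q3 + 1.5 * iqr);
  return { q1, median, q3, whiskerLow: inside[0], whiskerHigh: inside[inside.length - 1] };
}

// Hypothetical usage: SIMD ranks grouped by council area, sorted so the area
// with the lowest (most deprived) median rank comes first.
const ranksByArea = { "Area A": [10, 200, 35, 90], "Area B": [5000, 4200, 6100] };
const boxes = Object.entries(ranksByArea)
  .map(([area, ranks]) => ({ area, ...boxPlotStats(ranks) }))
  .sort((a, b) => a.median - b.median);
```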

Done and dusted… or was I? If I had any sense I would have been, but I wanted to find out more about the population of each area. Were the more populated areas also the more deprived?

There have been multiple discussions on Twitter this week about people stepping beyond what Makeover Monday was intended to be about. However, there was a story to tell here, and I dwelled on it over dinner; with the recent debates about the aims of Makeover Monday (and data visualisation generally) swirling in my head, I wondered what I should do.

I wondered about the rights and wrongs of continuing with a more complex visualisation. Should I finish here and show how simple Makeover Monday can be? Or should I satisfy my natural curiosity and investigate a chart that, while perhaps more complex, might show ways of presenting data that others hadn’t considered….

I had the data bug and I wanted to tell a story even if it meant diving a bit deeper and perhaps breaking the “rules” of Makeover Monday and spending longer on the visualisation. I caved in and went beyond a simple makeover….sorry Andy K.

Perhaps a scatter plot focusing on the median deprivation of a given area might work best (most deprived at the top, by reversing the Rank axis):

Meh, it’s simple but hides a lot of the detail. I added each Data Area and it got too messy as a scatter – but how about a Pareto type chart…

So we can see from the running sum of population (ordered with the most deprived areas first) that lots of people live in deprived areas in Glasgow, but the shape of the other lines is lost because so many people live in Glasgow.

So I added a secondary percent of total calculation; not too complex… this is still within the Desktop II course for Tableau.
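The two steps behind these lines can be sketched in plain JavaScript: a running sum of population within each council area, ordered from most deprived (SIMD rank 1) to least, then that running sum as a percent of the area's total. In Tableau these were table calculations; the field names `area`, `rank` and `population` here are assumptions about the dataset's columns.

```javascript
// For each council area: sort data zones by SIMD rank (1 = most deprived),
// then record the running population total and its share of the area total.
function cumulativePopulation(zones) {
  const byArea = new Map();
  for (const zone of zones) {
    if (!byArea.has(zone.area)) byArea.set(zone.area, []);
    byArea.get(zone.area).push(zone);
  }
  const result = [];
  for (const areaZones of byArea.values()) {
    const ordered = [...areaZones].sort((a, b) => a.rank - b.rank);
    const total = ordered.reduce((sum, zone) => sum + zone.population, 0);
    let running = 0;
    for (const zone of ordered) {
      running += zone.population;
      result.push({ ...zone, runningPopulation: running, pctOfArea: total === 0 ? 0 : running / total });
    }
  }
  return result;
}
```

Plotting `runningPopulation` reproduces the first version, where Glasgow's size dominates; plotting `pctOfArea` puts every area on the same 0 to 1 scale, which is what makes the line shapes comparable.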

Now we were getting somewhere. I could see from the shape of the line whether areas have high proportions of more or less deprived people. Time to add some annotation and explanation… as well as focus on the 15% most deprived areas, as in the original.
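Focusing on the 15% most deprived areas means finding, for each council area, how much of its population lives in data zones ranked within the national 15% cut-off. A sketch, assuming the input contains every ranked data zone in Scotland (so the cut-off is 15% of the number of zones) and hypothetical field names `area`, `rank` and `population`:

```javascript
// Share of each council area's population living in data zones ranked within
// the most deprived `fraction` of all zones (rank 1 = most deprived).
// Assumes `zones` holds every ranked data zone, so zones.length is the
// national count used to set the cut-off.
function shareInMostDeprived(zones, fraction = 0.15) {
  const cutoff = zones.length * fraction;
  const totals = new Map();
  for (const zone of zones) {
    const t = totals.get(zone.area) || { all: 0, deprived: 0 };
    t.all += zone.population;
    if (zone.rank <= cutoff) t.deprived += zone.population;
    totals.set(zone.area, t);
  }
  return [...totals].map(([area, t]) => ({ area, share: t.all === 0 ? 0 : t.deprived / t.all }));
}
```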

Click on the image below to go to the interactive version. This took me around 3 hours to build following some experimenting with commenting and drop lines that took me down blind (but fun) alleys before I wound back to this.

Conclusion

Makeover Monday is good fun. I happened to have a bit more time tonight and I got the data bug. I could have produced the slightly improved visualisation and stuck with it, but that’s not how storytelling goes. We see different angles and viewpoints, and constraining myself to too narrow a viewpoint felt like ignoring an itch that just needed scratching.

I’m glad I scratched it. I’m happy with my visualisation, but I offer the following critique:

What works well:

  • it’s more engaging than the original; while it is more complex, I hope the annotations offer enough detail to draw the viewer in and get them exploring.
  • the purple labels show the user the legend at the same time as describing the data.
  • there is a story for the user to explore as they click, pop-up text adds extra details.
  • it adds context about population within areas.

What doesn’t work well:

  • the user is required to explore with clicks rather than simply scanning the image – a small concession given the improvement in engagement I hope I have made.
  • the visualisation takes some understanding: percent of total cumulative population is a hard concept that many of the public simply won’t understand. The audience for this visualisation is therefore slightly more academic than the original’s. Would I say this is suitable for publishing on the original site? On balance, I probably would. The original website is text / table heavy and clearly intended for researchers rather than the public, so that audience can be expected to take longer to understand the detail.

Comment and critique welcomed and encouraged please.