Producing a Sankey chart is almost a rite of passage within the Tableau community, an accomplishment that means perhaps you’ve finally mastered this tool and are finally on a par with the experts.
— Sam Parsons (@SParsonsDataViz) March 15, 2018
But are the Sankey charts we see in the community really worthy of being put on that pedestal? Or are we biased towards this particular chart type because it’s the first complex chart many of us learn to create in Tableau?
In this post I want to review our love affair with the Sankey, critique some of the use cases of Sankey charts and suggest alternatives.
Sankey Charts – An Introduction
“Sankey diagrams are a specific type of flow diagram, in which the width of the arrows is shown proportionally to the flow quantity. They are typically used to visualize energy or material transfers between processes.”
(source: Wikipedia, article ‘Sankey diagram’)
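The definition above hinges on one rule: a band’s width is proportional to the quantity flowing through it, and each band keeps that width from one side of the chart to the other. As a rough illustration outside Tableau, the Python sketch below takes a hypothetical list of (source, target, value) flows, stacks the bands on each side, and traces each band’s path with a logistic (sigmoid) curve. The function names, the alphabetical stacking order and the choice of a sigmoid for the curve are assumptions for illustration, not a description of how any particular Tableau tutorial builds its curves.

```python
import math


def band_positions(flows):
    """Stack each flow as a band on a left (source) axis and a right (target) axis.

    flows: list of (source, target, value) tuples; each (source, target)
    pair is assumed to appear only once. A band's thickness equals its
    value on both axes, so width is proportional to the flow quantity.
    Nodes are stacked alphabetically (an illustrative assumption).
    """
    left, right = {}, {}

    # Left axis: stack bands grouped by source.
    y = 0.0
    for source in sorted({s for s, _, _ in flows}):
        for s, t, v in flows:
            if s == source:
                left[(s, t)] = (y, y + v)
                y += v

    # Right axis: stack the same bands grouped by target.
    y = 0.0
    for target in sorted({t for _, t, _ in flows}):
        for s, t, v in flows:
            if t == target:
                right[(s, t)] = (y, y + v)
                y += v

    return {key: (left[key], right[key]) for key in left}


def sigmoid_path(y_start, y_end, steps=7):
    """Points along a smooth S-curve from y_start to y_end.

    Uses the logistic function over t in [-6, 6], rescaled so the curve
    starts exactly at y_start and ends exactly at y_end.
    """
    lo = 1 / (1 + math.exp(6))
    hi = 1 / (1 + math.exp(-6))
    points = []
    for i in range(steps):
        t = -6 + 12 * i / (steps - 1)
        s = (1 / (1 + math.exp(-t)) - lo) / (hi - lo)
        points.append(y_start + (y_end - y_start) * s)
    return points


# Hypothetical flows: A sends 3 to X and 1 to Y; B sends 2 to X.
flows = [("A", "X", 3), ("A", "Y", 1), ("B", "X", 2)]
for key, (l, r) in band_positions(flows).items():
    print(key, "left", l, "right", r)
```

For these three flows, the band from A to Y occupies 3 to 4 on the left axis and 5 to 6 on the right: the same thickness at both ends, just stacked in a different place, which is what keeps the width honest to the quantity.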
Sankey diagrams are named after Captain Matthew Sankey, who used the diagram below in 1898 to show the energy efficiency of a steam engine.
Perhaps the most famous Sankey-type diagram is this one by Charles Minard – which should need no introduction.
Sankeys in the Tableau community can be traced back to Jeff Shaffer (http://www.dataplusscience.com/RecreationinTableau2.html). While his original post isn’t a true Sankey diagram (to all intents and purposes it’s a curved slope chart), it led to further posts in which he built on his method, and others soon followed suit: namely Oliver Catherine, myself and others (too many to mention).
A Sensible Place to Start
If I’m going to start my critique of Sankey charts in the Tableau community anywhere, then I should start here, in March 2015:
My blog post on how to use data densification was illustrated with the example below:
What does this example show? Is it a useful Sankey diagram?
The diagram as it stands clearly shows several things at once, and because it was an example for a “how to” blog, I didn’t truly think about the question it was answering.
What understanding of the data can a viewer tease out of this visualisation?
- Technology is the largest category.
- The South is the smallest region; the others are of similar size.
- The split from category to region is roughly in proportion to the size of the categories.
Have a look yourself: can you pick out anything more? Give yourself two minutes.
Hardly groundbreaking stuff, is it? The majority of the insight comes from the bars at the sides of the visualisation, not from the actual “flows” (although it isn’t showing any kind of flow, so again, is this misleading?).
Consider the charts below:
How about now? Did you spot that Technology in the East is underperforming relative to Furniture, compared with the other regions?
What if we swap the dimensions to aid comparison across regions?
We can now see the South is struggling, and the West is particularly poor in Office Supplies.
Both these bar charts show additional insights we couldn’t get from the Sankey. In fact, the only benefit the Sankey provides is in the two stacked bars at its sides.
I could have picked a better example for my original blog post, but I didn’t. So hands up: my example was a poor one to illustrate a piece on Sankey charts. My plea to anyone writing “How To” tutorials: please, please include a “Why To” / “What To” as well. Showing good examples helps educate people about why your chart type might be useful, and why it might not be.
(further reading on Sankey usefulness along similar lines: https://www.datarevelations.com/circles-labels-colors-legends-and-sankey-diagrams-ask-these-three-questions.html)
Sankey Charts are hard to create in Tableau
I had the pleasure of sitting in on a portion of Kevin Taylor‘s talk as he rehearsed it at the Tableau conference in New Orleans. I had sneaked in at the back because I was using the room later, and Kevin didn’t recognise me, so I got to watch him present “my” method of producing Sankey charts without him being aware of my presence. You can see the portion of his talk here https://youtu.be/PcJjIAq6bxA?t=3074
What struck me was how simple it was to present: Kevin is done in five minutes. This could have been a 50-minute talk *just* on Sankeys, covering Data Densification, Nested Table Calculations, etc., but the truth is you don’t need to understand any of that to build a Sankey.
So if you can follow instructions (e.g. build an Ikea cupboard), can you build a Sankey in Tableau? No, there’s still a lot to take in: abstracting the method to your own data and getting the calculations right can be very hard, even for programmers, as the tweet below testifies.
I spent 40 mins trying to make a Sankey diagram in tableau
That same diagram took me 7 mins in #rstats
My hatred towards domain specific programming grows
— Josh De La Rosa (@JoshdelaRosa1) June 23, 2018
So if it’s hard, why do so many people want to build them? I think many people confuse being “good” at Tableau with building complex charts. This isn’t the case; in fact, some of the people I consider the “best” users of Tableau have never created a Sankey in their lives.
Being good at Tableau and data visualisation is all about taking complex data sets and breaking them down into simple visualisations that aid the viewer’s understanding.
Taking simple datasets and making them hard to understand is not data visualisation!
Reading Sankeys takes Effort and Time
Kevin did do better than me in his use case – the example he built (above) shows the flow of humans between continents. We can see that Europe and the Americas are destinations for people from Asia and there are countless other stories hidden in this data which can be seen from the Sankey.
However, reading a Sankey like this takes time and effort. It takes a user who understands the chart type. Yet even with that understanding and time, I’m still left with the feeling: “so what?” What’s the story the data visualisation is trying to tell me? There are so many, but which is important? Also, as in the last example, patterns are hard to distinguish…
As we’ll see later, so many of the Sankeys produced take no account of the data literacy of the user, nor do they attempt to signpost the stories or explain what the user should be looking for. In essence, they do more to confuse than to explain.
What if a Sankey was *really* simple?
InfoTopics have recently released their extension “Show Me More”, which makes Sankey diagrams really simple. It has some downsides, in that the Sankey is hard to format and control, but generally it turns producing a Sankey into a simple process (assuming you’re happy to enable extensions).
In a world where Sankeys were really simple (either through extensions or via, say, Show Me), I wonder: would we see the same usage of them?
Are people conflating “hard” and “complex” with “good”? If we remove the “hard” piece from the equation, would we see the same reaction to Sankey charts? Or would the reaction to them be much the same as the reaction to pie charts?
In my mind, pie charts and Sankey charts are just as easy to misuse as each other. One might argue that pie charts, being so universal, don’t require the same level of data literacy and so are harder to misuse. Pie charts are much more maligned, though (use them at your peril!), whereas for me we potentially see more misuse of Sankeys right now without the same visceral reaction.
Okay, I can already see people reaching for Twitter to express outrage that I’ve compared Sankeys to the pie chart… so let’s move on.
There are many ways I don’t like to see people use Sankey charts
Okay, so here I risk being controversial: if you disagree, please feel free to call me out in the comments below. Also, if I use your Sankey as an example, please don’t take it to heart. I’ve tried to use examples from people who I know can take the criticism in the way it is intended and who will feel comfortable disagreeing with me if they feel I’m wrong. In short, if I’ve used your visualisation, it’s because I respect your work. Remember too that this is only my opinion: I haven’t seen much visualisation research on how people understand Sankeys, so it’s hard to go beyond an opinion.
Also remember that people who create Sankey charts may have motivations other than purely showing the data in a way that amplifies understanding for the greatest number of people. They may simply be practising a new chart type, or trying to engage the audience in a different way. So while I may critique the output, the creator might not have been going for the “best” chart.
I’ve chosen to leave out the creators’ names in the examples below. I don’t want to make this about individuals, as I think the issues I’m highlighting could come from numerous examples; I’m just highlighting examples as I find them.
#1 Sankeys with just two dimensions on one axis
In the above example, the Sankey aims to show “which driver contributed most to the team’s success”, but the widths of the curves for each year are so similar that it’s difficult to discern the story.
In the example above we’re looking at the choices made by students. The Sankey diagrams are really only there to show the male/female breakdown as well as the split between subjects. The breakdown isn’t too bad for subjects with a high percentage overall, but for smaller subjects the size of the male/female curves is impossible to ascertain, so the analysis is difficult. The creator’s choice of the Sankey makes the data hard to analyse; a simple stacked bar chart showing the breakdown per subject might be better, or combining both measures into a treemap might work:
Would three treemaps look as good? Perhaps not, but they remain better at telling the overall story.
#2 Sankeys that can be replaced by colour legends
If your Sankey chart has dimensions that have a one-to-one correspondence, then what value are the curves providing, beyond bling? In both of the above charts a single list, coloured by the appropriate value, could be used instead. It would be vastly simpler to understand. The first example, for instance, could simply be two lists of teams.
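One way to spot this case before building anything is to check whether each value on one side links to exactly one value on the other. The Python sketch below does that for a hypothetical list of (left, right) pairs; the function names and the sample data are illustrative assumptions. If every left-hand value maps to a single right-hand value, a list coloured by the right-hand value carries the same information as the curves; if the mapping also holds in reverse, the correspondence is strictly one-to-one.

```python
from collections import defaultdict


def single_valued(pairs):
    """True if every left-hand value links to exactly one right-hand value."""
    links = defaultdict(set)
    for left, right in pairs:
        links[left].add(right)
    return all(len(rights) == 1 for rights in links.values())


def one_to_one(pairs):
    """True if the mapping is single-valued in both directions."""
    pairs = list(pairs)  # materialise generators, since we iterate twice
    return single_valued(pairs) and single_valued([(r, l) for l, r in pairs])


# Hypothetical data: each team links to exactly one outcome.
teams = [("Team A", "Promoted"), ("Team B", "Relegated"), ("Team C", "Promoted")]
print(single_valued(teams))  # True: a list coloured by outcome would do the job
print(one_to_one(teams))     # False: two teams share "Promoted"
```

When `single_valued` returns True, every curve in the Sankey simply connects an item to its one category, which is exactly the information a colour legend already encodes.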
#3 Sankey Charts that add no additional understanding
Hold onto your hat: I think this one will be controversial…
Sorry, but I really struggle with these examples (and there are so many similar examples I could have picked). What is the Sankey telling me? It’s helping to show how the values break down into other dimensions… and that’s about it. It’s not giving me any information beyond what I can find elsewhere.
So in this case maybe my problem is the fact I’m even considering these in the Sankey category – maybe I need to realise that they’re simply visual references / cues that don’t aim to increase understanding of the data – only of the flow of the data (bottom to top and top to bottom respectively).
For me, they take up a lot of “ink”.
“Clutter and confusion are not attributes of data — they are shortcomings of design.”
– Edward Tufte
Not everyone will agree with Tufte, but consider the subjects: perhaps a visualisation on Jimi Hendrix can get away with being light-hearted and fun, but is migration a subject that needs extra elements added?
We also have to consider “is the juice worth the squeeze?” (a favourite Joe Mako quote of mine). Considering the effort it takes to produce these curves in Tableau, is it worth it?
Sometimes it is! I love this example:
My reaction to each of these design choices is very subjective – the authors highlighted above haven’t necessarily made the wrong decision (after all it’s just my opinion) but the Sankey loses its effectiveness as a great design choice when it’s used so ubiquitously in the community, so I would encourage people to think carefully before using them, especially for serious subjects.
#4 The confusing Sankey
Again, this category could have included any number of visualisations.
The above visualisation has lots going on, and I wouldn’t call it an engaging visualisation for that reason. It looks complicated. I need to work hard to find any stories in the data, and are there any stories that are worth the effort? In an era when 95% of views of a visualisation come via social media, where the user won’t interact with the view, is this the right choice, given that many may be put off exploring further by the confusing lines?
This second example is still confusing, and there’s a lot going on, but the data is more interesting and it’s certainly easier to pick out some interesting facets. The stories the author highlights in the accompanying blog post are below:
Florida, Texas, and California combined produced 44% of the Rivals 100 from 2010 and 2011. Alabama landed 15 of the recruits, but 13 of them were undrafted. Out of the 200 total recruits, 142 of them were undrafted as well. Even with these recruits being the best of the best from high school, only 29% of them made it to the NFL. Check out the visualization and see if you can find other insights.
The takeaway for me is that the main stories (two out of the three, anyway) come from the stacked bar charts at the sides of the visualisation. I’d love to explore more, but the interaction is difficult and doesn’t allow me to see the full flow from end to end (e.g. how many recruits originating in Florida went to Buffalo). Perhaps set actions might allow us to build better interactions so that stories can be picked out more clearly? I’d love to be able to click on the left and see the full path, for example, or see the results filter accordingly (with appropriate percentages shown as labels).
Are there any Sankeys you like?
For me, a good Sankey should have a clear story, minimal confusion and a purpose that extends beyond chart bling. The example below is well designed and serves that purpose well.
What should you take away from this post? Firstly, it’s perhaps obvious that I have a dislike of Sankeys that borders on the pathological, so I’m clearly biased: it takes a lot for me to like a Sankey chart. Perhaps that comes from years of being thanked for my tutorial while being tagged in ugly Sankeys on Twitter.
Sankeys are not a bad choice of chart; there’s no such thing as a bad chart. The choice of a chart should come down to several things:
- Story: In an explanatory visualisation, does it successfully convey the story you want to tell? If the visualisation is exploratory, spend a few minutes looking at it: is the “juice worth the squeeze” for the viewer?
- Medium: How will the visualisation be consumed? If it’s likely to be seen as a static image on Twitter, does the visualisation still work?
- Data literacy: Does your audience have the required knowledge to interpret the chart?
- Alternatives: Are there simpler chart types that would tell your story better?
- Cool factor: If you’re simply going for a “cool!” reaction, does the subject warrant it? Is your visualisation making it clear that it’s a “cool” viz, or does it still expect to provide some serious takeaways? Are these two aims conflicting?
The most serious takeaway, though, is that you shouldn’t stop creating Sankey charts just because a Tableau Zen Master has written a piece criticising them. The aim of this blog post is to make people stop and think about their choice of visualisation with regard to Sankey charts; they need to justify it to themselves, not me. If you’re having fun and challenging yourself by creating them, then great: why the hell shouldn’t you?
Also, there will be plenty of experts, far more qualified than me, who disagree. I’d love to hear your comments below and on Twitter (@chrisluv).