
Using Inspect / Javascript to scrape data from visualisations online

My last post talked about making over this visualisation from The Guardian:

2016-11-13_12-55-29

What I haven’t explained is how I found the data, and that is what I intend to outline in this post. These skills are very useful whenever you need to find the data behind visualisations or tables you come across online.

The first step when trying to download data for any visualisation online is to check how it is made. It may simply be a static graphic, in which case extracting the data may be hard unless it is a chart you can unplot using WebPlotDigitizer. Interactive visualisations, however, are typically built with JavaScript, unless they use a bespoke product such as Tableau.

Assuming it is interactive, you can start to explore by right-clicking on the image and choosing Inspect (in Chrome; other browsers have similar developer tools).

2016-11-13_19-26-35

I was treated to this view:

2016-11-13_19-28-09.png

I don’t know much about coding, but it looks like the view is being built from a series of paths. How might it be doing this? We can find out by digging deeper; let’s visit the Sources tab:

2016-11-13_19-31-30

Our job on this tab is to look for anything unusual outside the typical JavaScript libraries (you learn to recognise these by being curious and looking at lots of sites). The first file, gay-rights-united-states, looks suspect, but as can be seen from the image above it is empty.

Scrolling down (see below) we find there is an embedded file / folder (flat.html), and in that is something new: all.js and main.js…

2016-11-13_19-34-05

Investigating all.js reveals nothing much, but main.js shows us something very interesting on line 8. JACKPOT! A Google Sheet containing the full dataset.

2016-11-13_19-38-25
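I won’t reproduce the Guardian’s code here, but the kind of line to look for is a reference to a published Google Sheet. As a purely illustrative sketch, pages that load data with the once-popular Tabletop.js library tend to contain something like this (the spreadsheet key below is made up):

Tabletop.init({
    key: '1AbCdEfGhIjKlMnOpQrStUvWxYz0123456789',  // the Google Sheet's key or published URL
    callback: function (data) {
        console.log(data);                          // one object per row of the sheet
    },
    simpleSheet: true                               // treat the first worksheet as a flat table
});

Spot a line like that and you can usually open the sheet directly in your browser.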

And we can start vizzing! (btw I transposed this for my visualisation to get a column per right).

Advanced Interrogation using Javascript

Now part way through my visualisation I realised I needed to show the text items the Guardian had on their site but these weren’t included in the dataset.

2016-11-13_19-41-27

I decided to check the JavaScript code to see where this text was created and whether I could decipher it. Looking through main.js I found this snippet:

function populateHoverBox (type, position){

    var overviewObj = {
        'state' : stateData[position].state
    }
    .....
    if(stateData[position]['marriage'] != ''){
        overviewObj.marriage = 'key-marriage'
        overviewObj.marriagetext = 'Allows same-sex marriage.'
    } else if(stateData[position]['union'] != '' && stateData[position]['marriageban'] != ''){
        overviewObj.marriage = 'key-marriage-ban'
        overviewObj.marriagetext = 'Allows civil unions; does not allow same-sex marriage.'
    } else if(stateData[position]['union'] != ''){
        overviewObj.marriage = 'key-union'
        overviewObj.marriagetext = 'Allows civil unions.'
    } else if(stateData[position]['dpartnership'] != '' && stateData[position]['marriageban'] != ''){
        overviewObj.marriage = 'key-marriage-ban'
        overviewObj.marriagetext = 'Allows domestic partnerships; does not allow same-sex marriage.'
    } else if(stateData[position]['dpartnership'] != ''){
        overviewObj.marriage = 'key-union'
        overviewObj.marriagetext = 'Allows domestic partnerships.'
    } else if (stateData[position]['marriageban'] != ''){
        overviewObj.marriage = 'key-ban'
        overviewObj.marriagetext = 'Same-sex marriage is illegal or banned.'
    } else {
        overviewObj.marriagetext = 'No action taken.'
        overviewObj.marriage = 'key-none'
    }

…and it continued for another 100-odd lines of code. This wasn’t going to be as easy as I had hoped. Any other options? Well, what if I could extract the contents of the overviewObj? Could I write it out to a file?

I tried a “Watch” using the developer tools, but the variable went out of scope each time I hovered, so that wouldn’t be useful. I’d therefore try saving flat.html locally and outputting a file with the contents to my local drive…

As I say, I’m no coder (though perhaps more comfortable than some), so I googled (and googled) and eventually stumbled on this post:

http://stackoverflow.com/questions/16376161/javascript-set-file-in-download

I therefore added the function to my local main.js and added a line in the populateHoverBox function….okay so maybe I can code a tiny bit….

var str = JSON.stringify(overviewObj);
 
download(str, stateData[position].state + '.txt', 'text/plain');

In theory this should serialise the overviewObj to a string (according to Google!) and then download the resulting data to a file called <State>.txt.
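For reference, the download function I borrowed looks something like the sketch below – the common Blob-and-anchor approach from that Stack Overflow thread, though the accepted answer’s exact code may differ:

function download(data, filename, type) {
    var file = new Blob([data], { type: type });   // wrap the string in a Blob
    var a = document.createElement('a');           // temporary link element
    a.href = URL.createObjectURL(file);            // object URL pointing at the Blob
    a.download = filename;                         // suggested filename, e.g. 'Alabama.txt'
    document.body.appendChild(a);
    a.click();                                     // trigger the browser download
    document.body.removeChild(a);
    URL.revokeObjectURL(a.href);                   // tidy up afterwards
}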

Now for the test…..

downloadingfiles

BOOM, BOOM and BOOM again!

Each file is a JSON file

2016-11-13_20-07-21
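Based on the populateHoverBox snippet above, the contents of each file look roughly like this (the values are illustrative, and the real object carries several more rights than the marriage fields shown earlier):

{
    "state": "Alabama",
    "marriage": "key-ban",
    "marriagetext": "Same-sex marriage is illegal or banned."
}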

Now to copy the files out from the downloads folder, remove any duplicates, and combine using Alteryx.

2016-11-13_20-04-59

As you can see, a wildcard input on the resulting JSON files followed by a transpose was simple.

2016-11-13_20-08-31
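If you don’t have Alteryx to hand, the same combine step can be roughed out in a few lines of Node.js – a sketch only, with the folder and output file names assumed:

var fs = require('fs');
var path = require('path');

var folder = './downloads';   // wherever the <State>.txt files ended up
var files = fs.readdirSync(folder).filter(function (f) { return f.indexOf('.txt') > -1; });

var states = {};              // keyed by state name, so duplicate downloads collapse
files.forEach(function (file) {
    var record = JSON.parse(fs.readFileSync(path.join(folder, file), 'utf8'));
    states[record.state] = record;
});

var combined = Object.keys(states).map(function (k) { return states[k]; });
fs.writeFileSync('all-states.json', JSON.stringify(combined, null, 2));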

Finally, to combine with the Google Sheet (called “Extract” below) and the hexmap data (Sheet 1) in Tableau…

2016-11-13_20-09-41

Not the most straightforward data extract I’ve done, but I thought it was worth blogging about so others could see that extracting data from visualisations online is possible.

You can see the resulting visualisation in my previous post.

Conclusion

No one taught me this method, and I have never been taught how to code. The techniques described here are simply the result of continuous curiosity and exploration of how interactive tables and visualisations are built.

I have used similar techniques in other places to extract data from visualisations, but no two methods are the same, nor can a generic tutorial be written. Simply have curiosity and patience, and explore everything.

 

Combining Multiple Hexmaps using Segments

After my #Data16 talk Chad Skelton challenged me to do a simple remake of the Guardian sunburst-type visualisation that I critiqued in my Sealed with a KISS talk (which you can now watch live at this link).

The original visualisation is shown below:

2016-11-13_12-55-29.png

While initially engaging, I find this view complex to read and extracting any useful information involves several round trips to the legend. The circular format makes the visualisation appealing while sacrificing simple comprehension. Could I do better though?

Chad suggested small multiple maps and I agreed this might be the simplest approach but I was not happy with the resulting maps:

2016-11-13_18-22-51

 

Alaska and Hawaii, why do you ruin my maps? The Data Duo have several solutions, and my favourite is the tile map.

Thankfully Zen Master Matt Chambers has made Tile Maps very easy in this post and so I followed the instructions, joining the Excel file he provided onto my data and giving a much more visually appealing and informative result. The resulting visualisation is below (click for an interactive version):

cxjnfitxgae5mkn

However I still wasn’t satisfied with this visualisation, it has several problems:

  • it separates out the variables per state, meaning the viewer still has a lot of work to do to compare each state’s full rights.
  • it still requires the use of the legend to fully understand.
  • the hover action reveals extra info, meaning the user has to drag around to reveal the story.
  • the legend is squashed due to space.

How to solve these issues? I spent a while pondering it and eventually I found a possible answer: I could use a single map but split each hexagon into segments (ignoring marriage as it is allowed in all states – another solution would have been to cut out a dot in the middle for the seventh segment).

To do this I’d need to split up each hexagon into segments, so I took out my drawing package and created six shapes:

These six shapes have transparent backgrounds and, importantly, when combined create a single hexagon.

Now with these shapes I can use a dimension (such as Group below) on shape, and then use colour to combine each hexagon into different segment colours on the map (using Matt’s method and data for hex positions).

2016-11-13_18-41-23.png

Using this technique I therefore created the visualisation below (click for interactive version):

2016-11-13_15-58-05

Using this method it would be possible to combine 3, 6, 9 or 12 (or possibly more) dimensions on a single map by segmenting the hexagons. Similarly using a circle in the middle would allow 4 or 7 dimensions.

I’m not sure how applicable this type of method is to other visualisations but please let me know if you use it as I’d love to see some more examples.

MM Week 44: Scottish Index of Multiple Deprivation

This week’s Makeover Monday (week 44) focuses on the Scottish Index of Multiple Deprivation.

2016-10-30_21-54-04

Barcode charts like this can be useful for seeing small patterns in data, but the visualisation has some issues.

What works well

  • It shows all the data in a single view with no clicking / interaction
  • Density of lines shows where most areas lie e.g. Glasgow and North Lanarkshire can quickly be seen as having lots of areas among the most deprived
  • It is simple and eye catching

What doesn’t work as well

  • No indication of population in each area
  • Areas tend to blur together
  • It may be overly simple for the audience

In my first attempt to solve these problems I addressed the second issue above using a jitter (via the random() function):

2016-10-30_22-05-26

However, it still didn’t address the population issue, and given that the vast majority of points had similar populations with a few outliers (see below), I wondered whether it was even worth addressing.

2016-10-30_22-08-40

Then I realised I could perhaps go back to the original and simply expand on it with a box plot (adding a sort for clarity):

2016-10-30_22-16-23.jpg

Voila, a simple makeover that improves the original and adds meaning and understanding while staying true to the aims of the original. Time for dinner.

Done and dusted…wasn’t I? If I had any sense I would be but I wanted to find out more about the population of each area. Were the more populated areas also the more deprived?

There have been multiple discussions this week on Twitter about people stepping beyond what Makeover Monday was intended to be. However, there was a story to tell here, and I dwelled on it over dinner. With the recent debates about the aims of Makeover Monday (and data visualisation generally) swirling in my head, I wondered what I should do.

I wondered about the rights and wrongs of continuing with a more complex visualisation. Should I finish here and show how simple Makeover Monday can be? Or should I satisfy my natural curiosity and investigate a chart that, while perhaps more complex, might show ways of presenting data that others hadn’t considered…

I had the data bug and I wanted to tell a story even if it meant diving a bit deeper and perhaps breaking the “rules” of Makeover Monday and spending longer on the visualisation. I caved in and went beyond a simple makeover….sorry Andy K.

Perhaps a scatter plot might work best, focusing on the median deprivation of a given area (most deprived at the top, by reversing the Rank axis):

2016-10-30_22-11-22

 

Meh, it’s simple but hides a lot of the detail. I added each Data Area and it got too messy as a scatter – but how about a Pareto-type chart…

2016-10-30_22-23-23.jpg

So we can see from the running sum of population (ordered by the most deprived areas first) that lots of people live in deprived areas in Glasgow, but we also see the shape of the other lines is lost given so many people live in Glasgow.

So I added a secondary percent of total, not too complex….this is still within the Desktop II course for Tableau.

2016-10-30_22-26-03.jpg

Now we were getting somewhere. I could see from the shape of the line whether areas have high proportions of more or less deprived people. Time to add some annotation and explanation…as well as focus on the 15% most deprived areas, as in the original.

Click on the image below to go to the interactive version. This took me around 3 hours to build following some experimenting with commenting and drop lines that took me down blind (but fun) alleys before I wound back to this.

2016-10-30_21-51-53

Conclusion

Makeover Monday is good fun. I happened to have a bit more time tonight and I got the data bug. I could have produced the slightly improved visualisation and stuck with it, but that’s not how storytelling goes. We see different angles and viewpoints, and constraining myself to too narrow a viewpoint felt like ignoring an itch that just needed scratching.

I’m glad I scratched it. I’m happy with my visualisation but I offer the following critique:

What works well:

  • it’s more engaging than the original, while it is more complex I hope the annotations offer enough detail to help draw the viewer in and get them exploring.
  • the purple labels show the user the legend at the same time as describing the data.
  • there is a story for the user to explore as they click, pop-up text adds extra details.
  • it adds context about population within areas.

What doesn’t work well:

  • the user is required to explore with clicks rather than simply scanning the image – a small concession given the improvement in engagement I hope I have made.
  • the visualisation takes some understanding; percent of total cumulative population is a hard concept that many of the public simply won’t understand. The audience for this visualisation is therefore slightly more academic than the original. Would I say this is suitable for publishing on the original site? On balance I probably would. The original website is text / table heavy and clearly intended for researchers rather than the public, so the audience can be expected to be willing to take longer to understand the detail.

Comment and critique welcomed and encouraged please.

Makeover Monday Week 43: US National Debt

2016-10-23_11-50-50

This week’s Makeover Monday tackles national debt. Let’s start by looking at the original visualisation.

Apparently the US national debt is one-third of the global total. Showing these two values in a pie chart is a good idea as it quickly shows the proportions involved. However, the pie chart chosen has a strange thin white slice between the two colours and a black crescent / shadow effect on its outside edge, which add no real value (in fact the white slice added a bit of confusion for me).

The visualisation then goes on to show the $19.5 trillion in proportion to several other (equally meaningless) large figures. The figures do add some perspective on just how big that number is, and the use of $100 billion blocks in the unit chart does allow an easy comparison. One slightly critical point, if we were to pick holes in the visualisation, is that halfway through, the view starts showing shaded blocks to compare against the $19.5 trillion, whereas before it doesn’t.

2016-10-23_12-02-50

with shaded blocks

2016-10-23_12-03-18

no shaded blocks

Achieving consistency is important in data visualisation as it lets the reader know what to expect and gives them a consistent view each time to aid comparisons. Making a design decision to add shaded blocks across every comparison would perhaps have been a better choice than switching halfway through.

Visualising Small Data

The dataset provided for this week’s makeover has just two rows, showing the debt for each area (US and Rest of the World).

2016-10-23_12-10-08

Clearly this presents a visualisation challenge. Visualising small datasets is hard, as there are limited choices. One can attempt to include secondary datasets to show the numbers in context, as the original author has done, but another, simpler choice might be to show them relative to each other – similar to the original’s pie chart. One might even attempt to show how the data corresponds to the population of the US or the world, bringing the figure down to something manageable (in the US the debt is a more comprehensible $61,000 per head).

Before we attempt to visualise something, though, we need to think about the audience and the message we want to convey. Are we simply trying to show the figures without any comment? Do we want to focus on how large they are? Or are we commenting on how large the US debt is relative to the rest of the world and making a social / political point?

With a dataset so small, any editorial comment is difficult. For example, we have no context on the direction of movement of these figures. The US might be quickly bringing its debt under control while the ROW’s grows, or the opposite might be true. The ROW figure might be dominated by other developed countries, or might be shared equally. How can we comment without further analysis of temporal change or the context of this figure?

If we can’t comment editorially then we are left with simply showing how huge these numbers are. My criticism of the original is that the numbers it shows in comparison are equally huge, and equally incomprehensible for a lay person. Given this visualisation is published on the Visual Capitalist website, perhaps their audience is more familiar with global oil production or the size of companies, but for any visualisation published away from the site a more meaningful figure is needed. Personally I think the amount per head is an especially powerful metaphor. In the US, $61,000 each would be required to clear the debt; the rest of the world would just have to pay a little over $5.
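As a rough sanity check on that per-head figure (the population value here is my assumption – roughly 320 million US residents in 2016 – and not part of the Makeover Monday dataset):

var usDebt = 19.5e12;                             // $19.5 trillion
var usPopulation = 320e6;                         // approx. 2016 US population (assumption)
console.log(Math.round(usDebt / usPopulation));   // ≈ 61,000 dollars per head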

To Visualise or not to Visualise

Now there is an important decision here: how to effectively show those figures in context. However, with such small data is there any point in doing so? Everyone can quickly see $5 is much less than $61,000 – we don’t need a bar chart or bubble to show that, and we certainly don’t need a unit chart or anything even more complex. This is the problem with small datasets: any visual comparison is slightly academic given we can quickly mentally interpret the numbers.

One might be tempted to argue that a data visualisation is needed to engage our audience. Perhaps a beautiful and engaging data visual might do a good job of this, but so would the use of non-data images like the one below.

us-national-debt

Defining Data Visualisation

Makeover Monday is a weekly social data project – should a visual that includes only text be included?

What if the pile of dollars in the image above had exactly 61,000 dollar bills – would that make it any more of a data visualisation than one containing a random amount? What if, instead, we added a unit chart with 12,200 units of $5 bills? These accompanying items don’t help us visualise the difference any better than the text. One could argue that where the main purpose of a visualisation isn’t to inform or add meaning or context, and it is instead used as a way of engaging the user, it becomes no different to any other image used in this way. Therefore adding more data-related visualisations to the above text wouldn’t make the image any more of a data visualisation than the one above.

Semantic arguments that attempt to define data visualisation are interesting but academic. Ultimately each project that uses data does so because it needs to inform its audience, and it is the success of the transaction from author to audience that determines how successful the project is.

So should we define a data visualisation as more (or less) successful because of its accompanying “window decoration” (or lack thereof)? In my opinion yes. Accompanying visuals and text help provide information to the audience and can help speed up the transfer of information by giving visual and textual clues.

Do charts / visuals that make no attempt (or poor attempts) to inform the audience add any more value to a data visualisation project simply because they use data? In my opinion, no. This isn’t the same thing as saying they have no value, but simply producing a beautiful unit chart, say, with the data for this Makeover Monday project would add no intrinsic extra value in educating the audience and therefore would be no more valuable than any other picture or image.

Is the above image a successful Data Visualisation? Let’s wait and see on that one. I’m intrigued to see what the community makes of a purely text based “visualisation”.

Does it do a better job of informing the audience than the original? Again this is hard to answer, but I believe I understand more about the size of the debt when it is visualised in terms of dollars per head. By bringing these numbers down to values I understand, I didn’t need to add any more visualisation elements in the way the original author did, so you might say mine is more successful because it manages to pass across information in a simpler, more succinct transaction.

UK Netflix Movie Finder

Click on the image below to see my submission for the “Mobile” IronViz contest – it should work on mobile, tablet and desktop.

desktop

Ideation

I’ll be honest, I didn’t start this viz until the Friday night before the contest ended on Sunday. My wife was out on the Friday and Saturday evenings, so I knew I had a few hours…however, with little time, I didn’t want to waste it producing a visualisation of something trivial. Instead I wanted to produce a visualisation of something useful – an “app”, something I’d actually use.

For a long time now I’ve wished I could find something that would save me the job of hunting down decent movies on Netflix. I have a Netflix subscription, but sometimes hidden gems can be hard to root out. I have a good track record of finding good movies, though it takes me a long time to hunt through reviews online as I look through Netflix. It’s become a running joke that I’ll spend longer picking a movie than actually watching it.

If only there was an app or website that would give me both combined…..

Friday Night is Data Night

So, with my idea in mind, I spent Friday night trying to get data…after some googling there weren’t many easy options, so I found myself installing Python to try a script to scrape Netflix, but hours went by without luck (all while watching Butch Cassidy and the Sundance Kid on Netflix). The lesson: don’t try to learn Python in an hour.

Head in hands and running out of ideas, I googled some more and found www.cinesift.com – it was purpose-built to do what I needed and had all the data I needed in its search (wish I’d found this site ages ago!). But how to get at the data?

Brute-force searching seemed the best option, so I ran a search for Netflix UK movies and then scrolled down and down and down to populate the dynamic page…then Ctrl+A, Ctrl+C and Ctrl+V gave:

films-text

Ouch….I also took the HTML source for use later. Time for sleep….

Saturday Night Feels Alright

Now the previous night had proved a mixed bag, so I turned to my trusty companion Alteryx to solve my data woes:

alteryx

What does this spaghetti do? Well, it takes the pretty horrible-format txt file and turns it into rows and columns of proper data. The trick was to assign a row ID to each row, restarting at the “Play Trailer” text that marks a new movie. Then I simply needed to crosstab and rename the data. It also pulls out the movie images from the HTML source using regex and finds their URLs. It combines the two and then splits out multiple genres and cast / directors into separate fields (in the end this last step wasn’t needed, but I thought it might be; without it the Alteryx module is massively simplified, removing the whole last row).
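Here is the same record-splitting idea sketched in JavaScript rather than Alteryx (the “Play Trailer” marker comes from the copied cineSift text; the input filename, and what ends up in each record, are assumptions here):

var fs = require('fs');

var lines = fs.readFileSync('films.txt', 'utf8').split('\n').map(function (l) { return l.trim(); });

var movies = [];
var current = null;
lines.forEach(function (line) {
    if (line === 'Play Trailer') {      // each "Play Trailer" marks the start of a new movie
        if (current) movies.push(current);
        current = [];
    } else if (current && line !== '') {
        current.push(line);             // collect the lines belonging to this movie
    }
});
if (current) movies.push(current);

console.log(movies.length + ' movies parsed');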

Then it was on to Tableau…I decided to design for mobile first and, over a couple of hours, quickly designed a few initial drafts of pages. Then, as it was getting late, I posted them to my colleagues for comment.

Next morning, while I was getting the kids ready to head out, the comments started coming back:

comments.png

 

I love that I have access to such a great and diverse range of opinions and talent from my peers at The Information Lab. As you can see I got loads of useful feedback – if you want to make a visualisation better just share it as much as possible, ideally on a collaboration tool with image commenting so people can highlight their comments with the corresponding piece of the image.

Sunday Night Polish

Tonight, Sunday, was all about acting on the feedback and building the desktop and tablet views. I designed a background theme for the visualisation (using a very quick piece of Photoshop work with the Netflix logo) which I incorporated into the desktop layout – the black left panel – but I soon realised that there were some limitations with the Device Designer in this initial 10.0 version of Tableau.

Firstly, changing the background of filters and parameters alters them for ALL devices. Ouch. That meant ones overlaid on black looked odd on mobile when overlaid on white. Normally a quick solution would be to add a container and colour it black, but in device mode you can’t format objects…grrr. I was getting frustrated due to my lack of knowledge of this new feature and its limitations.

It was hard work fitting the phone layout to different sized phones. Lack of real estate means having to compromise on design vs functionality. All aspects of the visualisation, Text, Filters, Logos need justifying in terms of space. I loved the challenge that working on mobile provided and I hope it makes people entering the competition focus on simple (KISS) visualisations.

I ended up working on the smallest device and then checking it resized okay onto larger screens. As you can see below the differences are quite big depending on the phone.

In the end I decided the best approach was to switch my designs to Floating to overcome this limitation…while not ideal it did allow me to work round most of the problems. However images needed some tweaking as they expand / contract using Fit Width / Fit All.

Anyway…I got it done, so I’m happy…and before midnight too. All in all I remain pleased with what was just around 12 hours’ work!

Avoiding the Bubble – 10 ways to broaden your data visualisation horizons on social media

We know all too well that without proper care online communities can easily become bubbles, effectively becoming echo-chambers of opinion that, unchecked, can leave unwary users with a very distorted view of how real world opinion differs from that in their online community.

We seek out social contact within a relatively narrow set of views and ideologies; we are naturally attracted to people who share our views and actively shun those who don’t. This has played out, and is still playing out, in the political world at the moment, with many “remain” voters left reeling after the UK voted for Brexit. For me personally this meant I could debate and engage with only one Brexit voter in my network – did this really help me shape my opinion and attitudes? Did I affect anyone else’s as a result of my discussions on the subject, or did we simply reinforce our own beliefs? The blunt truth is that I comprehensively failed to either appreciate or engage with any other viewpoint apart from those almost identical to my own. On the political spectrum the two camps were ideologically so far apart that this polarisation of views was reinforced on both sides, and arguments that seemed obvious to one side failed to land on the other. This left the Remain vote in disarray as they failed to appreciate their own failings. In the US we are also seeing this play out, with the 60 / 70 point lead predicted by many for Clinton failing to materialise as Trump supporters continue to be disengaged by any alternative despite the Republican’s many “gaffes”.

I won’t dwell on it here as this echo-chamber effect has been discussed by many; a particularly good article by David Byrne is well worth reading.

echo-chamber-7

Echo Chamber by Christophe Vorlet, 2016

 

Data Visualisation

Within data visualisation, my field of interest, it is easy to see the same issues play out. In the data viz world, online communities have typically been built around software / solutions – Tableau, Qlik, PowerBI, D3 and R to name a few – as well as more general solution-agnostic communities, typically flourishing around experts / researchers or special interests such as sports data. Visualisations that might not get a second glance in one community can be lauded as the best thing since sliced bread in others – often with praise revolving around the technical difficulty of producing the visualisation as opposed to its validity as a useful / interesting visualisation or analysis. The echo of what is “good” / “bad” can vary wildly between solutions and communities (though typically a hatred of the pie chart unites communities in a common rallying cry).

For new members of the data visualisation community it can be very easy to become distracted by these echoes and feel that certain techniques or visualisations offer more value (based on feedback from the community) than others. Of more concern is that without checks and balances communities can easily alienate those who don’t share similar opinions to those in the “bubble” leading to an increasingly narrow set of viewpoints, all reinforcing each other.

city-people-bubble-soap-large

How to avoid the Bubble

With this in mind I wanted to offer some tips to the discerning social media user in the Data Visualisation world, new or old, on how to avoid the bubble effect and ensure your timeline remains diverse.

  1. Remember you are in a bubble

Simply being aware of the fact that our online communities don’t reflect the real world is a start. Remember it. Try and actively switch your viewpoint to that of an outsider at regular intervals in order to try and see your community through a different lens.

  2. Be yourself

Online communities are seen by some as a means to an end, career- and learning-wise, but that doesn’t stop you developing an online personality and diversifying your posts. Showing people who you are outside the community will help people relate in a different way to your online self and give them confidence to challenge your views if they want to.

  3. Diversify who you follow

Okay so this one is fairly obvious but it needs to be said: don’t just follow people who are likely to agree with you. Go out of your way to look for communities in other areas away from your chosen data visualisation solution – use Twitter lists if you wish to ensure your timeline doesn’t become cluttered.

Follow

Twitter recommendations serve to narrow, not broaden, your network.

Follow a wide range of genders and ages and go outside your normal circles; a diverse network will serve to provide a range of views to counterbalance yours.

  4. Diversify your followers

So this is harder, but you really need to make sure you have a wide range of different viewpoints in your follower list. That way your posts are more likely to be debated as opposed to being accepted at face value. How do you do this? Post on different subjects away from your core solution – e.g. if you primarily post about Tableau then try to keep your posts generic, or try building visualisations in a range of different software to ensure you attract followers from different software vendors / solutions. Build a broad base of content but remain focused to ensure you appeal to your broader audience. Don’t be afraid to lose followers in this manner – personally I’d rather have one follower who offers a counterpoint to my views than two who don’t.

  5. Diversify your inputs

There’s really no better way to open up your horizons than by drawing inputs from across multiple streams; Reddit, Twitter, LinkedIn, Facebook, books, blogs, conferences, Periscopes and meetups are all ways to seek out new contacts. Try to actively look for communities that do things differently, or might even actively disagree with you, and try to shift your perspective to theirs. There is no right or wrong solution, and altering your perspective can make the world seem a very different place.

  6. Challenge the status quo

It’s okay to disagree now and then if you do it the right way – there’s a balance between being “that guy” and debating productively with someone who is willing to listen. Be especially careful of providing a dissenting voice if you’re new to a community, e.g. a Qlik user in the Tableau community might see his/her views dismissed. However, don’t disagree on everything; it gets tiresome in communities to see constant disagreement (further reading here from Ben Jones).

  7. Avoid being a fanboy / girl

In the same way, agreeing with, retweeting and liking everything adds very little value to a community. Work out why you’re in the community: do you want to help new users, publish your own content, get help finding solutions? Develop an online profile / personality around those interests and share content while adding your own comments. Followers will engage much more with cultivated, meaningful content that you have added value to.

download

  8. Don’t take feedback too much to heart

Positive feedback feels great; it’s sometimes overwhelming to have your visualisation praised by the community, and it’s easy for it to go to your head, but be aware that it is only likely to be one viewpoint, albeit a shared one. Learn to critique your own visualisations and rely less on likes / retweets / Viz of the Day as a way of judging a project’s value.

Similarly, just because a project gets negative (or worse, no) feedback doesn’t mean it has no value. Social media can be very fickle; things on a populist theme will get much more attention than anything of genuine business value.

  9. Seek out feedback from alternative sources

Seek out alternative feedback from different communities or on different platforms. One of the best ways is to ask for honest feedback from one or two trusted contacts / experts privately, where they are more likely to give your work time and energy as opposed to glancing over it or simply hitting the retweet button. The value of one meaningful critique like this is not to be underestimated; 140 characters isn’t enough to give any meaningful feedback, so many people won’t bother.

  10. Don’t just take my word for it

Do you agree? Look for other methods of avoiding the bubble online – a lot has been written about the social media bubble. Discuss, comment, argue and debate with me – I’d love to hear from you. I’m happy to be wrong.

What does this mean for me personally?

I’ll be the first to admit I’m well inside the bubble myself. Very few of my contacts and peers in the world of data visualisation come from outside the Tableau world. I rarely use any other solution to build data visualisations and I fail to engage with any media away from Twitter and LinkedIn professionally. I could do a lot more.

Over the next few months I need to ensure I broaden my horizons using the tips above, work with new people and seek out their opinions. I intend to work with new communities and with new solutions to see the world from their point of view….and I’ll be richer for it.


“Fitted” Gantts in Tableau

The Challenge

During Makeover Monday this week (week 22) I came across a problem: I needed to produce a Gantt chart for a huge number of overlapping dates. A Gantt was really the only way to go given the start and end dates in the data (in the back of my head I’m thinking Mr Cotgrave will be loving this data given his fascination with the Chart of Biography by Priestley), and I was fixated on showing the data that way (I blame Andy), but everything I tried in Tableau left me frustrated.

Jittering left wide areas of open space and no room for labels, and even zooming into one area would still leave lots of the data hidden.

2016-05-30_21-58-59

I knew what I wanted to do…I wanted to neatly stack / fit the bars in a decent arrangement to optimise the space and show as much data as possible at the top of the viz. The original author in the link for the makeover had done it like this:

Now, Makeover Monday usually has a self-imposed “rule” that I tend to adhere to: spend an hour or less (if I didn’t stick to this I could spend hours). But here I was after half an hour without any real inspiration, except for something I knew wasn’t possible in Tableau. It was a challenge, and to hell with the rules, I do like a challenge – especially given the public holiday in the UK meant I had a little time.

The Algorithm

So I turned to Alteryx. But how to stack the bars neatly?

Firstly I needed a clean dataset, so I fixed some of the problems in the data with blank “To” dates and negative dates using a few formulas, and then I summarised the data to give me just a Name, From and To date for each life.

Algorithm-wise, I wanted to create a bunch of discrete bins, or slots, for the data. Each slot would be filled as follows:

  1. Grab the earliest born person who hasn’t been assigned a slot
  2. Assign them to a slot
  3. Find the next person born after they die, and assign them to the same slot
  4. Repeat until present day

In theory this would fill up one line of the Gantt. Then I could start again with the remaining people.
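This isn’t the Alteryx macro itself, but here is the same pass-based idea sketched in JavaScript, assuming the data has already been reduced to { name, from, to } year values (negative for BC):

function assignSlots(people) {
    var remaining = people.slice().sort(function (a, b) { return a.from - b.from; }); // earliest born first
    var slot = 0;

    while (remaining.length > 0) {
        slot += 1;                              // each pass fills one row of the Gantt
        var lastEnd = -Infinity;                // the year the current slot becomes free

        for (var i = 0; i < remaining.length; ) {
            if (remaining[i].from >= lastEnd) { // born after the previous occupant died
                remaining[i].slot = slot;
                lastEnd = remaining[i].to;
                remaining.splice(i, 1);         // assigned, so remove from the pool
            } else {
                i += 1;                         // doesn't fit this slot, wait for a later pass
            }
        }
    }
    return people;                              // every person now carries a slot number
}

Each pass corresponds to one iteration: the first run fills slot 1, the second slot 2, and so on.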

An iterative macro would be needed because I would step through the data, then perform a loop on the remainder. First, though, I realised I needed a scaffold dataset, as I needed all the years from the first person (3100 BC) to the present day.

I used the Generate Rows tool to create a row per year, and then joined it to my Name, Birth, Year data to create a module that looked like:

2016-05-30_22-10-07

Data:

2016-05-30_22-11-17

I’d fill the “slot” variable in my iterative process. So next up my iterative macro.

Translating the above algorithm, I came up with a series of multi-row formulas:

2016-05-30_21-29-41.png

The first multi-row formula would assign the first person in the dataset a counter, which would count down from their age. Once it hit zero it would stay at zero until a new person was born, at which time it would start counting down from their age.

The second multi-row formula would then look for counters that had started, in order to work out who had been “assigned” in this pass, and give them the iteration number of the macro, i.e. the first run would see people going into slot 1, the second into slot 2, etc.

Perfect! Now to run it and attach the results to the original data:

2016-05-30_22-19-25

Easy peasy Alteryx-squeezy. That took me 30 mins or so, really not a long time (but then I have been using Alteryx longer than you….practice makes perfect my friend).

The Viz

So now back to Tableau:

2016-05-30_22-23-24

Neat, progress! Look at how cool those fitted Gantt bars look. Now what…

Well, I need to label each Gantt bar with the individual’s name, but to do that I really have to make my viz wide to give each one enough space…

2016-05-30_22-25-06

The labelling above is on a dashboard at the maximum 4000 width…we need wider! But how? Tableau won’t let me…

Let’s break out the XML (kids don’t try this at home). Opening up the .twb in Notepad and….

2016-05-30_22-27-58

I changed the highlighted widths and, lo and behold, back in Tableau – super wide!

Now I can label the points but what do I want to show – those Domain colours look garish….

So I highlighted Gender and…pop. Out came the women from history – a nice story, I think to myself. I decided not to add a commentary; what the viewer takes from it is up to them (for me, I see very few women in comparison to men).

Other decisions

  • I decided to reverse the axis to show the latest data first and make the reader scroll right for the past; mainly I did this because the later data is more interesting.
  • I decided to zoom in at the top of the viz. Generally I expect viewers won’t scroll down to the data below, but while I toyed with removing it I decided that leaving it was the slightly better option. The top “slots” I’m showing are arbitrarily chosen, but I feel this doesn’t spoil the story.
  • I decided to add a parameter to highlight anything the user chooses (Gender or Occupation) – tying it into the title too.
  • I fixed AD / BC on the axis using a custom format.

2016-05-30_22-41-28

Conclusion

So I spent a couple of hours in total on this, way more than I planned today, but that’s what I love about Makeover Monday – it sets me challenges I’d never have had if I hadn’t been playing with the data. I’ve not seen this done in Tableau before, so it was a fun challenge to set myself.

Click on the image below for the final viz

2016-05-30_21-17-19