About Chris Love

data professional, runner, photographer; these are my hobbies at present, I make no guarantees to the future.

IronViz Feeder 2 – Retrospective

An interview with myself looking back at the recent IronViz Feeder for Health and Well-Being. Below is my final visualisation, click for the interactive version.

Takeaways

Perhaps I can start by asking you what you thought about this Iron Viz theme, did it get you excited – did you immediately have themes that sprang to mind?

To be honest I have a love / hate relationship with these feeders. On the one hand I love the open-endedness of the theme (yes, it’s a guide, but you can go almost anywhere with it); on the other hand it still feels too open for me, which means my imagination struggles to settle on one particular idea.

I was already doing some work for a Data Beats article on Parkrun and their accessibility and I initially wanted to cover this, but what I had in mind didn’t fit nicely into a single visualisation or set of dashboards. I also had in mind some work on Agony Aunts – comparing the Sun’s Dear Deidre to The Guardian’s Mariella Frostrup based on the words they used – but the analysis started taking too long….

So that’s an interesting point – how do you balance out the various aspects of visualisation when choosing a subject? Do you choose subjects that require little data preparation so you can maximise data visualisation time or look for more complex subjects? 

When choosing a subject I’m primarily interested in choosing a subject that interests me. If the subject doesn’t do that then it isn’t going to make me want to stay up til 3am working on it, or dedicate hours outside work. Let’s face it, I’m not in IronViz to win the thing, although I’d love to; there’s just too much talent and competition out there for me to compete, and so I’d rather just have fun doing a visualisation.

That said, I also don’t want to pick a subject that feels too easy – I like to work at my data and perform some analysis, I want to be able to say “This is what I found” rather than “This is what the dataset I found said”. The difference is subtle but I see this as a direction my public visualisation path is taking more and more lately. So I want to build and define my own analysis and say something with it – I do take inspiration from other sources, after all very little is new or novel today, but for me the analysis is as important as the visualisation itself.

This too is where the “data journalism” aspects of data visualisation are important; in the IronViz scoring criteria this is labelled as “Storytelling”. However you label it, I interpret it as not just showing the numbers. Anyone can show numbers visually, they can show the highest and the lowest and the relationship between them. They can design a dashboard and they can publish it. That isn’t data storytelling though, it’s data presentation. I want to convey why someone should be interested in the numbers, what the numbers tell us and why that is important, and what we should do because the numbers show what they do.

So you mean adding commentary?

Well yes, but that’s only part of what I’m talking about. What I’m getting at is that this storytelling goes right back to the data choice, the subject choice and the analysis. And it’s not about presenting numbers back that people should care about either; it’s about doing some meaningful analysis and telling a story that is different, not the same old numbers presented in a different way.

It sounds like you feel the way you approach IronViz now is perhaps different to the way you’ve approached it in the past. What’s prompted it do you think?

Certainly it’s been a journey to get to this point, probably starting with my Spring’s Got Speed visualisation in last year’s Iron Viz. As to what has prompted a shift towards this more analytical direction, well I suppose it’s the same things that prompted Rob and me to start Data Beats. Sometimes you look at the Iron Viz entries and you feel like you’re in a game where everyone is kicking the ball when suddenly someone comes in, picks it up and starts running with it. Over the last year or so the norms in the Tableau community certainly seem to have shifted; what was considered good a few years ago is now very, very average and people are pushing boundaries left, right and centre.

When people start pushing boundaries you really are left with two choices; you can either find your own boundaries to push or settle down and try to do the basics really, really well. So while perhaps in the past I was happy to push boundaries, there are now others who do much wilder stuff than I ever could – and so I really need to hunker down and do the basics as well as I can.

So tell us about your Iron Viz. Where did the idea come from and how did you choose to approach it?

I decided to look at how deprivation is linked to the number of takeaways in an area. Looking back, I think that like any good idea it didn’t come from any one source; instead it came from several seeds over time. Certainly walking around my own town, which is in a relatively deprived area, I see a lot of takeaways. We get new takeaway leaflets every day and, where once the town centre was made up of lots of different stores, I now see about twelve takeaways (contrast this to just two as I was growing up – in perhaps 500m of shops). There’s been some similar research on these links already, this article sticking in my mind recently.

Having explored other ideas and failed I knew I could get this one off the ground quickly – I’ve played with the Food Standards Agency’s Ratings data before and knew I could download that to get a classification of takeaways, while deprivation is easily calculated from the Index of Multiple Deprivation. So the problem seemed relatively simple given my limited time.

Speaking of time, how long did your visualisation take?

I didn’t have long enough. The World Cup, TC Europe and several camping holidays meant that I really dedicated just the last day of the time allowed to this. I started about 3pm and, with several breaks to see to the family and eat, I was working til 3am.

I wouldn’t recommend this approach, it meant I had very little time to iterate on the visualisation, I had no time to get feedback from any peers and very little time to step back and consider what I was doing.

What took the longest time?

I settled on the data and story fairly quickly, using Alteryx to pull the data together. However the design was something that I hadn’t worked out before starting, and well over half the time was spent on trying to come up with ideas.

I started off with the idea of putting the article on a newspaper, partially covered by fish and chips (that’s how we eat them from takeaways traditionally in the UK); there were however several difficulties. First and foremost, I need any design to use images I have created or that are free to use. Finding an image of fish and chips at the right angle, with a transparent background and with no copyright, was hard. I also wanted to have the article crease like it was folded, which would have been quite a bit of work.

I quickly returned to the drawing board with very few ideas as to how to approach my visualisation design. I’d wasted 2 or 3 hours looking for images and I needed something quick. In desperation I googled “Takeaway” to look for related images and that’s where the takeaway menu hit me – and the idea was born.

The design looks quite complex – what software do you use?

I actually have a licence for Photoshop that I use for photography, but I’m not a very good user. I can understand layers and some elements, so I used that to piece together the design.

Wait, Photoshop? and you use Alteryx? Other people don’t have those advantages?

No, but let’s be clear I only use them because I have licences and have put effort in to learn them. All the work I did in Photoshop I could have done in Gimp, SnagIt Editor or even Paint. Likewise the data prep work could have been done in Tableau Prep (aside from joining on the spatial files which could have been done in Tableau) or I could have used other free software like EasyMorph.

So back to the design, where did the actual Takeaway design come from?

I took inspiration from this design from a menu designer, I then played around with the sizes and colours until I was happy.

What blind alleys did you go down in producing your Iron Viz?

Lots! Isn’t that what Iron Viz is all about? I really wanted to add an extra geographic element to the visualisation, and look at the relationship at perhaps 1km grid level. I did the analysis but the relationship I wanted to see just wasn’t there due to geographic anomalies, i.e. town centres have a lot of shops but not many people. I tried extending the analysis out to 3km or 3 miles but there was too much noise in the data, remote areas were completely distorting the story and there were no patterns I could find. In the end I settled for the simple analysis.

What was the hardest part did you find?

Having done so many Data Beats projects lately, I found it incredibly hard to limit myself to a single dashboard. I’ve got so used to using words to tell my story, and explaining it over several paragraphs with visualisations to help me along the way, that this was incredibly frustrating – I had so much to say but not enough space to say it.

You said in the last Feeder you were too intimidated by the competition to enter. What changed your mind to enter?

I regret not entering the first feeder. My thought process came from my competitiveness – I really want to win this thing and I feel the competition is such that I might not be able to. Coupled with the fact the time has increased to a full month I really struggled to create enough time to compete with some of the stronger entries. Before a few hours was enough to compete, now it’s not even close.

But my thought process was wrong, trying to win Iron Viz is like trying to win the London Marathon – it takes hours and hours of practice and training in the build-up to get even close. Does that mean it’s not worth it? No. The fun is in the taking part – I’d encourage everyone to take part and just give it a go. It’s a fun project and something that only comes around 3 times a year.

What about the other entries, any favourites?

For analysis I have to choose my good friend Rob Radburn’s: Who Care’s. Rob has an instantly recognisable style and his commentary and analysis really shine in this piece.

For Story-Telling I’d say Mike Cisneros’: Last Words is just beautiful. Mike pulls together visualisations that might be just “show me the numbers” but binds them with stories of last letters home which just break the heart.

For Design Curtis Harris’: If I was Kevin Durant wins the day for me, it’s just a beautiful piece of  work, not over reliant on imagery – just all about the data.

There are lots more I could pick out but these are some of my favourites.

Those are all amazing pieces of work, thanks for sharing Chris and good luck with Iron Viz.

 

 

 

 

 

 


What happens at The End?

This post is a small thought about one piece of Data Visualisation best-practice.

“Jeez Chris, get a life”. Yes, I know, here I am again. Get over it. I happen to think this stuff is quite important.

This is a reply to and critique of the chart made by Jeff Plattner and recreated by Andy Kriebel in his blog post Makeover Monday: Restricted Dietary Requirements Around the World | Dot Plot Edition

I’d originally fed back on best practice to Jeff on Twitter, but given Andy has chosen to recreate the chart and has a huge audience for his blog I felt it was worth a blog post in reply to point out what I think is a small best-practice failing in the chart.

Let’s compare the three charts below, the last being Jeff’s original:

Bars

Bars – Andy’s Initial Makeover

Lollipop

A Lollipop Chart based on Andy’s Makeover of Jeff’s

Dot Plot

Jeff’s Original

I love Jeff’s take on this subject. I immediately fell in love with the design and loved the “slider” plot which I’d not seen used so effectively.

However there is a subtle difference between this last chart and the two above.

All three have an axis that cuts off the range of the bars / lollipops / sliders at 50%. This is a design choice which Andy and Jeff (for the first and third charts) both said came from wishing to make the chart look better.

Chart

Now here comes the rub for me. In the first two the shrinking of the axis doesn’t take away from the audience’s understanding of the chart. However in the last “slider” chart it does. Why? Because the chart has an end.

Why “The End” matters…

The end of a visualisation mark / chart is important for me, because if it exists then it implies something to the reader. It implies that

a. the data has a limit

b. you know where the limit is and can define it

c. you have ended the chart at the same place as the limit of the data

Let’s look at the three aspects here with our data

a. ✔ the limit of the data is 100%

b. ✔  no region can be more than 100% of a given diet

c. ✖  the line ends at 50% in Jeff’s chart

Why doesn’t this matter for the first two charts? Well these two charts don’t have a limit set by the chart. Yes, the bar and lollipops end, but we’re forced to look elsewhere to see the scale. With the “slider” chart, then in my opinion, the reader feels safe to assume that a dot half-way along a line means that half the people in that area follow the diet. They don’t go further to look for the scale – despite the fact Jeff has clearly marked the limits.
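To put a rough number on that assumption, here is a minimal sketch in Python. The function name and the 25% figure are made up for illustration and are not taken from the dietary data; it simply works out how far along the line a dot sits for a given axis limit.

```python
def position_along_line(value_pct, axis_max_pct):
    """Fraction of the line's length at which a dot is drawn (0.0 to 1.0)."""
    return value_pct / axis_max_pct


# Hypothetical region where 25% of people follow a diet.
truncated = position_along_line(25, 50)    # line ends at 50%
full = position_along_line(25, 100)        # line ends at 100%

print(f"Axis to 50%:  dot sits {truncated:.0%} of the way along")
print(f"Axis to 100%: dot sits {full:.0%} of the way along")
```

With the line ending at 50% the dot lands half-way along, which is exactly the point where a reader who trusts the end of the line would read "half the people" rather than a quarter.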

This perceptual difference between the charts is important for me, and a good reason not to limit the axis at any other value than 100%, as I have done below by remaking Andy’s Remake.

Dot Plot 100%

Is this the biggest faux pas in the history of Data Visualisation? On a scale from 0 to 3D Exploding Pie Chart we’re at 0.05. So no, not really, but I thought it was interesting enough to share my thoughts on what was an excellent Viz by Jeff.

As ever these are only my thoughts, they’re subjective, and many of the experts may not agree. Let me know what you think. Is the visual improvement worth the sacrifice of best practice?

Comment here or tweet me @ChrisLuv

Spring’s Got Speed

I must admit when Iron Viz was announced for this qualifier I had mixed feelings. On a positive side I know nature and animals, as a photographer and bird watcher I’ve spent a long time gathering data on animals. I even run a website dedicated to sightings in my local area (sadly it’s been neglected for years but still gets used occasionally). On the negative side though I knew I would be faced with competition from a bunch of amazing looking visualisations, using striking imagery and engaging stories.

Did I want to compete against that? With precious little time for vizzing lately (especially at end of quarter – you can tell the Tableau Public team don’t work in sales!) I only wanted to participate in Iron Viz if I could be competitive, and for those who don’t know me I like to be competitive….

So, as you perhaps guessed, my competitive edge won and I pulled some late hours and risked the wrath of my family to get something that did the competition justice.

A Note On Visualisations and creating a Brand

I’ve noted above I expected a lot of pictures and text from people in this qualifier, after all Jonni Walker has created his own brand around animal visualisations, stock photography and black backgrounds. However I have my own style of visualisation. I’m not Jonni Walker; what he does is amazing but there’s only room for so many “Jonni Walker” vizzes. I couldn’t replicate what he does if I tried.

In the past I’ve deliberately combined analytics and design, treading the fine line between best practice and metaphor, staying away from photographs and external embellishments and preferring my visualisations to speak for themselves through their colours and data. The subject this time was tricky though…was it possible to produce an animal visualisation without pictures?

Finding a Subject

I could turn this section into a blog post on its own! I trawled the internet for data and subjects over days and days. Some of the potential subjects:

  • Homing Pigeons (did you know their sense of smell affects their direction)
  • Poo (size of animal to size of poo) – this was my boys’ favourite
  • Eggs (had I found this data I’d have been gazumped: http://vis.sciencemag.org/eggs/)
  • Zebra Migration
  • Sightings Data of Butterflies

Literally I couldn’t find any data to do these enough justice, I was also verging on writing scientific papers at points. I was running out of ideas when I found this website: http://www.naturescalendar.org.uk/ – and I dropped them an email to see if I could use their data. Phenology (studying nature’s cycles) has always interested me and getting my hands on the data would be fantastic. There was even a tantalising mention of “measuring the speed of spring” on their news site with some numbers attached but no mention of the methodology….

Now, I’m impatient and so….using a few “dark art” techniques I didn’t wait for a reply and scraped a lot of the data out of their Flash map using a combination of tools including Alteryx.

Thankfully a few days later they came back and said I’d be able to use it (after a short discussion) and so all ended well.

Measuring the Speed of Spring

Now I had the data, working out how to measure my own “speed of spring” was difficult. Several options presented themselves but all had their drawbacks…the data is crowd-sourced from the public, mainly by people who can be trusted, but amateur outliers could affect the result (do you want to say Spring has hit Scotland based on one result?). Also the sheer number of recorders in the South East, say, could affect any analysis, as could the lack of them in, say, Scotland. Given we’re expecting to see Spring move from South to North, that could seriously sway results.

In the end I played between two methods:

  1. A moving average of the centroid of any sightings – and tracking its rate of movement
  2. A more complex method involving drawing rings round each sighting and then tracking the overall spread of clusters of sightings across the UK.

In the end I opted for the latter method, as the former was really too likely to be weighted by the numbers of sightings in the south.
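A toy example shows why the centroid gets dragged south. This is only a sketch in Python with made-up coordinates (miles on a flat grid, northing increasing towards Scotland), not the real sightings data or the Alteryx workflow.

```python
def centroid(points):
    """Unweighted mean position of (easting, northing) sightings."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Hypothetical week: nine reports clustered in the south, one 400 miles north.
south = [(0.0, 0.0)] * 9
north = [(0.0, 400.0)]

print(centroid(south + north))  # northing of the centroid is only 40 miles
```

Even though a sighting has reached 400 miles north, the centroid has barely moved, because every extra southern recorder pulls it back down.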

Very briefly I’ll outline my methodology built in Alteryx.

  1. Split the country into 10 mile grids and assign sightings to these based on location
  2. Taking each grid position calculate the contribution to the surrounding grids within 50 miles based on a formula: 1-(Distance/50). Where Distance is the distance of the grid from the source grid.
  3. Calculate the overall “heat” (i.e. local and surrounding “adjusted” sightings) in each grid cell
  4. Group together cells based on tiling them into groups dependent on “heat”
  5. Draw polygons based on each set of tiles
  6. Keep the polygon closest to the “average” grouping i.e. ignoring outliers beyond the standard deviation
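For anyone without Alteryx, steps 1 to 3 can be sketched in a few lines of Python. This is an illustration under assumptions rather than the actual workflow: the function names are invented, coordinates are taken to be miles on a flat grid, and distance is measured between cell centres. Steps 4 to 6 (tiling and drawing polygons) are not covered.

```python
import math
from collections import defaultdict

GRID_MILES = 10      # grid cell size from step 1
RADIUS_MILES = 50    # influence radius from step 2


def to_cell(x_miles, y_miles):
    """Assign a sighting (coordinates in miles) to a 10-mile grid cell."""
    return (int(x_miles // GRID_MILES), int(y_miles // GRID_MILES))


def heat_map(sightings):
    """Return {cell: heat}, spreading each sighting with weight 1 - distance/50."""
    # Step 1: count sightings per grid cell.
    counts = defaultdict(int)
    for x, y in sightings:
        counts[to_cell(x, y)] += 1

    # Steps 2 and 3: each cell contributes to its neighbours, and the
    # contributions are summed to give the overall "heat" of every cell.
    reach = RADIUS_MILES // GRID_MILES  # number of cells the kernel can span
    heat = defaultdict(float)
    for (cx, cy), n in counts.items():
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                distance = math.hypot(dx, dy) * GRID_MILES
                weight = 1 - distance / RADIUS_MILES
                if weight > 0:
                    heat[(cx + dx, cy + dy)] += n * weight
    return dict(heat)
```

A single sighting gives its own cell a heat of 1, the neighbouring cell 10 miles away 0.8, and nothing at all to cells 50 miles or more away, which is what smooths out a lone amateur record.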

I then did the above algorithm for each week (assigning all sightings so far this year to the week) and for each event and species.

These polygons are what you see on the first screen in the visualisation and show the spread of the sightings. I picked out the more interesting spreads for the visualisation from the many species and events in the data.

Small Multi Map

The above process was all coded in Alteryx.

Alteryx

If you look closely there’s a blue dot which calls this batch process:

Alteryx Batch

which in turn calls the HeatMap macro. Phew, thank god for Alteryx!

Now to calculate the speed, well rate of change of area if you want to be pedantic! Simple Tableau lookups helped me here as I could export the area from Alteryx and then compare this week’s area to the last week’s. The “overall speed” was then an average of all the weeks (taking artistic licence here, but given the overall likely accuracy of the result this approximation was okay in my book).
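The same arithmetic is easy to sketch outside Tableau. The Python below is a rough equivalent under assumptions, not the Tableau calculation itself: the function names and the example areas are made up, and the areas are assumed to be cumulative, one value per week.

```python
def weekly_growth(areas):
    """Change between each week's cumulative area and the previous week's."""
    return [current - previous for previous, current in zip(areas, areas[1:])]


def overall_speed(areas):
    """Mean of the week-on-week changes; None with fewer than two weeks."""
    growth = weekly_growth(areas)
    return sum(growth) / len(growth) if growth else None


# Hypothetical cumulative areas (square miles) for three consecutive weeks.
areas = [100, 250, 450]
print(weekly_growth(areas))   # [150, 200]
print(overall_speed(areas))   # 175.0
```

Averaging the weekly changes hides any acceleration within the season, which is the artistic licence mentioned above.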

Iterate, Feedback, Repeat

I won’t go into detail on all the ideas I had with this data for visualisation but the screenshots will show some of what I produced through countless evenings and nights.

 

“Good vizzes don’t happen by themselves, they’re crowdsourced”

I couldn’t have produced this visualisation without the help of many. Special mentions go to:

Rob Radburn for endless Direct Messages, a train journey and lots of ideas and feedback.

Dave Kirk for his feedback in person with Elena Hristzova at the Alteryx event midweek and also for the endless DMs.

Lorna Eden put me on the right path when I was feeling lost on Friday night with a great idea about navigating from the front to the back page (I was going to layer each in sections).

Also everyone else mentioned in the credits in the viz for their messages on Twitter DM and via our internal company chat (I’m so lucky to have a great team of Tableau experts to call on as needed).

Difficulties

Getting things to disappear is a nightmare! Any Actions and Containers need to be in astrological alignment….

Concentrating on one story is hard – it took supreme will and effort to concentrate on just one aspect of this data

Size – I need so much space to tell the story, this viz kept expanding to fit its different elements. I hope the size fits most screens.

The Result

Click below to see the result:

Final.png

 

On Conference Etiquette and Poor Talks

We’re starting Conference season in my small corner of the data world, with the Tableau and Alteryx conferences happening simultaneously in London and Vegas respectively. Sadly I’m missing out on my first Alteryx Inspire in a number of years – I hope my friends in Vegas have an amazing time.

As these conferences draw near we’re always treated to an array of advice from seasoned attendees around how to get the most out of your experience, and so I wanted to add my opinion to this growing pile of tips and tricks. In doing so I want to challenge what seems to be accepted wisdom among the many bloggers and tweeters I follow at the conferences I attend. The advice goes something like this:

“If you’re not enjoying a talk then walk out and find something else – your time at conference is valuable”

Personally I think this is the worst advice you could be given. Not only is it rude, it also makes a bad situation worse. So let’s show you how to rescue those poor talks and turn them into a positive experience.

1. Choose your talks wisely

Take time to use the conference apps and schedules well in advance of the conference, take the time to research the speakers and topics. If you wish to learn something in particular, or already have some knowledge on the subject, then seek out opinions from peers or the speakers themselves, if you can reach them, on whether your attendance is worthwhile.

Sometimes it’s worth attending a talk not for the content itself but in order to connect with the person afterwards, particularly if they share a common interest or specialism or the same industry.

Whatever the reasons for attending the talk make sure you are clear on them before you walk through the door to attend. Ask yourself (if there are multiple sessions you wish to see) if there are ways to get the same outcome without attending, e.g. arranging to meet the speaker for a 1-to-1 (most speakers are only too flattered to be asked for a coffee to chat through their subject in detail) or watching again online. Try to choose the talk you’d like to ask questions in if the sessions are recorded.

In summary I’d perhaps go as far as to say there’s no such thing as a bad session, only poorly chosen ones. You owe yourself, and the speaker, the duty to choose your session carefully.

2. Walking out won’t help

So, if you followed the above advice, you chose, quite deliberately, to come along to this talk. You know why you came and you know what you want to get out of it. You consider the speaker to have something interesting to say, otherwise you wouldn’t be here.

But now the talk isn’t going well. Perhaps the speaker has a voice that belongs on the shipping forecast more than a conference, or perhaps they’re having all sorts of technical problems – those perfect dashboards just won’t render on the conference screens – or maybe they’re nervous and can’t get their words out quite as they intended. Maybe they just didn’t have time to prepare. Maybe they’re reading out their slides to the audience! Whatever the reason walking out is likely to only make a bad situation worse.

Why? Firstly you now have to run across a large conference venue and, if you’re lucky, join your well researched second choice talk rather late. More probably you didn’t have a second choice and so you just run into the nearest room, or your second choice is full and you can’t get in. You might even be forced to just grab a coffee and play pinball. Whatever happens you won’t have the clear outcomes you wanted from your primary choice – and so you’re likely to not find it as valuable (not least because you missed some).

More importantly though what happened to all those reasons for attending the first talk? Did they go away? Of course not, so you’re giving up on a massive opportunity to rescue your original mission.

3. Just be Polite

As a speaker I have to say there’s nothing more off-putting than seeing people leave. At conference, in a large venue, it really is to be expected, but many speakers at our data conferences aren’t professional speakers and they’re in relatively small rooms. They’ve given up their time to prepare a talk (which takes a lot of effort – more than 99% of attendees have put in). The least you could do, having decided to attend, is make a small commitment of all 40-50 minutes of your time.

So make sure you’ve been to the bathroom, make sure you listen and engage with the speaker, try and avoid WhatsApp conversations moaning about the speaker to your friends in other sessions, avoid Facebook for an hour, because you can use the opportunity to potentially turn what could be a wasted 40-50 minutes back into a great learning opportunity.

If you do think you’re prone to voting with your feet then please sit by the door and try to make minimal fuss as you leave. Also remember doors can slam in conference halls – so close doors gently behind you.

4. Rescuing the Situation

Yes, poor talks happen, as we’ve said, for a variety of reasons, but assuming you’ve decided to stick around then you can rescue the situation and still achieve your original objective for attending the talk.

How do you rescue the situation?

  • Be patient – speakers, particularly customer speakers, are often nervous and so they’ll take a while to loosen up.
  • Think of questions – focus on what the speaker isn’t saying, that’s often the more interesting stuff. How does that tie in with what you wanted to get out of the session? Write down a set of questions as the speaker goes through.
  • Ask questions at the end – new speakers will more often than not under-run, leaving plenty of time for questions. This is your chance to really get what you need to know. Tie questions back to what the speaker was saying to show you were listening and ask them to expand on areas of interest to you. Often getting a speaker ad-libbing about something they feel passionate about is where you’ll really start to learn something.
  • Approach the speaker at the end of the talk – as the room empties make sure to say Hi. You could even offer to buy them a coffee if you still haven’t got what you wanted from the talk. Remember you chose this person as an expert in a field you were interested in, one bad presentation doesn’t mean they don’t have something interesting to say.

Prepare well and remember your objectives

In conclusion you owe yourself the duty of preparing well for the talks you want to attend, that preparation will help you focus on what you want to achieve and help you through any sessions that don’t live up to your expectations.

Walking out and leaving poor survey feedback isn’t your only choice, in fact it is likely to be the worst choice you can make. Make the most of the experts the conferences lay on for you and enjoy yourself.

 

 

Data Visualisation: Lonely Hearts Club

My data visualisation life outside work is missing something. I’m lonely. The hours I spend hunched over the PC visualising data remain unfulfilling. When I’m not “vizzing” the rest of my time is spent on social media networks with other single vizzers. We all pretend we’re happy being single, but deep down I know many of us aren’t. I think it’s important to talk about the loneliness.

You see I’ve spent years now without an audience. At first it was fun, I had the freedom to do what I wanted when I wanted; I didn’t have to worry about pleasing the other half. I spent so many weekends on the equivalent of a boys night out, visualising random datasets, where I splurged out having fun and not really caring about the consequences. Usually I was in the company of lots of other singles and we had a blast. I even had a few meaningless relationships out of those nights, I hope they prepared me for what it’s like to be in a real relationship but I worry they taught me bad habits. After all those nights were all about impressing my mates, not my prospective partner, and so while the results were impressive I’m not sure either of us got any long term value out of the fling.

Having an audience, so we’re told, is the norm. Articles everywhere tell us how to keep our audience when we’ve found her, but there’s never any clue in them about how to find one in the first place. “Know your audience” everyone says, and every time I hear that a little piece of me dies because I know so many people who don’t have one.

A life in Data Visualisation without an audience is hard. I try my best but I end up vomiting data points and facts onto a page in an attempt to make something meaningful. I make them engaging, I add pictures and I try to piece a story together but if I’m honest it’s nothing more than a bit of data porn. Something I know my fellow singles will find entertaining, briefly, but that will be quickly binned as they click on looking for something a little bit more hardcore.

Recently I’ve been attending a few singles nights with the aim of finding a long term partner / audience. Last weekend I was at #OpenDataCamp where I made an appeal for an audience, a user, someone, anyone who I could work with to help solve real issues with visualisation. Yes I know they’d give me problems and challenges but I want to do something meaningful; I think I’m ready for some commitment. Maybe I came across desperate because no one was interested. It was fun, I met plenty of people looking for the same thing as me from a slightly different angle, they had the data but also no audience…some even suggested if I found someone then they could join us in a threesome. I liked the sound of that but perhaps having three in the relationship will only complicate things more….

Ultimately I guess everyone wants to settle down like me but many of my older friends have settled into the single life as a permanent bachelor. Some of them I never hear from, it’s really sad to see people disappear because they couldn’t find an audience, I wonder where they go? Maybe they found one and never told me…. Others are happy telling others how to have productive relationships without having one themselves. Still others have taken themselves off the market, thrown themselves into work where they can have real relationships, again we don’t see them much anymore. Yes, some of the old timers still join us on boys nights out, but if I’m honest it’s a bit sad seeing them on nights out with the young crowd. I don’t want to be one of them, I want to have a meaningful relationship with someone I can commit to. Even if it’s just short term I want it to be meaningful. I hope there’s still time, I think I have a lot to offer if I meet the right partner.

If you know anyone who can be my audience let me know, I’d love to meet one and try and work together to create something special.

 

Why we’re going to #opendatacamp

On Saturday and Sunday fellow Tableau Zen Master, Rob Radburn and I will be attending Open Data Camp in Cardiff.

So why are we spending a Saturday and Sunday in Cardiff away from our families and spending a small fortune on hotels?

 
Well sometimes data visualisation can be frustrating. We’re both prominent members of the Tableau Community and we’ve spent countless hours producing visualisations for our own projects as well as community initiatives such as Makeover Monday and Iron Viz. There’s lots of fun and rewards for this work, both personally and professionally, so why is it frustrating? Well shouldn’t there be more to data visualisation than just producing visualisations for consumption on Twitter? How do we produce something meaningful and useful (long term) through data and visualisations?

Open Data seems a suitable answer; however, with so many data sets, potential questions and applications it’s hard to know where to start. The open data community has done a great job at securing access to many important datasets but I’ve seen few useful visualisations / applications of open datasets in the UK beyond a few key datasets. How do we do more?

Tableau Public on the other hand has done a fantastic job of ensuring free access to data visualisation for all, but few in the community have worked with the open data community to enable the delivery of open data through the platform.

Rob and I are hoping that our pitch at Open Data Camp will facilitate a discussion around bridging the gap between the Tableau Community and the Open Data Community. On the one hand we have a heap of engaged and talented data viz practitioners on Tableau Public looking for problems, on the other a ton of data with people screaming for help understanding it….on the face of it there seem to be some exciting possibilities, we just need to pick through the .

Oh and while we’re there if anyone wants us to pitch a Tableau Introduction and / or Intro to Data Visualisation we’d be happy to facilitate a discussion around that too.

Would love your thoughts

Chris and Rob

Using Inspect / Javascript to scrape data from visualisations online

My last post talked about making over this visualisation from The Guardian:

2016-11-13_12-55-29

What I haven’t explained is how I found the data. That is what I intend to outline in this post. Learning these skills is very useful if you need to find data for re-visualising data visualisations / tables found online.

The first step in trying to download the data for any visualisation online is checking how it is made. It may simply be a graphic (in which case it may be hard, unless it is a chart you can unplot using WebPlotDigitizer), but interactive visualisations are typically made with javascript unless they use a bespoke product such as Tableau.

Assuming it is interactive, you can start to explore by right-clicking on the image and choosing Inspect (in Chrome; other browsers have similar developer tools).

2016-11-13_19-26-35

I was greeted with this view:

2016-11-13_19-28-09.png

I don’t know much about coding but this looks like the view is being built by a series of paths. I wonder how it might be doing this? We can find out by digging deeper, let’s visit the Sources tab:

2016-11-13_19-31-30

Our job on this tab is to look for anything unusual outside the typical javascript libraries (you learn these by being curious and looking at lots of sites). The first file gay-rights-united-states looks suspect but as can be seen from the image above it is empty.

Scrolling down, see below, we find there is an embedded file / folder (flat.html) and in that is something new: all.js and main.js….

2016-11-13_19-34-05

Investigating all.js reveals nothing much but main.js shows us something very interesting on line 8. JACKPOT! A google sheet containing the full dataset.

2016-11-13_19-38-25

And we can start vizzing! (btw I transposed this for my visualisation to get a column per right).
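For anyone wanting to do that transpose outside a spreadsheet, here is a minimal sketch in javascript. It assumes the sheet arrives as one row per right and one column per state, which is a guess based on the transpose mentioned above; the state names and cell values are made up for illustration and the transpose helper is hypothetical.

```javascript
// Transpose a table held as an array of rows (each row an array of cells).
// Short rows are padded with empty strings so every output row has the same width.
function transpose(rows) {
  const width = Math.max(...rows.map((row) => row.length));
  return Array.from({ length: width }, (_, col) =>
    rows.map((row) => (col < row.length ? row[col] : ''))
  );
}

// Made-up sample in the ASSUMED layout: one row per right, one column per state.
const sheet = [
  ['state', 'State A', 'State B'],
  ['marriage', '', 'x'],
  ['marriageban', 'x', ''],
];

// After transposing there is one row per state and one column per right.
const perState = transpose(sheet);
console.log(perState);
```

The header row of the result then reads state, marriage, marriageban, which is the shape Tableau wants for a column per right.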

Advanced Interrogation using Javascript

Now part way through my visualisation I realised I needed to show the text items the Guardian had on their site but these weren’t included in the dataset.

2016-11-13_19-41-27

I decided to check the javascript code to find where this text was created and see if I could decipher it. Looking through main.js I found this snippet:

function populateHoverBox (type, position){

    var overviewObj = {
        'state' : stateData[position].state
    }
    .....
    if(stateData[position]['marriage'] != ''){
        overviewObj.marriage = 'key-marriage'
        overviewObj.marriagetext = 'Allows same-sex marriage.'
    } else if(stateData[position]['union'] != '' && stateData[position]['marriageban'] != ''){
        overviewObj.marriage = 'key-marriage-ban'
        overviewObj.marriagetext = 'Allows civil unions; does not allow same-sex marriage.'
    } else if(stateData[position]['union'] != '' ){
        overviewObj.marriage = 'key-union'
        overviewObj.marriagetext = 'Allows civil unions.'
    } else if(stateData[position]['dpartnership'] != '' && stateData[position]['marriageban'] != ''){
        overviewObj.marriage = 'key-marriage-ban'
        overviewObj.marriagetext = 'Allows domestic partnerships; does not allow same-sex marriage.'
    } else if(stateData[position]['dpartnership'] != ''){
        overviewObj.marriage = 'key-union'
        overviewObj.marriagetext = 'Allows domestic partnerships.'
    } else if (stateData[position]['marriageban'] != ''){
        overviewObj.marriage = 'key-ban'
        overviewObj.marriagetext = 'Same-sex marriage is illegal or banned.'
    } else {
        overviewObj.marriagetext = 'No action taken.'
        overviewObj.marriage = 'key-none'
    }

…and it continued for another 100 odd lines of code. This wasn’t going to be as easy as I hoped. Any other options? Well, what if I could extract the contents of the overviewObj? Could I write this out to a file?

I tried a “Watch” using the developer tools but the variable went out of scope each time I hovered, so that wouldn’t be useful. Instead I decided to save flat.html locally and try outputting a file with the contents to my local drive….

As I say I’m no coder (but perhaps more comfortable than some) and so I googled (and googled) and eventually stumbled on this post:

http://stackoverflow.com/questions/16376161/javascript-set-file-in-download

I therefore added the function to my local main.js and added a line in the populateHoverBox function….okay so maybe I can code a tiny bit….

var str = JSON.stringify(overviewObj);
 
download(str, stateData[position].state + '.txt', 'text/plain');

In theory this should serialise the overviewObj to a string (according to google!) and then download the resulting data to a file called <State>.txt
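For readers who don’t want to follow the link, a download helper along these lines might look roughly like the sketch below. This is not the exact function from the Stack Overflow post, just a common browser pattern (a Blob plus a temporary link) written under that assumption; buildDownload is a hypothetical extra helper added here so the serialising part can be checked on its own.

```javascript
// Hypothetical helper: works out the file name and contents for one hover box.
// Mirrors the two lines added to populateHoverBox above.
function buildDownload(overviewObj) {
  return {
    filename: overviewObj.state + '.txt',
    text: JSON.stringify(overviewObj), // serialise the object to a JSON string
    mimeType: 'text/plain',
  };
}

// Browser-only sketch of a download function (an assumption, not the
// original code): wrap the text in a Blob and click a temporary link.
function download(text, filename, mimeType) {
  const blob = new Blob([text], { type: mimeType });
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = filename; // suggested file name for the saved file
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
  URL.revokeObjectURL(url); // release the temporary URL
}
```

Inside populateHoverBox the two could be joined up as: var d = buildDownload(overviewObj); download(d.text, d.filename, d.mimeType);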

Now for the test…..

downloadingfiles

BOOM, BOOM and BOOM again!

Each file is a JSON file

2016-11-13_20-07-21

Now to copy the files out from the downloads folder, remove any duplicates, and combine using Alteryx.

2016-11-13_20-04-59

As you can see, reading the resulting json files with a wildcard input and then applying a transpose was simple.

2016-11-13_20-08-31
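If you don’t have Alteryx, the same de-duplicate and combine step could be sketched in javascript. The function names below are hypothetical, the sample strings are made up, and the only field names used (state, marriage, marriagetext) are the ones visible in the main.js snippet; treat it as a rough outline rather than what I actually ran.

```javascript
// Parse the downloaded files and keep one record per state.
// `contents` is an array of JSON strings, one per downloaded file.
function combineStateFiles(contents) {
  const byState = new Map();
  for (const text of contents) {
    const record = JSON.parse(text);
    // Hovering over a state twice downloads it twice, so a repeated
    // state simply replaces the earlier copy.
    byState.set(record.state, record);
  }
  return [...byState.values()];
}

// Turn the records into a table: a header row followed by one row per state.
function toRows(records) {
  const columns = [...new Set(records.flatMap((r) => Object.keys(r)))];
  const rows = records.map((r) => columns.map((c) => r[c] ?? ''));
  return [columns, ...rows];
}

// Made-up sample standing in for the contents of three downloaded files.
const downloaded = [
  '{"state":"State A","marriage":"key-none","marriagetext":"No action taken."}',
  '{"state":"State B","marriage":"key-union","marriagetext":"Allows civil unions."}',
  '{"state":"State A","marriage":"key-none","marriagetext":"No action taken."}',
];

const table = toRows(combineStateFiles(downloaded));
console.log(table);
```

In practice the strings would come from reading each file in the downloads folder (for example with Node’s fs.readFileSync) rather than being typed in by hand.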

Finally to combine with the google sheet (called “Extract” below) and the hexmap data (Sheet 1) in Tableau…..

2016-11-13_20-09-41

Not the most straightforward data extract I’ve done but I thought it was worth blogging about so others could see that extracting data from visualisations online is possible.

You can see the resulting visualisation in my previous post.

Conclusion

No one taught me this method, and I have never been taught how to code. The techniques described here are simply the result of continuous curiosity and exploration of how interactive tables and visualisations are built.

I have used similar techniques in other places to extract data from visualisations, but no two methods are the same, nor can a generic tutorial be written. Simply have curiosity and patience and explore everything.