Exploring Food Hygiene Ratings


I’ve been exploring open data sets recently and one that I came across was more interesting than most, perhaps it’s the number of takeaways I seem to eat! The Food Standards Agency in the UK publish their ratings for every establishment in the UK, and this data is available as an API or XML data set for anyone to download and analyse via their website. So I set about doing just that.

First problem, how to get the data into a usable format, I wanted to analyse the whole country so opted against the API and went for the XML data. The data is split by Local Authority with a separate link for each, so I took the following steps to download it (which I’ve done before with other sets of data).

1. Copy the source HTML with the links (only the relevant section)

2. Paste it into a Text Input tool in Alteryx

3. Use some formula to clean the data and then use a Download Tool to download each URL to a separate XML file (there were 396 files in total – this would have taken a long time by hand)

4. In a new module load in the XML as an XML file, using the root element and then parse out using XML parse tools to get the data required into a flat table format.

5. Clean the data and blend to geographic data source (eg. to attach common geographies spatially – I had the lat / lon) and output as Tableau Data Extract.

One of the biggest issues in the data in Step 5 to clean was miscoded Lat \ Lon values, to cure these I decided to find the average Lat / Lon of all the points in each Local Authority, then step through the data using a multirow formula and remove any rows that were suddenly much further away than the previous row. This is one of the biggest advantages of using Alteryx prior to the Tableau Data Visualisation – you can quickly spot and remedy errors that would be difficult to find and fix in Tableau.

Once in Tableau I then set about creating some visualisations users could use to explore the data. One question I wanted to explore was whether Food Hygiene was linked to Deprivation, so I downloaded the English Index of Multiple Deprivation (2010) and blended this in Tableau, by Lower Super Output Area, to the Food Ratings data. The data was too messy generally so I broke down by analysis by looking at splits by Government Office Regions and Business Type. I’m hoping the resulting Viz doesn’t look too crowded, this is my first Tableau Public Visualisation so please let me know your thoughts – constructive critique is very welcome.

Unfortunately WordPress won’t let me embed the Tableau Public code, so click on the image below to go to a separate view:

click for interactive view

click for interactive view

For my other dashboards (accessed via tabs in the above) I wanted to give users the ability to see a local map of their area, easily done using actions and lists, and finally a view of the Best and Worst Local Authorities as far as Hygiene Ratings go. This latter view was where I started to have problems with doing what I wanted.

Firstly I needed a table calculation to produce a weighted indication of which how good each Local Authority was. To do this I wanted to get the percentage of each rating across the LA and then multiply each one by the a reverse weighted Rating Value (i.e. 6 – Rating) to give lower rating more impact on the final score. I then wanted to sum these ratings.

Using table calculations in Tableau is still a learning exercise but now I understand addressing and partitioning then it’s a lot easier. The “window_sum” formula helped me calculate the percentage across the whole LA. So far so good.

However what I wanted to do was sort on the resulting table calculation, not as easy as it sounds as the sort is performed after the calculation in Tableau. Finally, by using the following steps I managed to do what I wanted:

1. Make the Table Calculation Discrete

2. Drag it to the Rows Partition

3. Set the appropriate “calculate using” values – use advanced is my recommendation.

4. Hide the headers to hide the value in the view.

To do the “Worst” sort order  I needed to calculate a reverse of the original calculation, then do the same again.

Finally when it came to restricting the view to the Top 10 then I was flummoxed, I tried using a Scaffold Dataset and blending with the original but to no avail – any tips welcome. In the end I was tempted to move the calculations to the Alteryx module I used before table to do some “pre” work, but I decided to leave the worksheet as it was. Any tips on achieving this in Tableau gratefully received.

Thanks for reading.

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s