Food hygiene ratings are important. They tell you how good somewhere is at keeping things clean – and that matters.
I’ve done this for the Daily Post before, and explained scraping in that post, but for North Wales only. This map is a bit more advanced than what I’ve previously done. It contains two layer elements, and fuses individual businesses data with a county overview to create a national picture.
This is a snapshot from May 19, 2012, and data is accurate as it is. Until I can use scraperwiki to scrape live data into a Google Fusion Table, snapshots are the best that can be done.
The counties are coloured to represent the percentage of businesses rated between 0 and 2. The lower the percentage, the more green they are. The higher the percentage, the more red they are. The businesses are coloured red for 0, yellow for 1 and blue for 2.
The info boxes tell you details about the county or the business.
How did I do this?
Well, to start with, data journalism is basically journalism flipped on its head. In normal journalism, you can end up with a front page scoop or a huge feature from just a simple line, and you do the investigation to build the story. But with data journalism, you get lots of stats and information, and have to investigate by whittling it down to just what you need or in a way you can make sense of it, and tell the story. This is explained a lot more clearly by one of my tutors at City University, London on this – Paul Bradshaw. His inverted pyramid is fantastic.

Inverted data journalism pyramid by Paul Bradshaw
So, this data. To start with, I had one huge dataset. The food standards agency database. This contains all the data about businesses, including where they are, with addresses, and ratings. I had to mash this up with latitude and longitude data, and overall data, which i gathered from the database.
The businesses and addresses, their ratings and when they were last inspected, was scraped using this scraperwiki code from ratings.food.gov.uk. I changed it do to this 22 times, once for each county.
I merged the databases, scraped on May 19th overnight, into one single Excel file, and added the relevant county name after each. Then, I sorted them by rating, and added a column, in which I put the code for the Google Fusion Tables icon name after each different rating.
After that, I made a copy of the table as a second excel file, and used pivot tables to find out overall data. I made a data table which brought out how many were rated 0, 1 and 2 in each county. I got from the food standards agency website the total amount of businesses in an area, and then used excel formulas to work out the percentage rated 0-2 and the percentage rated 0 for each county, and then the average for Wales. From this, I was also able to work out the percentage rated between 3 and 5. I then used Info Base Cymru to get spending data per head for the last financial year (2010/11) for each of the counties on environmental services, the total spend per head overall, and then worked environmental spend as a percentage of total spend.
My final excel document looked a bit like this:

Excel chart
I then uploaded the first document to Google Refine. First, I removed the space in the postcode column by using Google Refine Expression Language (GREL) value.replace(“ “,””) to make another column, and used value.parseJson to pull in postcode data from uk-postcodes.com API, and then pull the latitude and longitude data for all the businesses from that postcodes data.
I then downloaded the document which resulted, and put it into Google Fusion Tables, and geocoded using two column location with latitude and longitude to get a more precise location, and edited the info box window to make the information more readable to viewers with this code:
<font size=”+1″>This is <b>{name}</b> in <b>{County}</b>.<br></font>
<br>
They were rated as <b>{rating}</b> on {date}<br>
<br>
<font size=”-1″>This was correct on May 19, 2012. To find out more, search for them on <a href=”http://ratings.food.gov.uk” target=”_blank”>the FSA ratings site.</a>
At the same as all this was being done – mainly, whilst waiting for APIs and scrapers to work – I created a map of KML layers of the different counties in Google Earth, using data from My Society Mapit, and sourced Pembrokeshire data separately as the file was broken. I uploaded that to Fusion Tables as a KML.
I then uploaded the second table of overall data to merge with the counties map, and edited the info box code to show me what I wanted it to show:
This is <b>{Council}.</b> They have <b>{Total premises}</b> premises in the county which have been rated.<br>
<br>
They have <b>{Total 0}</b> rated 0, <b>{Total 1}</b> rated as 1, and <b>{Total 2}</b> rated as 2.<br>
<br>
That means <b>{Percentage of premises rated 0-2}%</b> of the premises are rated 0-2.<br>
<br>
In 2010/11, they spent <b>£{Gross spend on local environmental services per head}</b> on environmental services, per head. It was <b>{Environment spend as a % of total expenditure}%</b> of their total spend per head.
I coloured it using the percentage of businesses in an area with 0-2 ratings. Green was the lower percentage, changing to white and then moving to red for the higher percentage.
I then used a Google Fusion Table builder wizard to merge the two tables and create a map containing the counties coloured to represent the percentage of how many 0-2 rated businesses there are, with the businesses, coloured by rating, on top, and a select tool to filter by 0, 1 and 2 rated businesses. That is the map we see above.
I also used the experimental feature to make a Map Chart, which allows you to click and compare up to four counties on defined metrics.
The metrics are the total premises rated 0-2 in that area, the percentage of premises rated 0-2 in that area, gross spend on local environmental services per head and environment spend as a % of total expenditure. This can compare up to four different areas at a time, useful for people comparing where to go on holiday or visit.
And the reason I was so happy on social media that this was able to be shown? Well, the Google Fusion Table wizard writes HTML code that won’t embed into WordPress, but will into Drupal (which I don’t use). However, I followed a quick guide on Paul Bradshaw’s Online Journalism Blog, and made the code a separate HTML file. A bit of fiddling about later, and this is now on Dropbox (it should be my server, but oh well), and it is embedded on here as an iframe. Now, anything I need to show, I can.
And now, for submission on Friday, I need to write the story. Suffice to say, it’s also pretty good data journalism in my book. All the data is open, but I did the research.