Tag Archives: google docs

How to make data look good: Food Standards Agency Ratings

Food hygiene ratings are important. They tell you how good somewhere is at keeping things clean – and that matters.

I’ve done this for the Daily Post before, and explained scraping in that post, but for North Wales only. This map is a bit more advanced than what I’ve previously done. It contains two layer elements, and fuses individual businesses data with a county overview to create a national picture.

This is a snapshot from May 19, 2012, and data is accurate as it is. Until I can use scraperwiki to scrape live data into a Google Fusion Table, snapshots are the best that can be done.

The counties are coloured to represent the percentage of businesses rated between 0 and 2. The lower the percentage, the more green they are. The higher the percentage, the more red they are. The businesses are coloured red for 0, yellow for 1 and blue for 2.

The info boxes tell you details about the county or the business.

How did I do this?

Well, to start with, data journalism is basically journalism flipped on its head. In normal journalism, you can end up with a front page scoop or a huge feature from just a simple line, and you do the investigation to build the story. But with data journalism, you get lots of stats and information, and have to investigate by whittling it down to just what you need or in a way you can make sense of it, and tell the story. This is explained a lot more clearly by one of my tutors at City University, London on this – Paul Bradshaw. His inverted pyramid is fantastic.

 

Inverted data journalism pyramid by Paul Bradshaw

So, this data. To start with, I had one huge dataset. The food standards agency database. This contains all the data about businesses, including where they are, with addresses, and ratings. I had to mash this up with latitude and longitude data, and overall data, which i gathered from the database.

The businesses and addresses, their ratings and when they were last inspected, was scraped using this scraperwiki code from ratings.food.gov.uk. I changed it do to this 22 times, once for each county.

I merged the databases, scraped on May 19th overnight, into one single Excel file, and added the relevant county name after each. Then, I sorted them by rating, and added a column, in which I put the code for the Google Fusion Tables icon name after each different rating.

After that, I made a copy of the table as a second excel file, and used pivot tables to find out overall data. I made a data table which brought out how many were rated 0, 1 and 2 in each county. I got from the food standards agency website the total amount of businesses in an area, and then used excel formulas to work out the percentage rated 0-2 and the percentage rated 0 for each county, and then the average for Wales. From this, I was also able to work out the percentage rated between 3 and 5. I then used Info Base Cymru to get spending data per head for the last financial year (2010/11) for each of the counties on environmental services, the total spend per head overall, and then worked environmental spend as a percentage of total spend.

My final excel document looked a bit like this:

Excel chart

I then uploaded the first document to Google Refine. First, I removed the space in the postcode column by using Google Refine Expression Language (GREL) value.replace(“ “,””) to make another column, and used value.parseJson to pull in postcode data from uk-postcodes.com API, and then pull the latitude and longitude data for all the businesses from that postcodes data.

I then downloaded the document which resulted, and put it into Google Fusion Tables, and geocoded using two column location with latitude and longitude to get a more precise location, and edited the info box window to make the information more readable to viewers with this code:

<font size=”+1″>This is <b>{name}</b> in <b>{County}</b>.<br></font>
<br>
They were rated as <b>{rating}</b> on {date}<br>
<br>
<font size=”-1″>This was correct on May 19, 2012. To find out more, search for them on <a href=”http://ratings.food.gov.uk” target=”_blank”>the FSA ratings site.</a>

At the same as all this was being done – mainly, whilst waiting for APIs and scrapers to work – I created a map of KML layers of the different counties in Google Earth, using data from My Society Mapit, and sourced Pembrokeshire data separately as the file was broken. I uploaded that to Fusion Tables as a KML.

I then uploaded the second table of overall data to merge with the counties map, and edited the info box code to show me what I wanted it to show:

This is <b>{Council}.</b> They have <b>{Total premises}</b> premises in the county which have been rated.<br>

<br>

They have <b>{Total 0}</b> rated 0, <b>{Total 1}</b> rated as 1, and <b>{Total 2}</b> rated as 2.<br>

<br>

That means <b>{Percentage of premises rated 0-2}%</b> of the premises are rated 0-2.<br>

<br>

In 2010/11, they spent <b>£{Gross spend on local environmental services per head}</b> on environmental services, per head. It was <b>{Environment spend as a % of total expenditure}%</b> of their total spend per head.

I coloured it using the percentage of businesses in an area with 0-2 ratings. Green was the lower percentage, changing to white and then moving to red for the higher percentage.

I then used a Google Fusion Table builder wizard to merge the two tables and create a map containing the counties coloured to represent the percentage of how many 0-2 rated businesses there are, with the businesses, coloured by rating, on top, and a select tool to filter by 0, 1 and 2 rated businesses. That is the map we see above.

I also used the experimental feature to make a Map Chart, which allows you to click and compare up to four counties on defined metrics.

The metrics are the total premises rated 0-2 in that area, the percentage of premises rated 0-2 in that area, gross spend on local environmental services per head and environment spend as a % of total expenditure. This can compare up to four different areas at a time, useful for people comparing where to go on holiday or visit.

And the reason I was so happy on social media that this was able to be shown? Well, the Google Fusion Table wizard writes HTML code that won’t embed into WordPress, but will into Drupal (which I don’t use). However, I followed a quick guide on Paul Bradshaw’s Online Journalism Blog, and made the code a separate HTML file. A bit of fiddling about later, and this is now on Dropbox (it should be my server, but oh well), and it is embedded on here as an iframe. Now, anything I need to show, I can.

And now, for submission on Friday, I need to write the story. Suffice to say, it’s also pretty good data journalism in my book. All the data is open, but I did the research.

Election Night Results – How to make it go smoothly using Google Docs

Election nights are always fun. They last a long time (longer, indeed, than you think time can last). The candidates are always looking for their piles. The agents with the clipboards are ensuring democracy is done right on their candidate alone. You, meanwhile, are just tired that you’ve been waiting around all night, and there is another recount.

And that’s just if you’re at the count. At the office, you you’ll need coffee to stay awake, so maths is the least of your worries. So help inputting results wouldn’t go amiss.

I (hopefully) have created a bulletproof way of inputting results for the Daily Post, with a little help from my friends. They gave me a few pointers with formulas. I did the rest.

North Wales has five counties up for election. 243 wards. Over 700 people being elected. That’s a lot of numbers. So, how do we input them?

First, I split all the counties up. Five Google Doc spreadsheets. This way, it doesn’t get too confusing. All the wards are in columns, with a space next to it to input the result. Scroll across, input result. Easy.

Then, we need to know who won. So, seeing as the most votes wins, you pull the result down to another cell. Not easy, when numbers are all over the place. So, I needed a formula. With a bit of help from @PaulBradshaw, a Vlookup formula would work. It would pull the top candidate down. Great, for most of them. It, however, would require the results to be inputted on the left.

So, @aendrew pointed me towards an idea that looks amazing. I decided to use a =large formula to rate the candidates. =large(c2:c12,1) would pull the largest number (for example) from that range down to the specified cell where the formula was entered. Great. You can add a 2, 3, or anything up to however many numbers there. You want the 50th in a series of 100? Just select the range, and type 50 at the end.

 

Now, we need a name. In this case, =index(B2:B12,match(B15,C2:C12, 0)). This would take the value in cell B15 – where the largest number is – and match it with the number in cells C2 to C12, where the results were inputted. It will find the exact match, and then look across at the cell next to it, in B2 to B12, and bring the candidate name down to cell B14, where the formula was entered.

Now, we have the candidate name on one cell, and the total winning votes on the cell directly below. Repeat that if two or three, or four, councillors are being elected. Easily done.

But, you can’t do anything with that data. Yes, you can’t. So, you need to flip it. Looking at that way is easier to input, which is the aim of this spreadsheet. Looking at it another way is easier to collate with other data sources. So, we need another formula.

=transpose(’1: ENTER RESULTS HERE’!A13:BX22)

This will take everything in the cell ranges A13:BX22 (where the results will be sorted and brought down to by the other formulas), move it to another sheet, and transpose them. That is, flip them around. You end up, that way, with columns of numbers that are easier to work with when making maps, and results. Nothing needs to doing to this, apart from selecting all the data, copying, and pasting as values only to the next sheet. Taking the formulas will ruin it, once you see what we’re going to do.

On the third sheet, we will sort the name column, where the ward names are, in A-Z format. Get rid of the spaces with the extra #num rows, and copy to the next workbook, called Conwy Results TO MERGE. So called, because you need a clean workbook to merge with Fusion Tables, which is exactly what this will do.

You put the data only into the workbook, which already has the county, in this case Conwy, repeated down the side many times, and the headings. These are uniform for a reason.

This table just needs the data copying to, then closing. Open Fusion Tables, make a new table from Google Spreadsheet, select the relevant TO MERGE sheet, and then merge that with the KML file of the county.

What we’re going to do is use these headings to write the info box code, so it works for 1 person elected or 4. The code is:

<div class=’googft-info-window’ style=’font-family: sans-serif’>
<font size=”+1″>This is <b>{Ward Name}</b> in <b>{County}</b></font><br>
Winning this ward was:<br>
<b>{Winner 1}</b> ({Votes 1})<br>
<b>{Winner 2}</b> ({Votes 2})<br>
<b>{Winner 3}</b> ({Votes 3})<br>
<b>{Winner 4}</b> ({Votes 4})<br>

This doesn’t leave any marks on it through typing commas. It works by using the cells as the text. If there’s nothing in the cells, nothing will appear in the info box. Clean, and at the same time, it makes nice prose on the top: This is “Ward Name” in “County” will read, for one ward at least: “This is Gogarth in Conwy.”

The map will then have all the details you need to see how the county voted on May 3rd.

I will update with the winning maps – suffice to say, this is the way to do work if you know you’re going to get data, but don’t know the figures. Easy to just drop it in, then spend 5 minutes copying, pasting, and merging. All the hard work has been done beforehand.

Data Journalism in Local News!

These are just three of five clippings from the Daily Post showing how data journalism makes great local news stories and helps to inform an area… the size of North Wales. (Everything is given a comparison as …the size of Wales.

So, Ambulance figures. It helps to use data journalism to see where they are – Google Docs works well. Full blog post coming soon.

Then, how to find out how many people have been taken off a mountain, when, why, etc, over 40 years. Full blog post here:

And my best work so far – All the prospective councillors for North Wales, mapped in an interactive format. Also, the data contributed to Wales Online. Full blog post soon.

I will write about these very soon, and explain them all. Enjoy the pages here though.

Of course, all copyright for the pages belongs to Trinity Mirror.