Tag Archives: data

6 of the week and 1 from the past: 8/7/12

It’s an idea that many people do, and this far from original. It will explain a bit of my thinking and influences, though, and hopefully give you something else to read.

Every Sunday, I will post 6 links and 1 photo of things i’ve found interesting this week. If you follow me on Twitter then you’ll probably see the favourites when I click the star on Google Reader, retweets as I retweet them, or favourites if you post it.

So, for the week ending 8/7/12

1.How to make boring things interesting in video

Adam Westbrook explains his 6 steps to making a video interesting, which should be required reading for all video journalists or producers creating video for their websites.

2. Top tips: Shooting summer events

The Guardian takes a look at the 10 top tips they believe will help you get the most from shooting summer events. They include advice on thinking which kit to take, and checking the weather. Simple, but needed.

3. My first e-book: scraping for journalists (and programming too)

Paul Bradshaw taught me many things about data journalism when on the City University MA Interactive Journalism course, and here he is teaching everyone else. It’s released next week, so sign up, and read it if you think you can find a use for it in your newsroom (clue: you can)

4. New working papers launched: ‘Justice Wide Open’

An important series of papers about open justice were launched this week from City University. They say:

The new collection of working papers is part of a wider project encouraging ‘Open Justice in the Digital Era‘. The issues are extensive and diverse: the recommendations of the government’s ‘secret justice’ green paper, now the Justice & Security bill, which would see more cases behind closed doors; the decline in local and national court reporting as a result of cuts in journalism; the courts’ barriers to entry due to ill-informed staff; and the difficulties in obtaining free legal information.

In the first section on the tradition and context of open justice, Geoffrey Robertson QC, the keynote speaker at the event, sets out the history of the principle and argues that the government’s Justice and Security recommendations are simply not compatible.

The Master of the Rolls, Lord Neuberger, examines the way in which open justice is important and underpins our society amongst others.

5. Neal Mann & Simon Rogers on social media and datajournalism at Activate

Martin Belam, Lead UX & Information Architect at The Guardian, details what Neal Mann (@fieldproducer, who recently left Sky News) and Simon Rogers (Editor of the Guardian Datastore) think are the ways technology has changed how they work as journalists.

6. Meet data mapping platform CartoDB

Data Driven Journalism introduces us to new mapping tool, CartoDB. They say users “have more freedom to customise their data, for example they can choose how much data is displayed and how it is shown.”

It aims to give journalists the right tools to work with journalism every day – but with Google Fusion Tables already very easy to use, will it make a difference?

And the final part, one photograph from the archives. I use my archives on Flickr.com mainly, so browse there for interesting stuff I’ve taken in the past.

Madoc Street In The Rain

From 2008, but could as well as have been from this week.

**

I hope you enjoyed this weeks look back.

How to make data look good: Food Standards Agency Ratings

Food hygiene ratings are important. They tell you how good somewhere is at keeping things clean – and that matters.

I’ve done this for the Daily Post before, and explained scraping in that post, but for North Wales only. This map is a bit more advanced than what I’ve previously done. It contains two layer elements, and fuses individual businesses data with a county overview to create a national picture.

This is a snapshot from May 19, 2012, and data is accurate as it is. Until I can use scraperwiki to scrape live data into a Google Fusion Table, snapshots are the best that can be done.

The counties are coloured to represent the percentage of businesses rated between 0 and 2. The lower the percentage, the more green they are. The higher the percentage, the more red they are. The businesses are coloured red for 0, yellow for 1 and blue for 2.

The info boxes tell you details about the county or the business.

How did I do this?

Well, to start with, data journalism is basically journalism flipped on its head. In normal journalism, you can end up with a front page scoop or a huge feature from just a simple line, and you do the investigation to build the story. But with data journalism, you get lots of stats and information, and have to investigate by whittling it down to just what you need or in a way you can make sense of it, and tell the story. This is explained a lot more clearly by one of my tutors at City University, London on this – Paul Bradshaw. His inverted pyramid is fantastic.

 

Inverted data journalism pyramid by Paul Bradshaw

So, this data. To start with, I had one huge dataset. The food standards agency database. This contains all the data about businesses, including where they are, with addresses, and ratings. I had to mash this up with latitude and longitude data, and overall data, which i gathered from the database.

The businesses and addresses, their ratings and when they were last inspected, was scraped using this scraperwiki code from ratings.food.gov.uk. I changed it do to this 22 times, once for each county.

I merged the databases, scraped on May 19th overnight, into one single Excel file, and added the relevant county name after each. Then, I sorted them by rating, and added a column, in which I put the code for the Google Fusion Tables icon name after each different rating.

After that, I made a copy of the table as a second excel file, and used pivot tables to find out overall data. I made a data table which brought out how many were rated 0, 1 and 2 in each county. I got from the food standards agency website the total amount of businesses in an area, and then used excel formulas to work out the percentage rated 0-2 and the percentage rated 0 for each county, and then the average for Wales. From this, I was also able to work out the percentage rated between 3 and 5. I then used Info Base Cymru to get spending data per head for the last financial year (2010/11) for each of the counties on environmental services, the total spend per head overall, and then worked environmental spend as a percentage of total spend.

My final excel document looked a bit like this:

Excel chart

I then uploaded the first document to Google Refine. First, I removed the space in the postcode column by using Google Refine Expression Language (GREL) value.replace(“ “,””) to make another column, and used value.parseJson to pull in postcode data from uk-postcodes.com API, and then pull the latitude and longitude data for all the businesses from that postcodes data.

I then downloaded the document which resulted, and put it into Google Fusion Tables, and geocoded using two column location with latitude and longitude to get a more precise location, and edited the info box window to make the information more readable to viewers with this code:

<font size=”+1″>This is <b>{name}</b> in <b>{County}</b>.<br></font>
<br>
They were rated as <b>{rating}</b> on {date}<br>
<br>
<font size=”-1″>This was correct on May 19, 2012. To find out more, search for them on <a href=”http://ratings.food.gov.uk” target=”_blank”>the FSA ratings site.</a>

At the same as all this was being done – mainly, whilst waiting for APIs and scrapers to work – I created a map of KML layers of the different counties in Google Earth, using data from My Society Mapit, and sourced Pembrokeshire data separately as the file was broken. I uploaded that to Fusion Tables as a KML.

I then uploaded the second table of overall data to merge with the counties map, and edited the info box code to show me what I wanted it to show:

This is <b>{Council}.</b> They have <b>{Total premises}</b> premises in the county which have been rated.<br>

<br>

They have <b>{Total 0}</b> rated 0, <b>{Total 1}</b> rated as 1, and <b>{Total 2}</b> rated as 2.<br>

<br>

That means <b>{Percentage of premises rated 0-2}%</b> of the premises are rated 0-2.<br>

<br>

In 2010/11, they spent <b>£{Gross spend on local environmental services per head}</b> on environmental services, per head. It was <b>{Environment spend as a % of total expenditure}%</b> of their total spend per head.

I coloured it using the percentage of businesses in an area with 0-2 ratings. Green was the lower percentage, changing to white and then moving to red for the higher percentage.

I then used a Google Fusion Table builder wizard to merge the two tables and create a map containing the counties coloured to represent the percentage of how many 0-2 rated businesses there are, with the businesses, coloured by rating, on top, and a select tool to filter by 0, 1 and 2 rated businesses. That is the map we see above.

I also used the experimental feature to make a Map Chart, which allows you to click and compare up to four counties on defined metrics.

The metrics are the total premises rated 0-2 in that area, the percentage of premises rated 0-2 in that area, gross spend on local environmental services per head and environment spend as a % of total expenditure. This can compare up to four different areas at a time, useful for people comparing where to go on holiday or visit.

And the reason I was so happy on social media that this was able to be shown? Well, the Google Fusion Table wizard writes HTML code that won’t embed into WordPress, but will into Drupal (which I don’t use). However, I followed a quick guide on Paul Bradshaw’s Online Journalism Blog, and made the code a separate HTML file. A bit of fiddling about later, and this is now on Dropbox (it should be my server, but oh well), and it is embedded on here as an iframe. Now, anything I need to show, I can.

And now, for submission on Friday, I need to write the story. Suffice to say, it’s also pretty good data journalism in my book. All the data is open, but I did the research.

Food Standards Data: How We Did It.

Every food business must be inspected by the Environmental Health department of each council around the country. Until recently, that data was only released routinely in the notices that were put up when a premises was closed down. Good? Just deal with it. You’re not getting the data released.

Now, it’s available at the Food Standards Agency ratings website – a bit clunky, and hard to get through, but available alright.

So, the possibility to investigate how many under-performing places there are in your area is there. That’s what we did at the Daily Post.

Alex Hickey wanted to see who was good and who was bad. So, we needed the data. To get that, we had to try and scrape the Food Standards Agency Website. It wasn’t possible when we did this, but after a bit of working, it is possible now.

All you need to do is use scraperwiki. Code, easily edited, on that website will allow you to automatically scrape the food standards website. I’ll admit, i’m no coder. I can use basic HTML, and if I have a guide I could probably use a bit more advanced stuff, but Ruby? No.

However, I am logical, and scraping code is logical. This code, however, really doesn’t need that. You just need to replace the URL with the relevant one you wan’t.

The URL needed to be changed is on line 74

So, change it on line 74, then run. Then, you have the data.

Now, how do we visualise it? Well, you’ll need the Latitude and Longitude. So, the best way to that is to use the UK Postcodes API.

However, you need all the postcodes. The addresses in the table aren’t all completed, for a variety of reasons. One is just not available – however, it will be if you search. So, find the postcodes by Google searches for the empty ones which aren’t private. The private ones are just that – private.

Now, whilst still in Excel or Google Docs, you’ll need to sort, either into one or two tables, by rating. Do that, and then add a column at the end, and call it icon. Paste the fusion table icon relevant to the rating in that column. You can use any of these icons here for Fusion Tables.

With the postcodes you now have, you can import the table into Google Refine. Once you have it there, go to edit column on the postcodes column, and make a new column from this one.

In the expression box you get when you open it, you type:

value.parseJson()["administrative"]["geo"]["lat"]

and then do it for a second column:

value.parseJson()["administrative"]["geo"]["lng"]

Then, you can either set the Fusion Table location as two column location with the lat/long figures as they are. That’s what we’ll do for this.

So, import your table to fusion tables. Go to edit, modify columns, and set the locations. Then, go to visualise, map, and see the map. It’ll be red dots. Then go to configure styles, column, and select the column the icons name is in. They will then change, and you have a map that looks better.

Edit the info box code to however you want. Remember the name of the column is a variable, and will change – but won’t show up if the value of the cell is empty for that row. So, you can write the code in a way that can be clean and unique for each cell, if need be, by typing the joining words to into other cells.

The map I had in the end, for North Wales, looked like this:

And the way it looked in the newspaper, with the data:

Daily Post newspaper, Monday April 30, 2012.

Elections: So worth it!

Hear ye, hear ye! Let it be known from henceforth that I love elections.

I love all the voting, and the counting, and the nerves, and the politicians making fools of themselves vying for votes from hardcore supporters, the passive public, and apathetic majority. These people support and represent every single one of their constituents, and no matter what you think of them individually, or in general, elections are exciting.

And that doesn’t stop me from working either. In fact, I work more. This year, I inputted results for the Press Association, and produced 5 maps with the help from many other people who, like me, forfeited sleep and commandeered coffee.

They are now pride of place on the Daily Post website, for Conwy, Denbighshire, Gwynedd, Flintshire, and Wrexham. Basically, there is no basically about it. They were difficult, and took a lot of prep.

I said in my previous post about this that I was going to use Google Docs. I made the spreadsheets work with a combination of formulas, so that the people who were filling them in just needed to know:

You fill in the box with the votes. It will bring the winner down to the area below, and tell you who was voted in. The majority of times that was one winner, in some cases two, in a few it was three, and in one case four.

The table worked by taking the largest number (for the majority), and in some the second, third and fourth largest numbers, and putting them into a defined area. That was then brought over to a new page, and transposed (or flipped!) into columns, not rows. Easier for the inputters to input data into columns, especially with results, but map KML files need rows. So, we make both work.

The data needed sorting (copy, paste values only, sort by ward name, delete empty rows), and then copying to a prepared clean sheet to merge.

This process took all of five minutes. The result was a clickable map that meant constituents could see who was elected and where. All, if we’re being honest here, could be done for 0 pounds. Google Drive and Google Docs are free. Election results are open and public, rightly so. The way to make it better is to have the reporters filing into it, so it can be done faster. They can provide more detail, to make it better. And, with so many people using it, each with an eye for detail, you can be sure it’s checked!

And this is one of the councils which were elected. Conwy, my home county, so the one that affects me the most: