Crucial elements of data journalism

Data tools and advice

Data journalism tells a story through graphs, maps and other infographics. Peter Aldhous, a US-based journalist, says it is also a form of investigative journalism. It isn’t just about the figures: a good data story combines various elements, which are explored below.

Know where to find data 

Knowing where to find data is crucial

William Shubert, a senior project coordinator at the Earth Journalism Network (EJN), says knowing where to find data is a useful skill for data journalists.

Adi Eyal, the director for Code for South Africa, an organisation pushing for open data, says the starting point in looking for data is online.

Finding data online is ideal for journalists working in Africa where some governments have put controls on the type of information that can be released.

Eyal’s organisation created a site that provides information about ward councillors in the Western Cape and the projects they are working on. Some of the data was scraped from the website of the City of Cape Town and some came from government departments. Eyal says looking for data from various sources to use in a single data story is ideal for journalists.

“There is a lot of data available. Look for data from all sorts of places,” he says.

Countries like Kenya have made it easier by creating their own open data sites. In South Africa, the Promotion of Access to Information Act enables data enthusiasts and journalists to access information from state departments.

Scraping data

It doesn’t end with finding the right sources of data. Quite often the data comes in a format that is not easy to extract and analyse.

No need for a headache: tutorials will show you how to scrape the data

There are also various free tools that allow journalists and other users to extract data, including OutWit Hub, Google Refine and import.io. Using them requires some technical skill. Code for South Africa is part of a network of African open data organisations; others operate in Ghana, Nigeria and Kenya, and they provide training to help journalists acquire such skills.
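
For journalists who are comfortable with a little code, the same kind of extraction can be done in a general-purpose language. Below is a minimal Python sketch using the requests and BeautifulSoup libraries; the URL and table layout are invented for illustration, not a real government site.

    # A minimal scraping sketch, assuming a page that lists data in a plain HTML table.
    # The URL and column layout are invented for illustration.
    import csv
    import requests
    from bs4 import BeautifulSoup

    html = requests.get("https://example.gov/projects").text
    soup = BeautifulSoup(html, "html.parser")

    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows use <th> cells, so they come back empty
            rows.append(cells)

    with open("projects.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)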

EJN also provides training and online resources that journalists can use. One such resource is the Geojournalism Handbook, which provides tutorials. Data journalism writer and trainer Paul Bradshaw also provides tutorials on his blog.

Query the data 

Aldhous says querying the data is an important part of the journalistic process. Most journalists don’t have these kinds of skills and will need to “befriend” a scientist who can help with the statistical analysis of the data, says Steve Connor, the science editor of The Independent.

Don’t take the data at face value

Querying involves being aware of the problems a data set has.

“What is missing from it? What errors does it have? Question everything. Check it out. If your mother says she loves you, you check it out,” Aldhous says.
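
For those working with code, Aldhous’s questions translate into a few routine checks. Here is a minimal pandas sketch; the file name and the “amount” column are assumptions for illustration.

    # Basic sanity checks on a dataset before reporting on it.
    # The file name and column names are assumptions for illustration.
    import pandas as pd

    df = pd.read_csv("funding.csv")

    print(df.isna().sum())          # what is missing from it?
    print(df.duplicated().sum())    # are any rows repeated?
    print(df["amount"].describe())  # do the ranges look plausible?

    # Flag the largest values for manual checking against the source.
    cutoff = df["amount"].quantile(0.99)
    print(df[df["amount"] > cutoff])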

Querying would also clear biases to ensure that journalists “don’t debunk bad science by doing bad science,” says Deborah Cohen, the investigative editor at the BMJ.

Analysis and visualisation

Visualisation will help attract the reader to your story

Querying also involves analysing the data to see what trends emerge from it. Presenting it in tabular form or in an Excel document can be daunting for the reader. There are data tools available that help journalists visualise their data in a way that makes it palatable and easy to read, such as Datawrapper, Geobatch and Tableau.
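
For journalists who prefer to script their charts, the same idea can be sketched in Python with pandas and matplotlib. The figures below are invented for illustration.

    # Turning a small table into a bar chart instead of publishing the raw cells.
    # The figures are invented for illustration.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.DataFrame({
        "country": ["Kenya", "Nigeria", "Ghana", "South Africa"],
        "grants": [42, 35, 18, 57],
    })

    df.plot(kind="bar", x="country", y="grants", legend=False)
    plt.ylabel("Number of grants")
    plt.title("Research grants by country (example data)")
    plt.tight_layout()
    plt.savefig("grants.png")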

Writing the story 

Be clear and concise

A data story isn’t just about the numbers. Brad Parks, the executive director of AidData, says a good data story has to “break it down to something understandable.” It must be relevant and timely too, he says.

Aldhous says it must be accompanied by a compelling narrative that is easy and enjoyable to read.


Using OutWit Hub to scrape data


Last year I embarked on a project to map how much the US’s National Institutes of Health (NIH) spends on research in Africa, and where. This was done with the intention of writing a special issue on the NIH, one of the major funders of health research in Africa. Its clout means that a lot of researchers are interested in where and how it spends its money in Africa, which makes it relevant for Research Africa, a science policy publication that I write for.

I spent months on the project but only managed to transfer a few entries from the website the organisation uses to store data about the recipients of its funding. The project turned out to be cumbersome and time-consuming, so I put it aside for a while (because I don’t believe in giving up).

That was until this year, when I learnt how to use OutWit Hub. This tool allowed me to scrape the NIH website to extract the data I needed, and converted it to Excel so that it would be easier to use.

Now I will show you how OutWit Hub helped me do months’ work in 20 minutes.

The page

First you will need to download OutWit Hub to your computer. The download link will bring you to this page.

Select the Mozilla Firefox extension and download it.

[Screenshot]

An OutWit Hub icon will show in the corner of your browser. Click on it to open OutWit Hub.

[Screenshot]

When OutWit Hub has opened, copy the URL of the web page you want to collect data from. In this example, it’s the NIH URL.

Paste it into OutWit Hub.

On the left there is a list of options. Select scrapers.

[Screenshot]

It will open to this page. Click new, and this will allow you to create a folder that you can use to scrape the data. Give the folder a name.

Open the folder by clicking the top panel on the scrapers page.

Once opened, it will have blank spaces. Use those to list the categories you want to scrape. The list can be guided by the one used on the website; in this case: Acts, project title, project leader, organisation, funder and costs.

Once you have listed the categories, return to the left side of OutWit Hub and click page source. This is what will show:

[Screenshot]

The page source provides the code around the categories you want to list. We will use the example of Acts on the NIH web page. I initially made the mistake of selecting the code from the category heading itself.

And this was the result:

[Screenshot]

Instead, copy the code that is written before one of the examples in the category, which is UO1 in this case.

Paste it into the category you have listed in OutWit Hub.

[Screenshot]

Copy the code that comes after it. Paste it into the scraper as done earlier.

[Screenshot]

Click save and execute. This will be the result:

[Screenshot]
I want the project titles, so go back to the page source again. Copy the code before one of the actual projects, which is University of KwaZulu-Natal CAPRISA HIV Clinical Trials Unit in this case. Paste it into the project title category listed in your scraper. Then copy the code after it and paste that too.

Repeat the process for all other categories. Save and execute.

This will be the result:

[Screenshot]

Click export at the bottom of the page.

And voila! In 20 minutes you will have an Excel spreadsheet of your data that you can analyse.
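
If you are curious what the scraper is doing under the hood, the before-and-after marker idea is easy to reproduce in a few lines of Python. The URL and markers below are illustrative assumptions, not the actual NIH page source.

    # A rough sketch of the marker-before/marker-after extraction used above.
    # The URL and markers are assumptions; real ones come from the page source.
    import csv
    import re
    import requests

    html = requests.get("https://example.nih.gov/awards").text

    # re.findall captures whatever sits between the two markers.
    titles = re.findall(r'<td class="title">(.*?)</td>', html, re.S)
    costs = re.findall(r'<td class="cost">(.*?)</td>', html, re.S)

    with open("nih_awards.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["project title", "cost"])
        writer.writerows(zip(titles, costs))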