Crucial elements of data journalism

Data tools and advice

Data journalism tells a story through graphs, maps and other infographics. Peter Aldhous, a US-based journalist, says it is also a form of investigative journalism. It isn’t just about the figures: a good data story combines several elements, which are explored below.

Know where to find data 

Knowing where to find data is crucial

William Shubert, a senior project coordinator at the Earth Journalism Network (EJN), says knowing where to find data is a useful skill for data journalists.

Adi Eyal, the director of Code for South Africa, an organisation pushing for open data, says the starting point in looking for data is online.

Finding data online is ideal for journalists working in Africa where some governments have put controls on the type of information that can be released.

Eyal’s organisation created a site that provides information about ward councillors in the Western Cape and the projects that they are working on. Some of the data was scraped from the website of the City of Cape Town and some came from government departments. Eyal says combining data from various sources in a single data story is ideal for journalists.

“There is a lot of data available. Look for data from all sorts of places,” he says.

Countries like Kenya have made it easy by creating their own open data sites. In South Africa, the Promotion of Access to Information Act enables data enthusiasts and journalists to access information from state departments.

Scraping data

It doesn’t end with finding the right sources of data. Quite often the data comes in a format that is not easy to extract and analyse.

No need for a headache: tutorials will show you how to scrape the data

There are also various free tools that allow journalists and other users to extract data. These include OutWit Hub and Google Refine, among others. Using them requires some know-how. Code for South Africa is part of a network of African open data organisations. Other networks are in Ghana, Nigeria and Kenya, and they provide training to help journalists acquire such skills.

EJN also provides training and online resources that journalists can use. One such resource is the Geojournalism Handbook, which provides tutorials. Data journalism writer and trainer Paul Bradshaw also provides tutorials on his blog.

Query the data 

Aldhous says querying the data is an important part of the journalistic process. Most journalists don’t have these kinds of skills and will need to “befriend” a scientist who can help with the statistical analysis of the data, says Steve Connor, the science editor for The Independent.

Don’t take the data at face value

Querying involves being aware of the problems a data set has.

“What is missing from it? What errors does it have? Question everything. Check it out. If your mother says she loves you, you check it out,” Aldhous says.

Querying also helps to remove biases and ensures that journalists “don’t debunk bad science by doing bad science,” says Deborah Cohen, the investigative editor at the BMJ.
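For readers who want to try this kind of check themselves, here is a minimal sketch in Python of two of the questions Aldhous raises: what is missing, and what is repeated. The sample table, its column names and its figures are all made up for illustration, and the `audit` function is a hypothetical helper, not part of any tool mentioned in this post.

```python
import csv
import io

# Hypothetical sample data; a real project would open a CSV file instead.
SAMPLE = """country,year,funding
Kenya,2012,1500
Ghana,,2300
Nigeria,2012,
Kenya,2012,1500
"""


def audit(rows):
    """Count blank cells per column and exact duplicate rows."""
    missing = {}
    seen = set()
    duplicates = 0
    for row in rows:
        for column, value in row.items():
            # A cell counts as missing if it is absent or only whitespace.
            if value is None or value.strip() == "":
                missing[column] = missing.get(column, 0) + 1
        key = tuple(row.items())
        if key in seen:
            duplicates += 1
        seen.add(key)
    return missing, duplicates


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
missing, duplicates = audit(rows)
print("Blank cells per column:", missing)
print("Duplicate rows:", duplicates)
```

A check like this only flags gaps and repeats. Deciding whether a blank is a real zero, a reporting failure or an error still takes the questioning described above.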

Analysis and visualisation

Visualisation will help attract the reader to your story

Querying also involves analysing the data to see what trends emerge from it. Presenting it in tabular form or as an Excel document can be quite daunting for the reader. There are data tools that help journalists visualise their data in a way that makes it palatable and easy to read. Such tools include Datawrapper, Geobatch and Tableau.

Writing the story 

Be clear and concise

A data story isn’t just about the numbers. Brad Parks, the executive director of AidData, says a good data story has to “break it down to something understandable.” It must be relevant and timely too, he says.

Aldhous says it must be accompanied by a compelling narrative that is easy and enjoyable to read.


Using OutWit Hub to scrape data


Last year I embarked on a project to map how much the US’s National Institutes of Health (NIH) spends on research in Africa, and where. The intention was to write a special issue on the NIH, one of the major health research funders in Africa. Its clout means that a lot of researchers are interested in where and how it spends its money in Africa, which makes the topic relevant for Research Africa, a science policy publication that I write for. I spent months on the project but only managed to transfer a few entries from a website the organisation uses to store data about recipients of its funding. The project turned out to be cumbersome and time-consuming, so I put it aside for a while (because I don’t believe in giving up).

That was until this year, when I learnt how to use OutWit Hub. This tool allowed me to scrape the NIH website to extract the data I needed. It converted the data to Excel so that it would be easier to use.

Now I will show you how OutWit Hub helped me do months’ work in 20 minutes.

The page

First you will need to download OutWit Hub to your computer. It will bring you to this page.

Select the Mozilla Firefox extension and download it.


An OutWit Hub icon will appear in the corner of your browser. Click it to open OutWit Hub.


When OutWit Hub has opened, copy the URL of the web page you want to collect data from. In this example, it’s the NIH URL.

Paste it into OutWit Hub.

On the left there is a list of options. Select scrapers.


It will open to this page. Click new, which will allow you to create a folder that you can use to scrape the data. Give the folder a name.

Open the folder by clicking the top panel on the scrapers page.

Once opened, it will have blank spaces. Use those to list the categories you want to scrape. The list can be guided by the one used on the website: in this case Acts, project title, project leader, organisation, funder and costs.

Once you have listed the categories, return to the left side of OutWit Hub and click page sources. This is what will show


The page source provides the code for the categories you want to list. We will use the example of Acts on the NIH web page. I initially made the mistake of selecting the code from the category itself

And this was the result


Instead, copy the code that is written just before one of the example values in the category, which is UO1 in this case.

Paste it into the category you have listed in OutWit Hub


Then copy the code that comes just after the value. Paste it into the scraper as you did earlier


Click save and execute. This will be the result

I want the project titles too, so go back to the page source again. Copy the code before one of the actual project titles, which is University of KwaZulu-Natal CAPRISA HIV Clinical Trials Unit in this case. Paste it into the project title category listed on your scraper. Then copy the code after it and paste that too.

Repeat the process for all other categories. Save and execute.

This will be the result



Click export at the bottom of the page

And voila! In 20 minutes you will have an Excel spreadsheet of your data that you can analyse.
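The idea behind the steps above, which is to tell the scraper what code sits just before and just after each value, can also be sketched in a few lines of Python. This is only an illustration of the general technique, not a description of how OutWit Hub works internally. The HTML fragment, the class names and the values in it are all hypothetical.

```python
import csv
import io
import re

# Hypothetical page source standing in for a table of funded projects.
PAGE = """
<tr><td class="act">A01</td><td class="title">Example Project One</td></tr>
<tr><td class="act">B02</td><td class="title">Example Project Two</td></tr>
"""

# One entry per category: the code just before and just after the value.
SCRAPER = {
    "Act": ('<td class="act">', "</td>"),
    "Project title": ('<td class="title">', "</td>"),
}


def extract(page, before, after):
    """Return every piece of text found between the two markers."""
    pattern = re.escape(before) + r"(.*?)" + re.escape(after)
    return re.findall(pattern, page, flags=re.DOTALL)


# Collect one list of values per category, then line them up as rows.
columns = {name: extract(PAGE, before, after)
           for name, (before, after) in SCRAPER.items()}
rows = list(zip(*columns.values()))

# Write the result as CSV text, which any spreadsheet program can open.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(list(columns))
writer.writerows(rows)
print(out.getvalue())
```

This also shows why selecting the code around the category name gives the wrong result: the markers have to surround the values you want, not the label above them.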



Is age really just a number? A review of a Mail and Guardian story


I write a lot about science and science policy, but I am a firm believer that politics can influence science. So I was excited to see the interactive map that the South African Mail and Guardian had done on the ages of leaders across the globe. They are talking about my president too, so I am bound to be even more interested in the story, titled When it comes to Mugabe, age is just a number.

Judging from the number of data stories in its data section, the Mail and Guardian is relatively new to the field. Let’s examine how they have handled the data.

The data tool

They have used Tableau, which allows users to visualise and share their data for free. Tableau has other options that you have to pay for, but this article shows you can do a good data story with free resources.

The length

It’s short and concise yet provides enough information to enable the reader to understand what the story is about. However, there is a downside to the length as it doesn’t allow the paper to provide context to the story.

The chart

They have used an interactive chart that allows you to see the age of each country’s leader when you click the dots. It’s clear enough for anyone to understand, even those who are terrified by graphs like I used to be.

Each region has been assigned its own colour, which is good because you will know which region the president of a country you have clicked is in. Let’s face it, not all of us know how many countries are in the world, let alone which region they fall under. The story provides a good geography lesson and fulfils one of the key roles of the media, which is to educate.

The map

You have to click each country to see the age of the president for that specific country. I don’t think you need a map when you have a chart.


I like data stories that provide context. This is a fun data story but it doesn’t provide any context. I expected to be told why I, as a reader, should care about the ages of the presidents.

A comment from a political analyst would have helped, as would a comparison of how countries led by older presidents perform against those with younger leaders.

The big lesson from this is that data stories should provide context.


How to use JSON


Today in class we learnt how to extract data using JavaScript Object Notation (JSON), a format that is easier to read than XML. Data that comes in JSON format is also easy to collect, which is why it may be worthwhile for data enthusiasts to learn.

There are simple steps to using JSON and I outline them below.

Our class example was the UK postoffice API. A Google search for UK postoffice API will lead to this URL:

The page has an example API call, which is listed in the section labelled Return data for postcode. Copy the link.

Paste the link into a new browser tab.

Then remove the square brackets encircling postcode and (no space). Also remove the square brackets encircling xml, csv and json, leaving only json.

Add your postcode. Some browsers will only allow you to open the result in a notepad.

If it opens in the browser, copy and paste it into a notepad.

When it opens in the notepad, it will appear on a single line, as below

In JSON, a colon (:) joins each name to its value to form a pair, so you have to separate the data into pairs

To do that, press enter after the first comma

Press enter after geo and press the tab key to indent

Repeat the process for the following pairs. New properties such as administrative don’t need to be indented.

Your notepad should end up looking something like this
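The line-breaking and indenting described above can also be done automatically. Here is a minimal sketch using Python’s built-in json module. The one-line response below is hypothetical: the property names geo and administrative come from the class example, but the values and the inner field names are made up for illustration.

```python
import json

# Hypothetical one-line response; the values and inner field names are made up.
raw = ('{"postcode": "AB1 2CD", "geo": {"lat": 51.5, "lng": -0.1}, '
       '"administrative": {"district": "Example District"}}')

# Parse the text into Python dictionaries.
data = json.loads(raw)

# Write it back out with one pair per line and nested pairs indented.
pretty = json.dumps(data, indent=4)
print(pretty)

# Once parsed, a single value can be picked out by name.
print(data["geo"]["lat"])
```

Doing it by hand first is still useful, because it shows how the pairs nest inside one another.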