Lending a helping hand to data journalists


Organisation spearheads collaboration between programmers and journalists to help the latter acquire data journalism skills.

Data journalism is becoming an increasingly popular form of journalism. Its power to help reporters tell compelling stories has driven rapid growth in data journalism projects around the world. In South Africa, however, this form of journalism lags behind because of a skills shortage. A group of geeks has now developed a project that seeks to train journalists to work with data.

Photo credit: Rob Enslin. Project to build data journalism teams in South African newsrooms

Adi Eyal, who heads Code for South Africa, the organisation leading the project, speaks about the new venture they will embark on this month.

Tell us about Code for South Africa

We are an organisation that is pushing for open data in South Africa. We don’t have a culture of questioning, engaging and using information and we want to change that. Our role is to promote the use of data.

We are focused on finding answers to questions like: how do we get people using the data that already exists? How can people use available information to make decisions about where they live or where to send their children to school? In short, we want to get people using the information available to make informed decisions.

Where does the journalism project fit in and why the media initiative?

Journalists come to us and say they need skills. This is in response to that.

What will the media initiative involve?

This is a project that will run for six months. We will work to build data journalism teams in selected newsrooms. There are people already working as designers, software developers and journalists within newsrooms. We will create teams out of these and teach them how to work with data, where they can get it, how to clean it and what to use it for. The team members' different skill sets should complement each other and help their publications use data to tell compelling stories. We are trying to create rock-star teams out of the people that newsrooms already have.

Photo credit: Sean MacEntee. Data is a tool for telling a story

What data skills do newsrooms need?

Being able to access and visualise data is important. Journalists must also have a maturity about data. This sounds touchy-feely, but data journalism is about understanding how to take a project from concept to final product, and what that requires, which can be cumbersome.

Data journalism has a project managerial component. One needs to see the process from start to finish.

There is also a need to understand and interpret the data. Journalists must verify it too, because data can't be trusted by itself.

All those skills are important.

Where can journalists find data? Where do you find yours?

There is a lot of information already available from various websites. It comes from different places, such as municipal and government websites. We put it together in one place and turn it into a product that is easy to use.

Can you give us an example of one such product that you have developed?

We have developed a medicine price database. Medicine prices are regulated in South Africa, meaning there is a maximum amount customers should pay, but people don't know that.

We have built a mobile app that allows people to punch in the name of a medicine; it tells them what the regulated price is so they can see whether the pharmacy is charging more than that.
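The core of such an app is a simple lookup-and-compare. The sketch below is purely illustrative, not Code for South Africa's actual implementation: the medicine names, prices and function are all invented to show the idea.

```python
# Hypothetical sketch of a medicine price check: compare what a pharmacy
# charges against the regulated maximum price. All values are invented.

REGULATED_PRICES = {
    "paracetamol 500mg": 25.00,   # illustrative regulated maximum, in rand
    "amoxicillin 250mg": 60.00,
}

def check_price(medicine: str, charged: float) -> str:
    """Say whether the charged price exceeds the regulated maximum."""
    maximum = REGULATED_PRICES.get(medicine.lower())
    if maximum is None:
        return "Medicine not found in the database."
    if charged > maximum:
        return f"Overcharged: regulated maximum is R{maximum:.2f}."
    return f"OK: within the regulated maximum of R{maximum:.2f}."
```

A user punching in "Paracetamol 500mg" and a price of R30.00 would be told the pharmacy is overcharging, since the illustrative cap is R25.00.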

How easy is it to find data?

It’s not easy. There is no official open data policy. Some data is available, but there is no process through which data is made accessible. The Promotion of Access to Information Act requires that data be made available on request, rather than the government releasing it proactively.

This is not an effective way of getting information. The process requires you to contact an information officer in the department you are seeking information from. Sometimes their email address bounces, or they do not respond to emails. Requests can also be ignored or rejected on baseless grounds. You can appeal, but that is a time-consuming and expensive process. It is hardly the best way of extracting data.

Any advice for data journalists?

Photo: Esther Vargas. Journalists need to be proactive in learning data skills

Journalists in South Africa can join their local Hacks/Hackers chapter, which provides a platform for journalists and programmers to get together and talk about data journalism projects.

Meet up with other people from your profession and from a completely different world.

It is wrong to think that just because you are not a software developer you can’t be involved.

There also need to be more data stories. Build skills that help you dig below the surface, and use tools that help ordinary readers understand the information you are relating.

Using OutWit Hub to scrape data

Data tools and advice

Last year I embarked on a project to map how much of its research funding the US’s National Institutes of Health spends in Africa, and where. The intention was to write a special issue on the NIH, one of the major funders of health research in Africa. Its clout means that many researchers are interested in where and how it spends its money in Africa, which makes it relevant for Research Africa, a science policy publication that I write for. I spent months on the project but only managed to transfer a few entries from the website the organisation uses to store data about the recipients of its funding. The project turned out to be cumbersome and time-consuming, so I put it aside for a while (because I don’t believe in giving up).

That was until this year, when I learnt how to use OutWit Hub. This tool allowed me to scrape the NIH website and extract the data I needed. It converted the data to Excel so that it would be easier to use.

Now I will show you how OutWit Hub helped me do months’ work in 20 minutes.

The page

First you will need to download OutWit Hub to your computer. The download link will bring you to this page.

Select the Mozilla Firefox extension and download it.

 

An OutWit Hub icon will appear in the corner of your browser window; click on it to open OutWit Hub.

 

When OutWit Hub has opened, copy the URL of the web page you want to collect data from. In this example, it’s the NIH URL.

Paste it into OutWit Hub.

On the left there is a list of options. Select “scrapers”.

 

It will open to this page. Click “new”; this will allow you to create a folder that you can use to scrape the data. Give the folder a name.

Open the folder by clicking the top panel on the scrapers page.

Once opened, it will have blank spaces. Use those to list the categories you want to scrape. The list can be guided by the one used on the website. In this case: Acts, project title, project leader, organisation, funder and costs.

Once you have listed the categories, return to the left side of OutWit Hub and click “page source”. This is what will show:

 

The page source provides the code for the categories you want to list. We will use the example of Acts on the NIH web page. I initially made the mistake of selecting the code from the category heading itself.

And this was the result

 

Instead, copy the code that appears before one of the actual examples in the category, which is U01 in this case.

Paste it into the category you have listed in OutWit Hub.

 

Then copy the code that appears after it, and paste it into the scraper as before.

 

Click “save” and “execute”. This will be the result:


Next I want the project titles, so go back to the page source again. Copy the code before one of the actual project titles (“University of KwaZulu-Natal CAPRISA HIV Clinical Trials Unit”, in this case). Paste it into the project title category listed in your scraper. Then copy the code after it and paste that too.

Repeat the process for all the other categories. Save and execute.

This will be the result:

 


Click “export” at the bottom of the page.

And voila! In 20 minutes you will have an Excel spreadsheet of your data that you can analyse.
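If you are comfortable with a little code, the before-and-after marker technique described above can also be sketched in Python: for each category you record the code that appears before and after a value, and extract everything in between. The HTML snippet and markers below are invented for illustration; the real NIH page’s code will differ, so treat this as a sketch of the idea rather than a working NIH scraper.

```python
# Sketch of marker-based scraping: extract the text between a "before"
# marker and an "after" marker for each category, then export to CSV
# (which Excel opens directly). The HTML below is made up.
import csv
import re

html = """
<td class="act">U01</td><td class="title">CAPRISA HIV Clinical Trials Unit</td>
<td class="act">R01</td><td class="title">Malaria Vector Research</td>
"""

# (category name, code before the value, code after the value)
scrapers = [
    ("Act", '<td class="act">', "</td>"),
    ("Project title", '<td class="title">', "</td>"),
]

rows = {}
for name, before, after in scrapers:
    # Non-greedy match for whatever sits between the two markers.
    pattern = re.escape(before) + "(.*?)" + re.escape(after)
    rows[name] = re.findall(pattern, html)

# Write one column per category, one row per extracted value.
with open("nih_grants.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(rows.keys())
    writer.writerows(zip(*rows.values()))
```

This mirrors what the OutWit Hub scraper does behind the scenes: the “before” and “after” codes you paste into each category are the markers, and “execute” runs the extraction.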