“The term 'data journalist' is a bit of a jack-of-all-trades term,” Marianne Bouchart told delegates at the news:rewired conference in London on Tuesday.
“Some call us computer-assisted reporters, journalist programmers, journo geeks… unicorns. It varies.”
Bouchart is communications director and Data Journalism Award Manager at the Global Editors Network – and the founder of Hei-Da.org, a not-for-profit organisation set up earlier this year which specialises in open data-driven projects.
She is also the founder and editor of the Data Journalism Blog, and gave workshop attendees some expert advice on how to source and use data sets for storytelling.
Here are the sources she recommended:
- Dataportals.org – a comprehensive list of open data portals from around the world, and a good starting point for locating a diverse range of data
- FindTheData.com – similar to Dataportals.org, it offers data sets on a wide range of topics and industries
- EU Data Portal – the EU Data Portal launched last week and is still in beta. Sponsored by the European Commission, it can be used to browse official data sets
- European Union Open Data Portal – much like the EU Data Portal, it offers comprehensive data sets on various subjects across Europe
- Data.gov.uk – the UK government's data website, containing public data to help people understand how government works and how policy is made
- Data.gov – the US counterpart to Data.gov.uk. Many similar websites exist for sourcing data from other countries
- OpenCorporates – the largest open database of companies in the world. Its main goal is to have a URL for every company, and it contains lots of specific business data
- WikiLeaks – people presume that WikiLeaks is outdated, but Bouchart stressed that it is still an exceptional resource with a regularly updated website
- The World Bank – its data portal offers free and open data about development issues around the world
- The UN Data Portal – gives access to a comprehensive list of data sets, broken down by country and theme
- The UNHCR Data Portal – dedicated to data about the refugee crisis, it is a very visual resource that often provides raw data sets
- World Health Organisation data – this resource offers a large data library with maps and reports, as well as country-specific statistics
- Google Public Data Explorer – enter keywords and it returns matching data sets, broken down by source
- GetTheData.org – a forum where users can ask others where to find specific data
- Crowdsourcing using Google Forms – used successfully by organisations such as the Guardian, which collected data from readers on how many Olympic tickets they had bought
- WhatDoTheyKnow.com – a good tool to use when you can't find the data you need. The website gathers the Freedom of Information requests submitted through it and tells you whether they were successful
- Quora – can be used to browse information and, much like GetTheData.org, to ask others where to find specific data
You can also find data sets directly on Google by using the following search operators:
- filetype:csv and filetype:xls for spreadsheet data
- filetype:shp for geodata (shapefiles)
- filetype:mdb, filetype:sql and filetype:db for database extracts
- filetype:pdf for documents – for example, site:adidas-group.com filetype:pdf
- inurl:downloads filetype:xls, which can surface not only documents that companies or organisations have made public, but also files they have shared internally
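If you run these searches often, it can help to assemble them in a script. The sketch below uses a hypothetical helper, build_query (not part of any Google API), to combine the operators above into a query string, and a second helper that turns the query into a search link. It only builds text and does not fetch results; the https://www.google.com/search?q= link format and the example keyword "unemployment" are assumptions for illustration.

```python
from urllib.parse import quote_plus


def build_query(keywords="", filetype=None, site=None, inurl=None):
    """Assemble a Google search query from keywords and operators.

    Hypothetical helper for illustration only: it builds text and
    does not contact Google.
    """
    parts = []
    if keywords:
        parts.append(keywords)
    if site:
        parts.append(f"site:{site}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if filetype:
        # Operators expect no space after the colon, so strip stray
        # whitespace such as in "filetype: MDB"
        parts.append(f"filetype:{filetype.strip().lower()}")
    return " ".join(parts)


def search_url(query):
    # Assumed link format for a Google web search; quote_plus encodes
    # ":" as %3A and spaces as "+"
    return "https://www.google.com/search?q=" + quote_plus(query)


# One query per spreadsheet format, as in the list above
for ext in ("csv", "xls"):
    print(build_query("unemployment", filetype=ext))

print(search_url(build_query(site="adidas-group.com", filetype="pdf")))
```

Keeping one file type per query mirrors the list above, where each operator is run as its own search.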
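For more advanced data journalism, try scraping data with Google. Bouchart's one-line magic formula for scraping data from HTML tables into a Google Sheets spreadsheet is =importHTML("","table",N), where the empty quotes take the web page's URL and N is the position of the table on that page, counting from 1.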
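Outside a spreadsheet, the same idea (take a page, pick its Nth table, read the rows) can be sketched in Python with the standard library's html.parser. This is not how importHTML works internally; it is an illustrative parser, with hypothetical names TableParser and import_html_table, that counts tables from 1 like the formula does. It assumes HTML you have already downloaded, expects explicit closing tags, ignores colspan and rowspan, and folds the text of any nested table into the enclosing cell.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of each top-level <table> in a page."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read
        self._depth = 0    # number of currently open <table> tags

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self.tables.append([])
        elif self._depth != 1:
            return  # ignore markup outside tables or inside nested ones
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            self._depth = max(0, self._depth - 1)
            if self._depth == 0:
                self._row = self._cell = None
        elif self._depth != 1:
            return
        elif tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def import_html_table(html, index):
    """Return the rows of the index-th table, counting from 1."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not 1 <= index <= len(parser.tables):
        raise IndexError(f"page has {len(parser.tables)} tables")
    return parser.tables[index - 1]


page = """
<table><tr><td>Menu</td></tr></table>
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>A</td><td>1</td></tr>
</table>
"""
print(import_html_table(page, 2))  # [['Name', 'Value'], ['A', '1']]
```

As with the spreadsheet formula, the first table on a page is often navigation or layout, so it is worth checking which index holds the data you want.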
She also recommended Berkeley’s tutorial on spreadsheets, as well as the Centre for Investigative Journalism’s Data Journalism Handbook for further information on interrogating data using spreadsheets.
Finally, don't forget to clean your data! Bouchart said that holes in a data set can mean the information is wrong or unreliable.
She advised using OpenRefine, a free and open-source tool that runs on your own computer, so it does not need an internet connection once the software has been downloaded.
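Before opening a file in OpenRefine, a quick script can show where the holes are. The sketch below is an illustration rather than anything Bouchart presented: it uses Python's csv module to count missing values per column, and the set of placeholders it treats as missing (blank, "n/a", "na", "-", "null") is an assumption you should adapt to each data set. The sample rows are made up.

```python
import csv
import io
from collections import Counter

# Values counted as "holes"; an assumption to adapt to each data set
MISSING = {"", "n/a", "na", "-", "null"}


def find_holes(csv_text):
    """Count missing values per column in CSV text.

    Returns (number of data rows, Counter mapping column -> missing count).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    gaps = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for column, value in row.items():
            if column is None:
                continue  # extra cells beyond the header row
            # DictReader fills cells missing from short rows with None
            if value is None or value.strip().lower() in MISSING:
                gaps[column] += 1
    return rows, gaps


sample = "name,tickets\nAlice,2\nBob,\nCara,n/a\n"
rows, gaps = find_holes(sample)
for column, count in gaps.items():
    print(f"{column}: {count} of {rows} rows missing")
# tickets: 2 of 3 rows missing
```

A count like this only tells you where the gaps are; deciding whether to fill, flag or drop them is still an editorial judgement.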