Marianne Bouchart, communications director and Data Journalism Award Manager at the Global Editors Network, shares her list of tools and tips for sourcing data online
“The term 'data journalist' is a bit of a jack-of-all-trades term,” Marianne Bouchart told delegates at the news:rewired conference in London on Tuesday.
“Some call us computer-assisted reporters, journalist programmers, journo geeks… unicorns. It varies.”
Bouchart is communications director and Data Journalism Award Manager at the Global Editors Network – and the founder of Hei-Da.org, a not-for-profit organisation set up earlier this year which specialises in open data-driven projects.
She is also the founder and editor of the Data Journalism Blog, and gave workshop attendees some expert advice on how to source and use data sets for storytelling.
Here are the sources she recommended:
Dataportals.org – a comprehensive list of open data portals from around the world, a good starting point for locating a diverse range of data
FindTheData.com – similar to Dataportals.org, it contains a variety of data sets on various topics and industries
EU Data Portal – the EU Data Portal launched last week, and is still in beta. Sponsored by the European Commission, it can be used to browse official data sets
European Union Open Data Portal – much like the EU Data Portal, it offers comprehensive data sets on various subjects across Europe
Data.gov.uk – the UK government's data website, containing public data to help people understand how data works and how policy is made
Data.gov – the US counterpart to Data.gov.uk. There are many similar websites available to source data from other countries
Open Corporates – the largest open database of companies in the world. Its main goal is to have a URL for each established company, and it contains lots of specific business data
WikiLeaks – people presume that WikiLeaks is outdated, but Bouchart stressed that it is still an exceptional resource, with a regularly updated website
The World Bank – it has a data portal that offers free and open data about development issues around the world
The UN Data Portal – grants access to a comprehensive list of data sets, broken down by countries and themes
The UNHCR Data Portal – dedicated to data about the refugee crisis, it is a very visual resource that often provides raw data sets
The World Health Organisation Data – this resource offers a large data library with maps and reports, as well as country-specific statistics
Google Public Data Explorer – enter keywords and it returns the data sets that match your search, broken down by data source
GetTheData.org – a forum where users can ask others where they can find specific data
Crowdsourcing using Google Forms – previously used successfully by organisations like the Guardian, when compiling data from their readers regarding how many Olympic tickets they had purchased
WhatDoTheyKnow.com – a good tool to use when you can’t find the data you need. The website gathers the Freedom of Information requests that have been submitted through it, and tells you whether they were successful or not
Quora – can be used to browse information, and much like GetTheData.org, ask others where to source specific data
You can also find datasets directly on Google by using the following search operators:
filetype:csv for CSV files and filetype:xls for Excel spreadsheets
filetype:shp for geodata
filetype:mdb, filetype:sql and filetype:db for database extracts
filetype:pdf for PDF documents – for example, site:Adidas-group.com filetype:pdf
inurl:downloads filetype:xls, which can surface not only documents made public by companies or organisations, but also information they have shared internally
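The operators above can be combined freely, with no space after the colon. As a rough illustration, here is a minimal Python sketch that assembles such queries and turns them into a search link. The function names build_query and search_url are made up for this example and were not part of Bouchart's talk.

```python
from urllib.parse import urlencode


def build_query(filetype, site=None, inurl=None, keywords=""):
    """Combine Google search operators into a single query string.

    Hypothetical helper for illustration only.
    """
    parts = []
    if keywords:
        parts.append(keywords)
    if site:
        parts.append(f"site:{site}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    # Operators are written without a space after the colon.
    parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL for the query (URL-encoded)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


print(build_query("pdf", site="Adidas-group.com"))
# site:Adidas-group.com filetype:pdf
print(build_query("xls", inurl="downloads"))
# inurl:downloads filetype:xls
print(search_url(build_query("shp", keywords="flood zones")))
```

Pasting the printed query into the Google search box has the same effect as opening the generated link.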
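For more advanced data journalism, try data scraping with Google. Bouchart’s one-line magic formula for scraping data from HTML tables in Google Spreadsheets is =importHTML("","table",N), where the address of the page goes between the first pair of quotes and N is the number of the table on that page, counting from 1.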
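For readers who prefer code to spreadsheets, the same idea (pick the Nth table on a page and read it row by row) can be sketched in Python using only the standard library. This is an illustration, not how Google implements the formula: the names TableExtractor and import_html_table are invented here, the sample HTML is made up, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index):
    """Return table number `index` (counting from 1) as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[index - 1]


# Made-up page content, used only to show the behaviour.
SAMPLE_HTML = """
<table><tr><th>Ignored</th></tr></table>
<table>
  <tr><th>Country</th><th>Tickets</th></tr>
  <tr><td>UK</td><td>120</td></tr>
</table>
"""

print(import_html_table(SAMPLE_HTML, 2))
# [['Country', 'Tickets'], ['UK', '120']]
```

In practice the HTML would come from a downloaded page rather than a string in the script. The spreadsheet formula remains the quicker route for a one-off table.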
She also recommended Berkeley’s tutorial on spreadsheets, as well as the Centre for Investigative Journalism’s Data Journalism Handbook for further information on interrogating data using spreadsheets.
Finally, don’t forget to clean your data! Bouchart said that holes in data sets mean the information could be wrong and unreliable.
She advised using OpenRefine, a free and open source tool that doesn’t necessarily require an internet connection once the software has been downloaded to your computer.
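A quick way to see whether a data set has holes before opening it in a cleaning tool is to scan it for empty cells. The sketch below is a simple stand-in for that first check, not a replacement for a dedicated cleaning tool. The function name find_holes and the sample figures are invented for this example, and the reported line numbers assume no cell contains a line break.

```python
import csv
import io


def find_holes(csv_text):
    """Return (line number, column name) for every empty cell in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    holes = []
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        for column, value in row.items():
            if column is None:
                continue  # surplus cells beyond the header are not holes
            # A value of None means the row was shorter than the header.
            if value is None or not value.strip():
                holes.append((line_no, column))
    return holes


# Made-up sample data with two gaps.
SAMPLE_CSV = "country,tickets\nUK,120\nFrance,\n,45\n"

print(find_holes(SAMPLE_CSV))
# [(3, 'tickets'), (4, 'country')]
```

Each pair in the result points to a cell worth checking against the original source before the figures are used in a story.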