Left to right: Ramaa Sharma, Xiaoxuan Liu, Denis Newman-Griffis, Leoni Robertson, Paul Bradshaw
In a nutshell:
- Data journalists face three main challenges: non-existent data, biased data, and data that fails to represent human experiences adequately, particularly affecting coverage of topics like disability and healthcare where information is often incomplete or lacking nuance.
- While AI could potentially help address these data gaps, experts warn it should be used cautiously as AI models reflect existing societal biases and are inherently probabilistic, so data must be verified.
- Journalists can work around data limitations by using AI to challenge assumptions and identify missing perspectives, combining multiple datasets to fill gaps, and treating data biases themselves as newsworthy stories - though this requires diverse newsrooms with journalists who can recognise these issues through their lived experiences.
The full story:
Data journalists tend to run into one of three issues when trying to find good data for a story.
One, the data does not exist. Two, the data is biased. Three, what data exists fails to capture human experiences.
But the bigger question, as explored at a BBC Fusion online stream today (19 February), is whether AI - with its questionable reliability - could actually help journalists overcome these challenges.
Public datasets can be patchy for all sorts of reasons
In the case of disability data, Denis Newman-Griffis, data scientist and senior lecturer at the University of Sheffield, says there are too many nuances in how people identify with their experiences for any dataset to capture the whole picture.
In the case of health data, Xiaoxuan Liu, associate professor in AI and digital health at the University of Birmingham, says there are many cases where patients have not consented to share their data, or have never had the opportunity to provide or correct it. It is not uncommon to see ethnicity, age group and gender data missing.
The result is that public data capture is messy. And as the future becomes more data-driven, technology built on this data will work better for the over-represented and worse for the under-represented. In other words, the biases of AI systems will become as influential as our own human biases.
The absence of data makes these stories harder to validate, said moderator Ramaa Sharma, a digital inclusive AI consultant, and former BBC head of digital.
This in turn makes it difficult to persuade commissioners to run these stories. And should newsrooms want to commission data, this can be expensive.
So, what can we do about it as journalists?
First, take AI models with a pinch of salt. They too are biased, as they are built on the output of a biased society, explains Paul Bradshaw, a data journalist for the BBC Shared Data Unit and course leader for data journalism at Birmingham City University. "AI is a probabilistic technology, it is always uncertain," he says.
Biases in data, if spotted, can be stories in their own right, explains Leoni Robertson, a data journalist for BBC World Service. Her colleague Maryam Ahmed wrote a piece about bias against women with darker skin applying for UK passports.
Read more: How BBC is using artificial intelligence
The rub, of course, is that it takes journalists with lived experience, and therefore inclusive newsrooms, for those stories to materialise. Many journalists would not have spotted such anomalies within the heaps of data.
Then again, it might be enough to use an AI model to challenge your worldview, Bradshaw continues. He tasked his students with simply asking AI models to suggest other, more diverse sources to speak to. These are not earth-shattering suggestions: stories on football clubs tend to yield LGBTQ+ fan groups or people from less affluent parts of town. But having a simple checklist like this is enough to help mitigate human bias.
The same can be true of glaring gaps in data that governments do not collect. A simple prompt to an AI, such as "what is missing from the dataset?", could be inspiration enough, adds Sharma.
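To make that concrete, here is a minimal sketch of what such a prompt could look like as a small script, using the OpenAI Python client. The model name and the dataset summary are illustrative placeholders, not anything used or endorsed by the panellists.

```python
# A minimal sketch: ask a language model what a dataset might be missing.
# Assumes the OpenAI Python client (pip install openai) and an API key in
# the OPENAI_API_KEY environment variable. The model name and dataset
# summary below are placeholders for illustration only.
from openai import OpenAI

client = OpenAI()

# A short, human-written description of the dataset's scope and columns
dataset_summary = (
    "UK hospital admissions, 2018-2023. Columns: region, month, "
    "admission_count, average_age. No ethnicity or gender fields."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": "You help a data journalist audit datasets for gaps and blind spots.",
        },
        {
            "role": "user",
            "content": f"Here is a dataset description:\n{dataset_summary}\n\n"
                       "What is missing from this dataset? Which groups or "
                       "perspectives might be under-represented?",
        },
    ],
)

# Treat the answer as a checklist for further reporting, not as a finding
print(response.choices[0].message.content)
```

As Bradshaw's caveat implies, the output is a starting checklist to challenge assumptions, not a verified result.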
"Statistics only take you so far," adds Robertson, explaining that she has gone back to academics and her own sources to back up data stories with blind spots.
She also explained that where data is insufficient, multiple datasets can often be pieced together to fill the gaps. BBC World Service, in collaboration with BBC Pashto, had to do this when reporting on public punishments in Afghanistan between the practice's reintroduction in November 2022 and July 2023, as public data was not readily available.
Journalists had to combine various sources, such as statements from the Afghan Supreme Court's X account, reports from UNAMA (the UN Assistance Mission in Afghanistan), and wider news reporting.
Robertson confirmed to Journalism.co.uk: "Although the Afghan Supreme Court data was the primary source this was used alongside the UNAMA report, Corporal punishment and the death penalty in Afghanistan UNAMA Human Rights May 2023 and recognised news reports. Not all data was available on location, number of people, gender, and type of punishment. Therefore we have published the data based on availability."
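As a rough illustration of that piecing-together approach, here is a sketch using Python's pandas library. The incidents, fields and values are invented for the example; this is not the BBC's actual data or method.

```python
# A sketch of combining partial sources into one dataset while keeping
# track of which fields each source could supply. All values below are
# invented for illustration; this is not the BBC's actual data.
import pandas as pd

# Source 1: statements gathered from an official account (hypothetical)
court_statements = pd.DataFrame({
    "incident_id": [1, 2, 3],
    "date": ["2022-12-01", "2023-01-15", "2023-02-10"],
    "location": ["Kabul", None, "Herat"],
    "punishment_type": ["lashing", "lashing", None],
})

# Source 2: figures taken from a monitoring report (hypothetical)
un_report = pd.DataFrame({
    "incident_id": [2, 3, 4],
    "people_affected": [12, 9, 27],
    "gender": [None, "male", "mixed"],
})

# An outer merge keeps incidents that appear in only one source
combined = court_statements.merge(un_report, on="incident_id", how="outer")

print(combined.to_string(index=False))

# Counting missing values per field makes the "published based on
# availability" caveat explicit and reportable
print(combined.isna().sum())
```

The outer merge retains incidents mentioned by only one source, and the missing-value count shows exactly where the published data had to rely on availability.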
This is, again, an opportunity for AI to do a lot of the heavy lifting or to locate sources of data that journalists did not know existed. Everything must be verified thoroughly, though, as AI is prone to hallucinations.
We used Claude AI to provide a summary of the news article. This article was also updated on 20 February 2025 with more contextual information about the BBC's reporting on public punishments in Afghanistan.