Finding one name in a sea of scanned documents or hours of video footage takes a lot of legwork. This new tool can speed up the process and supercharge collaborative data investigations.
The Panama Papers, one of the biggest leaks in journalism’s history, saw hundreds of investigative journalists analyse 11.5 million documents over several years to produce data-led stories.
Today, that work might look different, as Microsoft is trying to solve this pain point of data journalism. Its new tool, Content Insights and Discovery Accelerator (IDA), can analyse hundreds of thousands of documents or lengthy video footage within seconds.
Combining artificial intelligence, object vision and optical character recognition (OCR), IDA can analyse pages and extract text, images and other key data. It also helps journalists search long videos, identifying faces or keywords, and provides a searchable transcript of the footage.
You can use IDA to analyse documents on your own or create a collaborative team (using the 'Portfolio' function) that can work on the same investigation and leave comments on your shared project. If you choose the 'Private' setting, only you can see your database, while the 'Public' setting will allow anyone within your organisation to take a look at your work.
Once you have uploaded your data, such as scanned pages of a document or a large number of emails, you can start searching for a specific term. We used the Mueller Report to try out IDA ourselves, and searched for Putin.
IDA not only helps you find a name on every page of your dataset (highlighted in yellow), it also shows how often the name features and which other names are connected to it.
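To make the idea concrete, here is a minimal Python sketch of this kind of search: it counts how often a term appears, notes which pages mention it, and tallies other names found on those same pages. It is not IDA's code or API. The function name `term_insights`, the list of names and the sample pages are all invented for illustration, and simple substring matching stands in for whatever analysis the tool really performs.

```python
from collections import Counter


def term_insights(pages, term, known_names):
    """Return (pages mentioning term, total mentions, co-occurring names).

    Hypothetical helper: plain case-insensitive substring matching,
    not how IDA itself works.
    """
    term_lower = term.lower()
    hit_pages = []
    mentions = 0
    co_occurring = Counter()
    for page_number, text in enumerate(pages, start=1):
        lowered = text.lower()
        count = lowered.count(term_lower)
        if count == 0:
            continue
        hit_pages.append(page_number)
        mentions += count
        # Tally every other known name that shares a page with the term.
        for name in known_names:
            if name.lower() != term_lower and name.lower() in lowered:
                co_occurring[name] += 1
    return hit_pages, mentions, co_occurring


# Invented sample pages, for illustration only.
sample_pages = [
    "Alice met Bob in Moscow.",
    "Nothing relevant here.",
    "Alice wrote to Carol. Alice replied to Bob.",
]
hit_pages, mentions, linked = term_insights(
    sample_pages, "Alice", ["Alice", "Bob", "Carol"]
)
```

Names that share many pages with your search term are a reasonable first lead, though co-occurrence alone says nothing about the nature of the connection.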
You can also click on the keywords in grey to search Bing for additional context or a definition. Although this has its limitations, it can be a good place to start exploring an unfamiliar topic.
Once you have your analytics displayed, you can start exploring 'Insights'. Colour-coded graphics will show data and people contained in your dataset.
The 'Relationships' tab helps you explore how these names are connected, and the 'Stacked Bar' view allows you to compare variables, for example, names and locations.
This can be of great help when you start analysing a dataset as you can see what people, locations or other data feature most prominently.
The video indexer helps you analyse speech and faces in any footage, no matter the length. The facial detection function currently recognises more than 2 million public figures. It also tells you what percentage of the video each person appears in, who they are, and how likely it is that the identification is correct.
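The appearance percentage is simple arithmetic once you know when a face is on screen. The Python sketch below assumes you have a list of (start, end) times in seconds for one person, which is an invented input format and not something IDA is known to export. It merges overlapping intervals so that no stretch of footage is counted twice, then divides by the video length.

```python
def appearance_share(intervals, video_seconds):
    """Percentage of the video in which a face is on screen.

    `intervals` is a list of (start, end) times in seconds; this input
    format is an assumption made for the example.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of double counting.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    on_screen = sum(end - start for start, end in merged)
    return 100.0 * on_screen / video_seconds


# Invented timings: on screen for 0-60s and 120-150s of a five-minute video.
share = appearance_share([(0, 30), (20, 60), (120, 150)], 300)
```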
The facial detection database contains most mainstream celebrities, politicians and sportspeople. It can, however, be personalised: you can add your local politicians or any people of interest your publication routinely reports on.
The video indexer also gives you the main topics and ‘named entities’ which it picks up from the speech. If you want to see these in context, you can search for the exact wording in the transcription. This function also shows you where in the video your term appears and how many times.
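A transcript search of this kind can be sketched in a few lines of Python. The example assumes the transcript is a list of (start time in seconds, text) segments, which is an invented format for illustration and not IDA's own output. It returns the timestamps where a term is spoken and the total number of mentions.

```python
import re


def find_in_transcript(segments, term):
    """Return ([(start_seconds, mentions), ...], total mentions) for `term`.

    `segments` is a list of (start_seconds, text) pairs; this structure
    is an assumption made for the example.
    """
    # Whole-word, case-insensitive match so "budget" does not hit "budgets".
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for start_seconds, text in segments:
        found = len(pattern.findall(text))
        if found:
            hits.append((start_seconds, found))
    total = sum(found for _, found in hits)
    return hits, total


# Invented transcript segments, for illustration only.
sample_segments = [
    (0.0, "Welcome to the budget debate."),
    (12.5, "The budget, the whole budget."),
    (30.0, "Thank you."),
]
hits, total = find_in_transcript(sample_segments, "budget")
```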
The video indexer also has a subtitles function that allows you to follow the speech with real-time transcription. This can be translated into more than 30 languages and you can even share the translated video with your audience.
As with the facial database, you can add specific terms or jargon to the language database. If, for example, you report on video games and you have footage from a gaming show, you can add all the names of video game characters for accurate transcription and analysis. The same goes for health or sports reporting.
IDA is 80 per cent developed, with the remaining 20 per cent customisable by your own developer, so you can add the features or analytics data that matter most to your reporting.