How Dow Jones is tackling AI copyright challenges

Traci Mabrey, general manager of risk, research and AI proposition at Dow Jones

During times of uncertainty, trusted news and journalism are at a premium, as our readers rely on quality content to inform their daily decisions in business, finance and life. And generative AI may be the biggest uncertainty we have seen since the dawn of the internet more than 35 years ago.

The new UK government needs to introduce rules for AI, as we lag behind other markets such as the EU, which has already passed its AI Act. If the time taken to pass the Online Safety Act is anything to go by, this could be a multi-year process.

In the meantime, the onus is on all industry players to define an approach that puts journalistic integrity first and ensures that intellectual property rights are upheld.

That means all stakeholders - journalists, publishers, aggregators and governments - must work together today to build a healthy media ecosystem for tomorrow. Dow Jones believes in this both as a news publisher and, through the Factiva platform, as an arbiter for publishers globally.

Start with transparency

Now that GenAI is into its second year, its potential as a force for good is clearer. But there are still understandable concerns, such as opaque sources and a lack of traceability, which can give rise to misinformation and hallucinations. When the stakes are as high as informing critical business decisions, we believe transparency is fundamental to integrating news data into AI models.

If we are to reap the rewards that GenAI technologies afford us, we need to be able to trace the origin of the information down to the sentence level.

This level of traceability is especially important for AI systems that are being used in highly regulated environments, such as financial services or risk management, where a decision may have a legal effect.

Dow Jones has established clear audit trails for generated content and summaries. The first step in this process is ensuring that the foundational data is properly tagged. Our Factiva archive grows by 600,000 articles every day, and each of those articles is tagged against a set of more than 3,000 identifiers, such as company names, individual names, content subject and keywords. The resulting metadata allows us to pinpoint when, where and how often each piece of content appears in AI query results and summaries. This matters because it lets us tell publishers how their content is being used in searches and queries, and who is looking at it.

Transparent tagging and auditing are also critical when considering the usage rights and restrictions publishers may place on their content. For example, an organisation may restrict access to content in certain jurisdictions, or only grant rights for internal use.

These rights and limitations cannot be enforced unless they can be applied to the content as it flows from the data repository through to the final deliverable, such as a summary delivered to an end user. It is therefore critical that publishers build tagging and auditing into any agreement with AI platforms, or with any content aggregation platform that may license their content on their behalf.

It also means that users, some of whom work in highly regulated sectors such as risk management and financial services, can trace an AI summary back to its exact, legitimate sources. For publishers, it makes it easier to understand who their key audiences are and how best to serve them. That transparency also lays the foundation for fair and proper content use and compensation.
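To make the idea of rights that travel with the content more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how Factiva works. It assumes that a summary is assembled from extracted source sentences. It also assumes that each article carries a hypothetical `Rights` record, holding its allowed jurisdictions and an internal-use-only flag. Finally, it assumes that every sentence used is written to an audit log along with its article ID and sentence position. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Sentence:
    article_id: str
    index: int  # position of the sentence within its source article
    text: str


@dataclass(frozen=True)
class Rights:
    # Hypothetical publisher restrictions attached to an article
    allowed_jurisdictions: frozenset[str]
    internal_use_only: bool


@dataclass
class AuditEntry:
    article_id: str
    sentence_index: int
    user_id: str
    jurisdiction: str
    timestamp: datetime


def permitted(rights: Rights, jurisdiction: str, external: bool) -> bool:
    """Check one article's restrictions against a single delivery context."""
    if jurisdiction not in rights.allowed_jurisdictions:
        return False
    if external and rights.internal_use_only:
        return False
    return True


def build_summary(
    sentences: list[Sentence],
    rights_by_article: dict[str, Rights],
    user_id: str,
    jurisdiction: str,
    external: bool,
    audit_log: list[AuditEntry],
) -> list[Sentence]:
    """Keep only sentences whose rights allow this delivery, logging each one used."""
    used = []
    for s in sentences:
        if not permitted(rights_by_article[s.article_id], jurisdiction, external):
            continue  # the restriction travels with the content, so drop it here
        used.append(s)
        audit_log.append(
            AuditEntry(s.article_id, s.index, user_id, jurisdiction,
                       datetime.now(timezone.utc))
        )
    return used
```

Each retained sentence keeps its article ID and position. The same log can therefore answer both questions raised above: which exact sources sit behind a summary, and how often, where and by whom a publisher's content is used.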

A sustainable compensation model

Producing high-quality journalistic content takes time, costs money and often involves great risk. It is only fair that publishers and journalists are compensated for these efforts in a sustainable way.

In the past, compensation was based on search result rankings or the number of clicks a piece of content received. But in the age of GenAI, news and information are often amalgamated with other content sources. This changes the rules of engagement between publisher and platform. We must now think about where content shows up in a query result and how much of it is used - from a single sentence to a full article.
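As a hypothetical illustration of usage-based compensation, the Python sketch below splits a revenue pool across publishers. The split depends on where each publisher's content appears in an AI answer and how much of it is used. The weighting (sentences used divided by display rank), the `Citation` record and the pool itself are all assumptions made for this example. They are not a model Dow Jones has described.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    publisher: str
    rank: int            # hypothetical: 1 = first source shown in the answer
    sentences_used: int  # from a single sentence up to a full article


def usage_weight(c: Citation) -> float:
    # Hypothetical weighting: more text used and higher placement both earn more
    if c.rank < 1:
        raise ValueError("rank must be 1 or greater")
    return c.sentences_used / c.rank


def split_pool(citations: list[Citation], pool: float) -> dict[str, float]:
    """Divide a revenue pool across publishers in proportion to weighted usage."""
    totals: dict[str, float] = defaultdict(float)
    for c in citations:
        totals[c.publisher] += usage_weight(c)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {pub: pool * w / grand_total for pub, w in totals.items()}
```

Consider publisher A cited at rank 1 (four sentences) and rank 3 (three sentences), and publisher B cited at rank 2 (two sentences). A's weight is 4 + 1 = 5 and B's is 1, so a pool of 100 splits roughly 83.33 to 16.67. Other weightings are equally plausible. The point is only that position and volume can both feed the calculation.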

One of the key principles we have built into our own models is an assurance that search results are not influenced by advertising. In traditional online search, end users could easily tell which results were surfaced for relevance and which were sponsored; with AI summaries, this is much harder.

We developed a framework with legal and regulatory experts to ensure that both our own intellectual property and that of publishers across the Factiva ecosystem are properly protected. As this space continues to develop and we explore new ways to engage with publishers and other AI platforms, we will also continue pushing for enhanced contract licensing documentation, transparency and compliance with changing regulations to guarantee fair usage and compensation.

As publishers know from past engagements with online search platforms and regulators over proper compensation for their content, this can be a lengthy process that misses the moment. Australia passed legislation in 2021 requiring online search companies to properly compensate publishers. Canada and the EU have taken similar steps, and the UK finally followed with its own legislation in May of this year. We cannot wait as long when it comes to fair, consistent and global rules around AI. Nor can we accept a situation in which smaller publishers feel forced into agreements that do not properly protect their copyright or guarantee proper use.

We are already engaging with many publishers in the Factiva ecosystem that wish to leverage our collective bargaining power. We will continue to share our frameworks for content licensing and compensation as they evolve, so that everyone has access to the resources they need to defend creativity and the news.

Compliance and advocacy

As a publisher, we have a dual responsibility. Beyond our primary role of reporting the news and providing informative content, Dow Jones must also act as a compliance officer. It is incumbent upon us to actively monitor and adapt to the ever-changing global regulatory landscape, ensuring that our practices align with copyright laws and regulations, particularly given AI's transformative impact on content creation and distribution.

Alongside the wider News Corp family, we will continue to advocate for clearer guardrails that protect creative industries in this new era. And as we have seen with the UK’s recent U-turn on building transparency into AI regulation, a unified voice from peers, policymakers and international bodies will be essential to driving change and establishing consistent standards against AI-driven infringement.

There are undoubtedly significant upsides to the use of news data within AI systems around the world, but we must make sure that this new technology strengthens the industry instead of undermining it.

Traci Mabrey is the general manager of risk, research and AI proposition at Dow Jones
