2023 was very much the year of AI. After the arrival of ChatGPT, Google Bard, and countless other large-language-model (LLM) chatbots, artificial intelligence entered the public consciousness meaningfully for the first time. This year, things are set to step up another notch. As The New York Times kicks off the year with a landmark copyright lawsuit, 2024 could be the year that the internet landscape, and journalism with it, changes forever.
Firstly, it’s worth outlining just which aspects of artificial intelligence are having the most significant impact. For years, machine learning has been a huge part of how the internet works, with everything from advertising to concert ticketing learning from user behaviours to improve and personalise online experiences. Last year, though, all of that changed. The release of ChatGPT to the public on 30 November 2022 revolutionised how the public and businesses saw the technology. All of a sudden, LLMs went from being ‘helpful add-ons’ to potential job stealers and harbingers of doom. Microsoft quickly deepened its multibillion-dollar investment in OpenAI, and its CEO Satya Nadella vowed to make rivals “dance” as Alphabet rushed out its competitor, Google Bard.
These LLMs are a significant change from what came before. Their transformer architecture can learn from far larger data sets than earlier approaches allowed, ‘tokenising’ text into numeric units and responding to queries in increasingly human-like ways. Without going into too much technical detail, these models scrape huge amounts of text from millions of websites, train on that data, and use what they have learned to respond to user queries and questions.
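The tokenisation step mentioned above can be sketched in a few lines of Python. This is a deliberately simplified, word-level illustration, not how production tokenisers work (systems like OpenAI’s use byte-pair encoding over subword fragments); it exists only to show the basic idea of turning text into the numbers a model learns from.

```python
# Toy sketch of tokenisation: map each word in a training corpus to an
# integer ID, then encode new text as a sequence of those IDs.
# Real LLM tokenisers are far more sophisticated (subword units, byte
# fallback, special tokens); this is illustrative only.

def build_vocab(corpus: str) -> dict[str, int]:
    """Assign each unique lower-cased word in the corpus an integer ID."""
    vocab: dict[str, int] = {}
    for word in corpus.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def tokenise(text: str, vocab: dict[str, int]) -> list[int]:
    """Encode text as token IDs; words outside the vocabulary become -1."""
    return [vocab.get(word, -1) for word in text.lower().split()]

corpus = "the model reads the web and learns from the web"
vocab = build_vocab(corpus)
print(tokenise("the model learns", vocab))  # → [0, 1, 5]
```

A trained model never sees raw text, only streams of IDs like these, which is partly why publishers’ material ends up embedded in the model’s statistical weights rather than stored as literal copies.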
Inevitably, after the initial surge in popularity and excitement, publishers, content creators, and basically anyone producing content on the internet quickly became concerned about how copyright might be jeopardised. The billions of dollars invested in the industry so far have been almost entirely predicated on the argument, made by OpenAI’s CEO Sam Altman and others, that the use of this data falls under the ‘fair use’ exemption in copyright law. In the United States, though, outcomes in cases like these are famously hard to predict, and almost identical cases often produce inconsistent results depending on the judge and jurisdiction.
The ‘fair use’ argument that AI bosses make rests on a series of factors. Primarily, the idea is that although these models are trained on millions of websites and articles written by others and covered by copyright, they are not directly reproducing that material. OpenAI and others argue that their use is ‘transformative’, much like a parody of a song or a book review.
A commonly cited precedent is The Authors Guild’s lawsuit against Google Books, filed in September 2005. After a decade of litigation, the courts ultimately sided with Google, reasoning that the company was not building a ‘book substitute’ but rather a search engine and database for different publications.
As you might have guessed by now, many publishers wholly disagree with OpenAI’s reading of the law, and on the 27th of December The New York Times made the first legal move after months of attempted negotiations. The NYT said those negotiations “had not produced a resolution”, whilst OpenAI said it was “surprised and disappointed”.
It’s almost impossible to predict an outcome in this specific case. OpenAI and Microsoft are, by all accounts, extremely reluctant to settle with The New York Times for fear of thousands of different publishers following suit and queuing up for pay-outs. More likely, it seems, is the eventual establishment of some long-term model to repay writers and publishers.
Already, times are tougher than they have ever been for news sites and journalists. The New York Times is one of the few organisations in the industry to have established a sustainable subscription model, and other newspapers around the world have experimented with different models to try to survive. Some, such as The Independent, went online-only as early as 2016, but most others have established similar online subscription models.
Clearly, then, a world in which users can ask chatbots for a summary of the news, or even for the reproduction of entire articles that would otherwise sit behind a paywall, is extremely problematic for the industry. On this reading, ChatGPT and others are producing clear substitutes for news products, not ‘transformative’ works covered by fair-use exemptions.
This is just one way in which large language models are set to transform the internet as we know it in the coming months and years. An entire industry built on search engine optimisation and referral links is about to be shaken up more than could have been imagined just 18 months ago. If users are simply interacting with chatbots, they will no longer have to use search engines such as Google to find information.
It is also true that there are still more questions than answers. If these language models keep filling websites with generated content at today’s rate, will their own future training data be compromised? How will advertising adapt? How can these chatbots even be monetised?
So, the outcome of this particular lawsuit is up in the air and is likely to remain so for months. What is sure, though, is that 2024 will see the internet change in the most significant way since the advent of social media.