The New York Times has taken preemptive measures to stop its content from being used to train artificial intelligence models. As reported by Adweek, the NYT updated its Terms of Service on August 3rd to prohibit its content (including text, photographs, images, audio/video clips, “look and feel,” metadata, or compilations) from being used in the development of “any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.”
The updated terms now also specify that automated tools like website crawlers designed to use, access, or collect such content cannot be used without written permission from the publication. The NYT says that refusing to comply with these new restrictions could result in unspecified fines or penalties. Despite introducing the new rules to its policy, the publication doesn’t appear to have made any changes to its robots.txt — the file that informs search engine crawlers which URLs can be accessed.
That said, the NYT also signed a $100 million deal with Google back in February that allows the search giant to feature Times content across some of its platforms over the next three years. The publication said that both companies will work together on tools for content distribution, subscriptions, marketing, ads, and “experimentation,” so it’s possible that the changes to the NYT terms of service are directed at other companies like OpenAI or Microsoft.
OpenAI recently announced that website operators can now block its GPTBot web crawler from scraping their websites. Microsoft also added some new restrictions to its own T&Cs that ban people from using its AI products to “create, train, or improve (directly or indirectly) any other AI service,” alongside banning users from scraping or otherwise extracting data from its AI tools.
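For a sense of how that kind of block works in practice, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt contents and the `example.com` URL are hypothetical placeholders, not taken from any real site, and the helper name `can_fetch` is ours. The `User-agent: GPTBot` / `Disallow: /` pair follows the opt-out pattern OpenAI describes for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block OpenAI's GPTBot everywhere,
# leave all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/2023/08/some-article.html"  # placeholder URL
    for agent in ("GPTBot", "Googlebot"):
        print(agent, "allowed:", can_fetch(agent, url))
```

Note that robots.txt is advisory: it only stops crawlers that choose to honor it, which is one reason a publisher might also spell out restrictions in its terms of service.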
Earlier this month, several news organizations, including The Associated Press and the European Publishers’ Council, signed an open letter calling on lawmakers worldwide to introduce rules that would require transparency about training datasets and the consent of rights holders before their data is used for training.