News outlets such as the New York Times, CNN, Reuters and the Australian Broadcasting Corporation (ABC) have blocked an OpenAI tool that collects content from their websites.
What is known
The Verge was the first to report the GPTBot ban. The Guardian later found that other major news sites, including CNN, Reuters, the Chicago Tribune, ABC and others, had also blocked the web crawler.
The block on GPTBot is visible in the publishers’ robots.txt files, which tell crawlers run by search engines and other organizations which pages they are allowed to visit.
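As a rough illustration of how such a rule works, the sketch below uses Python's standard `urllib.robotparser` module to check a sample robots.txt. The file contents, the `example.com` URL, the `SomeOtherBot` name and the `is_allowed` helper are assumptions made for this example. They are not copied from any publisher's actual file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not a real publisher's file):
# GPTBot is barred from the whole site, every other crawler is allowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL used only for the check.
ARTICLE_URL = "https://example.com/news/story"

gptbot_allowed = is_allowed(SAMPLE_ROBOTS_TXT, "GPTBot", ARTICLE_URL)
other_allowed = is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", ARTICLE_URL)

print("GPTBot allowed:", gptbot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

Note that robots.txt is a request rather than an enforcement mechanism: it only works if the crawler chooses to honor it.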
All of the publishers listed added the block in August. CNN confirmed that it had blocked GPTBot. A spokesman for Reuters said the company regularly reviews its robots.txt file and the site’s terms of service.
The New York Times also recently updated its terms of service. The rules specifically prohibit scraping its content to train and develop AI.
For those who don’t know
OpenAI is the creator of ChatGPT, one of the best-known artificial intelligence chatbots. Its web crawler, known as GPTBot, crawls websites to gather content that can be used to improve its AI.
Large language models, such as those behind ChatGPT, are trained on huge amounts of information. However, developers often do not disclose whether their datasets contain copyrighted material.
To counter possible violations, OpenAI has published information about GPTBot and explained how site owners who do not want their content used to train AI can block the crawler.
Source: The Guardian
2023-08-25 09:57:28