Major News Outlets Block OpenAI's Web Crawler, GPTBot, from Collecting Content

News outlets such as the New York Times, CNN, Reuters and the Australian Broadcasting Corporation (ABC) have blocked an OpenAI tool that collects content from their websites.

What is known

The Verge was the first to report the GPTBot block. The Guardian later found that other major news sites, including CNN, Reuters, the Chicago Tribune, ABC, and others, had also blocked the web crawler.

The block is visible in the publishers’ robots.txt files, which tell crawlers operated by search engines and other organizations which pages they are allowed to visit.
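As a rough illustration of how such a block can be read from a robots.txt file, here is a minimal sketch using Python's standard `urllib.robotparser` module. The sample file contents, the `example.com` URL and the helper name `is_blocked` are assumptions made for the example; they are not taken from any publisher's actual file.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows user_agent from url."""
    parser = RobotFileParser()
    # parse() accepts the file as an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: GPTBot is shut out entirely,
# every other crawler is only kept away from /private/.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

story = "https://example.com/news/story.html"
print(is_blocked(SAMPLE, "GPTBot", story))     # True
print(is_blocked(SAMPLE, "Googlebot", story))  # False
```

Note that robots.txt is a convention that crawlers choose to honour; the file expresses the publisher's wishes rather than technically preventing access.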

All of the publishers listed added the block in August. CNN confirmed that it had blocked GPTBot. A spokesman for Reuters said the company regularly reviews its robots.txt file and the site’s terms of service.

The New York Times also recently updated its terms of service. The rules now specifically prohibit scraping its content to train and develop AI.

For those who don’t know

OpenAI is the creator of ChatGPT, one of the best-known artificial intelligence chatbots. Its web crawler, known as GPTBot, crawls websites to collect content that can be used to improve its AI.

Large language models such as those behind ChatGPT require huge amounts of information for training. However, developers often do not disclose whether their datasets contain copyrighted material.

To address such concerns, OpenAI has published information about GPTBot and explained how website owners who do not want their content used to train AI can prevent the crawler from collecting it.
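The opt-out works through a robots.txt rule addressed to the `GPTBot` user agent. The sketch below shows one way a site owner might append such a rule to an existing file and confirm that other crawlers are unaffected. The helper names `gptbot_rules` and `add_gptbot_block`, the sample file and the `example.com` URL are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser


def gptbot_rules(disallow=("/",), allow=()):
    """Build a robots.txt group for GPTBot (hypothetical helper).

    Allow lines are written first because urllib.robotparser applies
    the first matching rule in a group.
    """
    lines = ["User-agent: GPTBot"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    return "\n".join(lines) + "\n"


def add_gptbot_block(existing_robots_txt, disallow=("/",), allow=()):
    """Append the GPTBot group to existing robots.txt text."""
    base = existing_robots_txt.rstrip("\n")
    block = gptbot_rules(disallow, allow)
    # A blank line separates the new group from any earlier one.
    return (base + "\n\n" + block) if base else block


# Assumed starting file: all crawlers are kept out of /admin/ only.
existing = "User-agent: *\nDisallow: /admin/\n"
updated = add_gptbot_block(existing)
print(updated)

parser = RobotFileParser()
parser.parse(updated.splitlines())
print(parser.can_fetch("GPTBot", "https://example.com/article"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True
```

Passing `allow=("/press/",)` would leave one directory open to GPTBot while the rest of the site stays blocked.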

Source: The Guardian

2023-08-25 09:57:28