OpenAI has transcribed over a million hours of YouTube videos to train ChatGPT

OpenAI has transcribed more than a million hours of YouTube video to train its artificial intelligence.

The New York Times reports this, revealing how the company that created ChatGPT, increasingly hungry for data with which to train its AI, drew on the popular video platform without informing Alphabet, the company that controls both YouTube and Google.

OpenAI, and especially the former president Greg Brockman, who personally selected some of the videos, knew there was a possibility of breaking the rules. But the company reportedly went ahead with the data collection on the basis of fair use, a US legal rule that allows limited use of copyrighted material without having to ask permission, but only when the use is for educational, critical or journalistic purposes and does not damage the market value of the original work.

In the specific case of the data collected by OpenAI, it is doubtful that fair use holds. The company led by Sam Altman, in fact, used copyrighted content to create a product that not only chases a profit, but actually represents a threat to the companies whose data was taken.

All this, the NYT writes, was clear to OpenAI, but it did not deter the company from its intent to get its hands on the most valuable content: that produced by real people.

In fact, “human” data is the most valuable for companies developing artificial intelligence.

We already knew that in recent years OpenAI had plundered public and private platforms, in most cases without asking for permission.

OpenAI also collected data from the New York Times, which recently responded to this practice, which it deems unfair, with a lawsuit. What we didn’t know, however, is that the San Francisco company had already exhausted the databases available to feed ChatGPT in 2021, a year before its popular chatbot was opened to the public.

At that point, according to the NYT's sources, OpenAI thought of transcribing everything, from videos to podcasts.

To do this it created an AI model, Whisper, which is one of the most powerful and accurate around when it comes to transcription.

In recent days the CEO of YouTube, Neal Mohan, confirmed to Bloomberg the possibility that OpenAI used the platform’s videos to train Sora, its AI model that generates videos.

Google spokesperson Matt Bryant said the Mountain View company takes “technical and legal measures” to prevent such unauthorized use “when there is a clear legal basis for doing so.”

According to NYT sources, Google also transcribed YouTube videos to train Gemini, its artificial intelligence. Bryant said the company has trained its AI models “on some YouTube content, in accordance with creator agreements.”

In short, it looks like an epic clash between two AI giants.

But it is also interesting to note that both Google and OpenAI are in the same boat when it comes to the data needed by their AI, which is being consumed more and more rapidly.

And that’s a problem, because such artificial intelligences progress only if they can absorb ever-increasing amounts of content produced by humans and not, for example, by other AI.

The Wall Street Journal wrote this week that the quality content on which to train AI could run out by 2028.

It is worth remembering that it was the WSJ that, a few weeks ago, asked the CTO of OpenAI, Mira Murati: “Was Sora trained on YouTube or Facebook data?”

And she replied: “I’m not sure”.

– 2024-04-09 11:59:59
