OpenAI has created a tool that can imitate a voice based on a fifteen-second audio sample. The company has released samples made with Voice Engine, but does not want to make the full model public right away.
OpenAI, the AI company that also makes ChatGPT, describes the tool in a blog post. The model is called Voice Engine and can read aloud text that a user provides as input. OpenAI claims that, based on an audio sample, the model can fully mimic a voice, including intonation and emotion. Such a sample only needs to be fifteen seconds long, the company says.
The company has not disclosed any technical details about the tool, and no white paper or other technical description is available. It is therefore not clear, for example, which audio Voice Engine was trained on. OpenAI told TechCrunch that the training data is a combination of licensed and publicly available data. According to the company, Voice Engine is not trained on user data. Samples that users create are also deleted afterwards.
According to TechCrunch, the tool is expected to become a paid service, although OpenAI has not said anything about this publicly. According to documents, the company would charge $15 per million characters, which is enough for about 160,000 spoken words.
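To make the reported pricing concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration based on the figures above, which OpenAI has not confirmed. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes that every character, including spaces and punctuation, counts toward the total.

```python
# Reported (not officially confirmed) rate: $15 per one million characters.
PRICE_PER_MILLION_CHARS_USD = 15.0

# Average characters per word implied by the figures in the article:
# one million characters for about 160,000 words.
IMPLIED_CHARS_PER_WORD = 1_000_000 / 160_000


def estimate_cost_usd(text: str) -> float:
    """Estimate the cost of having `text` spoken at the reported rate.

    Assumption: every character, including spaces, is billed.
    """
    return len(text) * PRICE_PER_MILLION_CHARS_USD / 1_000_000


if __name__ == "__main__":
    print(f"Implied characters per word: {IMPLIED_CHARS_PER_WORD}")
    print(f"Cost of 1,000,000 characters: ${estimate_cost_usd('a' * 1_000_000):.2f}")
```

At these figures, a word averages 6.25 characters, so a 1,000-word text would cost roughly one cent.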
Voice Engine is not yet available to users, as is often the case with similar services these days. Last year, Meta showed Voicebox, which can also generate speech based on short audio files, but that company has not made its tool available either. OpenAI says it is likewise being cautious because of the implications: the tool could quickly be misused. OpenAI refers specifically to the US, where a presidential election will be held at the end of this year and the campaign is already under way.
The company has posted a number of examples on its blog showing what the tool can do. In addition, OpenAI is testing Voice Engine with a limited number of testers. They had to sign a statement in advance saying that they will not generate speech in someone's voice without that person's permission. The generated audio will also carry a watermark showing that it was generated, and OpenAI says it "proactively monitors" how the system is used. When the tool is released in the future, OpenAI also wants to create a list of voices that may not be cloned.
2024-03-29 19:46:34