The monotonous, synthesized voices of virtual assistants seem to have their days numbered. NVIDIA has developed a new artificial intelligence (AI) that reproduces an extremely realistic voice.
By combining AI with human reference recordings, the “electronic voice” sounds almost identical to that of a real person. During the Interspeech 2021 event, the company posted a video about the process of creating the “natural voice”.
The video showcases recent advances from NVIDIA's division dedicated to voice technology research. In this project, the researchers used a version of the open-source NeMo software optimized to run on the company's graphics cards.
Experts compare speech to music, with complex rhythms, tones and timbres that are not simple to replicate. However, new tools are helping to reduce that complexity.
As with other machine learning systems, the AI can be driven in two ways. The first uses a text-to-speech model built from speech dictated by a human. The software can then take excerpts from the passage and convert them into a female voice.
The second method is direct voice conversion. The tool takes an audio file of a person speaking and converts it into the AI voice, matching the speaker's patterns and intonation.
NVIDIA's new AI can be applied to accessibility projects. Source: NVIDIA/Press release
AI narrator for the next NVIDIA series
Demonstrating how far the technology has come, NVIDIA's AI will narrate the video series I Am A.I. The project will show the influence and impact of machine learning in various sectors.
The company also wants to show that the new technology has the potential to go much further. For example, the tool could help people with vocal disabilities, or help users translate between languages using their own voice.