
Amazon reduces Alexa latency by a quarter by switching to its own Inferentia chips – Computer – News

Amazon now runs the majority of queries submitted through the Alexa voice assistant on its own Inferentia chips. The company wants to move away from Nvidia GPUs in its data centers and use its own hardware instead.

Amazon writes in a blog post that it wants to run Alexa queries on its own machine-learning chips. This is done with Elastic Compute Cloud Inf1 instances, which are powered by the Inferentia chips used in Amazon Web Services. Inferentia was designed by AWS specifically to accelerate machine-learning inference. Each chip has four NeuronCores and a large amount of on-chip cache, which keeps model data close to the compute units. According to Amazon, this results in lower latency, among other benefits.
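To give an impression of how a model ends up on Inf1 instances: with the AWS Neuron SDK, a trained model is first compiled for the NeuronCores and then loaded on an Inf1 instance. The sketch below shows that flow for a generic PyTorch model using the publicly documented torch-neuron package; the model choice (ResNet-50) and input shape are illustrative assumptions, not part of Amazon's Alexa pipeline.

```python
# Minimal sketch: compile a PyTorch model for Inferentia with the AWS Neuron SDK.
# Assumes the torch-neuron package is installed (available on AWS Deep Learning AMIs);
# ResNet-50 and the 224x224 input are example choices, not Amazon's actual Alexa models.
import torch
import torch_neuron  # registers the torch.neuron compilation backend
from torchvision import models

# Load a pretrained model and switch it to inference mode.
model = models.resnet50(pretrained=True)
model.eval()

# An example input tensor; Neuron traces the model with fixed input shapes.
example_input = torch.zeros([1, 3, 224, 224], dtype=torch.float32)

# Compile the model for Inferentia NeuronCores.
model_neuron = torch.neuron.trace(model, example_inputs=[example_input])

# Save the compiled artifact; on an Inf1 instance it can be loaded with torch.jit.load
# and called like a regular model, with inference running on the Inferentia chip.
model_neuron.save("resnet50_neuron.pt")
```

Because the model is compiled ahead of time for fixed input shapes, the NeuronCores can execute it with the on-chip cache holding the weights, which is where the latency advantage Amazon describes comes from.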

Amazon says that "the vast majority" of Alexa workloads now run on those Inferentia chips. That reportedly resulted in a 25 percent reduction in latency and a 30 percent reduction in costs. Until now, Amazon used Nvidia's T4 GPUs for these calculations, but the company wants to phase those out in the long run.

Incidentally, the switch only concerns the text-to-speech processing of Alexa commands. That was the only part of the technology behind the voice assistant that still ran on GPUs. Other parts of the pipeline, including automatic speech recognition and natural language understanding, were already handled on other hardware.

According to Amazon, the image-recognition service Rekognition is also being migrated to Inferentia chips. That reportedly results in an eight times lower latency than with GPU-based processing, although Amazon does not say which hardware Rekognition ran on before.
