A brand is, in a sense, a fictional person. Like a person, it has many distinctive traits, including a voice, and the sound of a brand lets listeners recognize its personality the moment they hear it. Today Amazon Polly, the Amazon cloud service that converts text into lifelike speech, launched Brand Voice, a fully automated service that builds a customized voice exclusively for an individual customer.
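For readers who want to try the underlying text-to-speech service, here is a minimal sketch that calls Polly's standard `SynthesizeSpeech` operation through the AWS SDK for Python (boto3). It is not Amazon's Brand Voice workflow: `"Joanna"` is one of Polly's stock voices, the helper names `build_polly_request` and `synthesize_to_file` are hypothetical, and how a finished brand voice is addressed (for example, under its own voice ID) is an assumption the announcement does not cover.

```python
from typing import Any, Dict


def build_polly_request(text: str, voice_id: str, engine: str = "neural",
                        output_format: str = "mp3") -> Dict[str, Any]:
    """Assemble keyword arguments for Polly's SynthesizeSpeech call (hypothetical helper)."""
    if not text:
        raise ValueError("text must be non-empty")
    return {
        "Text": text,
        "TextType": "text",          # plain text; "ssml" would allow markup
        "VoiceId": voice_id,         # a custom brand voice ID here is an assumption
        "Engine": engine,
        "OutputFormat": output_format,
    }


def synthesize_to_file(text: str, voice_id: str, path: str) -> None:
    """Call Amazon Polly via boto3 and write the returned audio stream to disk.

    Requires boto3 and configured AWS credentials.
    """
    import boto3  # imported lazily so the request builder works without boto3 installed

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_polly_request(text, voice_id))
    with open(path, "wb") as f:
        f.write(response["AudioStream"].read())
```

For example, `synthesize_to_file("Welcome back!", "Joanna", "welcome.mp3")` would write an MP3 clip using a stock neural voice; keeping the request builder separate makes the parameters easy to inspect without calling AWS.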
As Rafal Kuklinski, head of AI voice at Amazon, and Ankit Dhawan, senior product manager, explained in a blog post, Brand Voice lets companies set themselves apart from other brands by building a distinctive voice into their products and services. "Every company can have its own unique sonic brand," they wrote.
Amazon teamed up with KFC to give Colonel Sanders, the chain's brand mascot, a Southern U.S. English accent, and launched the voice on Amazon's Alexa. It also designed an Australian English voice for National Australia Bank, which has moved its contact center to Amazon Connect.
Late last year, Amazon detailed its work on AI-generated speech in a research paper ("Effect of data reduction on sequence-to-sequence neural TTS"), in which the researchers described a system that can learn a new speaking style from only a few hours of training. By comparison, a voice actor might need tens of hours to record in a target style.
Amazon's AI model consists of two parts. The first is a neural network that converts a sequence of phonemes into a sequence of spectrograms, visual representations of how the frequency spectrum of a sound changes over time. The second is a vocoder, which converts those spectrograms into a continuous audio signal. The model is trained on a large amount of speech data in a neutral style combined with data in the desired style, together with an AI system that learns to distinguish elements of speech. Amazon has used the approach internally to create new voices for Alexa.
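The two-stage design can be illustrated with a deliberately tiny, standard-library Python sketch. Both stages are hypothetical stand-ins, not Amazon's model: the "acoustic model" is a lookup table that lights up one frequency bin per phoneme instead of a trained neural network, and the "vocoder" rebuilds audio by summing sine waves (additive synthesis) rather than using a learned neural vocoder. The phoneme symbols, bin mapping, and frame sizes are illustrative assumptions; the point is only the data flow from phonemes to spectrogram frames to audio samples.

```python
import math
from typing import Dict, List

SAMPLE_RATE = 16_000  # samples per second
HOP = 160             # samples per spectrogram frame (10 ms at 16 kHz)
N_BINS = 64           # frequency bins per frame
BIN_HZ = 125.0        # bin spacing; top bin is 63 * 125 = 7875 Hz, below Nyquist

# Hypothetical stand-in for the trained network: each phoneme lights up a
# single frequency bin. A real model predicts every bin from context.
PHONEME_TO_BIN: Dict[str, int] = {"HH": 20, "AH": 6, "L": 4, "OW": 5}

Frame = List[float]


def acoustic_model(phonemes: List[str], frames_per_phoneme: int = 5) -> List[Frame]:
    """Stage 1 (toy): phoneme sequence -> spectrogram (one magnitude list per frame)."""
    spectrogram: List[Frame] = []
    for ph in phonemes:
        frame = [0.0] * N_BINS
        frame[PHONEME_TO_BIN[ph]] = 1.0
        spectrogram.extend(list(frame) for _ in range(frames_per_phoneme))
    return spectrogram


def vocoder(spectrogram: List[Frame]) -> List[float]:
    """Stage 2 (toy): spectrogram -> waveform by summing one sine per active bin."""
    audio: List[float] = []
    n = 0  # global sample index keeps each sine's phase continuous across frames
    for frame in spectrogram:
        total = sum(frame) or 1.0  # normalise so samples stay within [-1, 1]
        for _ in range(HOP):
            t = n / SAMPLE_RATE
            sample = sum(mag * math.sin(2 * math.pi * k * BIN_HZ * t)
                         for k, mag in enumerate(frame) if mag)
            audio.append(sample / total)
            n += 1
    return audio
```

With five frames per phoneme, `acoustic_model(["HH", "AH", "L", "OW"])` yields 20 spectrogram frames, and at 160 samples per frame `vocoder` turns them into 3,200 samples, or 0.2 seconds of audio at 16 kHz.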
The technology has clear commercial value. Brand voices, such as Flo, the Progressive Insurance character played by actress Stephanie Courtney, are typically tasked with recording phone trees for interactive voice response (IVR) systems or e-learning scripts for corporate training videos. Synthesizers can make voice actors more efficient by reducing the number of pickups and retakes, freeing their time for creative work.
Amazon, with Brand Voice and its other text-to-speech services, stands out in this field alongside Google. Google recently launched 31 new AI-synthesized WaveNet voices and 24 new standard voices in its Cloud Text-to-Speech service. Amazon also faces a notable competitor in Microsoft, which offers three AI-generated voices in preview and 75 standard voices through its Azure Speech Services API.
Amazon's Brand Voice also competes with products from a number of startups, such as Voicery, which creates custom digital voices that sound impressively human. iSpeech, a text-to-speech startup, offers similar voice tools, as do Modulate, Respeecher, Resemble AI, Descript, and Bangalore, India-based DeepSync.