NTTS Speaking Styles

People use different speaking styles depending on context. Casual conversation, for example, sounds very different from a TV or radio newscast. When Amazon Polly synthesizes speech using standard voices, it uses the concatenative method, which strings together short speech snippets stored in an audio database to produce the most natural-sounding speech possible. However, because of the way these voices are made, they can't produce different speaking styles.

In addition to standard concatenative synthesis, Amazon Polly can use neural technology to produce speech. Amazon Polly generates neural voices using a sequence-to-sequence model. This model uses the input audio data to form the voice and also considers each output's position in the sequence of outputs. The resulting voice can be used as is, where it already sounds very natural, or it can be trained for a specific speaking style, with the variations and emphasis on certain parts of speech inherent in that style.

Amazon Polly provides two speaking styles that you can use: Newscaster and Conversational.

The Newscaster style uses the neural system to generate speech in the style of a TV or radio newscaster. It is available with the Matthew and Joanna voices in US English (en-US), the Lupe voice in US Spanish (es-US), and the Amy voice in British English (en-GB).

The Conversational style uses the neural system to generate speech in a friendlier, more expressive conversational style that suits many use cases. It is available only with the Matthew and Joanna voices in US English (en-US).
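
As a rough illustration of how a speaking style might be requested, the following Python sketch wraps text in the SSML amazon:domain tag and sends it to the neural engine. It makes several assumptions not covered in this section: that the AWS SDK for Python (boto3) is installed, that credentials and a Region supporting neural voices are configured, and that the styles map to the domain names "news" and "conversational". The helper names build_style_ssml and synthesize, and the output path, are hypothetical. The voice lists simply mirror the ones given above.

```python
from xml.sax.saxutils import escape

# Speaking style -> (assumed SSML domain name, voices listed in the text above).
STYLES = {
    "newscaster": ("news", {"Matthew", "Joanna", "Lupe", "Amy"}),
    "conversational": ("conversational", {"Matthew", "Joanna"}),
}


def build_style_ssml(text, style, voice_id):
    """Wrap plain text in an amazon:domain tag for the requested style.

    Hypothetical helper: raises ValueError for an unknown style or for a
    voice that the text above does not list for that style.
    """
    key = style.lower()
    if key not in STYLES:
        raise ValueError(f"unknown speaking style: {style!r}")
    domain, voices = STYLES[key]
    if voice_id not in voices:
        raise ValueError(f"{voice_id} is not listed for the {style} style")
    # Escape &, < and > so the text cannot break the SSML document.
    return (
        f'<speak><amazon:domain name="{domain}">'
        f"{escape(text)}</amazon:domain></speak>"
    )


def synthesize(text, style, voice_id, out_path="speech.mp3"):
    """Send the SSML to Amazon Polly and save the audio (needs AWS access)."""
    import boto3  # imported lazily so the SSML helper works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Engine="neural",  # speaking styles rely on the neural engine
        OutputFormat="mp3",
        Text=build_style_ssml(text, style, voice_id),
        TextType="ssml",
        VoiceId=voice_id,
    )
    with open(out_path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return out_path
```

Under these assumptions, calling synthesize("Markets closed higher today.", "Newscaster", "Joanna") would write an MP3 file in the Newscaster style, while requesting the Conversational style for Lupe or Amy would fail early with a ValueError instead of reaching the service.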