Amazon Polly
Developer Guide

Speech Mark Examples

The following examples of speech mark requests show how to make common requests and the output that they generate.

Example 1: Speech Marks Without SSML

The following example shows you what requested metadata looks like on your screen for the simple sentence: "Mary had a little lamb." For simplicity, we don't include SSML speech marks in this example.

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full quotation marks (") around the input text with single quotes (') for interior tags.

aws polly synthesize-speech \ --output-format json \ --voice-id Joanna \ --text 'Mary had a little lamb.' \ --speech-mark-types='["viseme", "word", "sentence"]' \ MaryLamb.txt

When you make this request, Amazon Polly returns the following in the .txt file:

{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."} {"time":6,"type":"word","start":0,"end":4,"value":"Mary"} {"time":6,"type":"viseme","value":"p"} {"time":73,"type":"viseme","value":"E"} {"time":180,"type":"viseme","value":"r"} {"time":292,"type":"viseme","value":"i"} {"time":373,"type":"word","start":5,"end":8,"value":"had"} {"time":373,"type":"viseme","value":"k"} {"time":460,"type":"viseme","value":"a"} {"time":521,"type":"viseme","value":"t"} {"time":604,"type":"word","start":9,"end":10,"value":"a"} {"time":604,"type":"viseme","value":"@"} {"time":643,"type":"word","start":11,"end":17,"value":"little"} {"time":643,"type":"viseme","value":"t"} {"time":739,"type":"viseme","value":"i"} {"time":769,"type":"viseme","value":"t"} {"time":799,"type":"viseme","value":"t"} {"time":882,"type":"word","start":18,"end":22,"value":"lamb"} {"time":882,"type":"viseme","value":"t"} {"time":964,"type":"viseme","value":"a"} {"time":1082,"type":"viseme","value":"p"}

In this output, each part of the text is broken out in terms of speech marks:

  • The sentence "Mary had a little lamb."

  • Each word in the text: "Mary", "had", "a", "little", and "lamb."

  • The viseme for each sound in the corresponding audio stream: "p", "E", "r", "i", and so on. For more information on visemes see Visemes and Amazon Polly.

Example 2: Speech Marks with SSML

The process of generating speech marks from SSML-enhanced text is similar to the process when SSML is not present. Use the synthesize-speech command, and specify the SSML-enhanced text and the type of speech marks that you want, as shown in the following example. To make the example easier to read, we do not include viseme speech marks, but these could be included as well.

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full quotation marks (") around the input text with single quotes (') for interior tags.

aws polly synthesize-speech \ --output-format json \ --voice-id Joanna \ --text-type ssml \ --text '<speak><prosody volume="+20dB">Mary had <break time="300ms"/>a little <mark name="animal"/>lamb</prosody></speak>' \ --speech-mark-types='["sentence", "word", "ssml"]' \ output.txt

When you make this request, Amazon Polly returns the following in the .txt file:

{"time":0,"type":"sentence","start":31,"end":95,"value":"Mary had <break time=\"300ms\"\/>a little <mark name=\"animal\"\/>lamb"} {"time":6,"type":"word","start":31,"end":35,"value":"Mary"} {"time":325,"type":"word","start":36,"end":39,"value":"had"} {"time":897,"type":"word","start":40,"end":61,"value":"<break time=\"300ms\"\/>"} {"time":1291,"type":"word","start":61,"end":62,"value":"a"} {"time":1373,"type":"word","start":63,"end":69,"value":"little"} {"time":1635,"type":"ssml","start":70,"end":91,"value":"animal"} {"time":1635,"type":"word","start":91,"end":95,"value":"lamb"}