You can play speech on any call leg by providing text. You can use plain text or Speech Synthesis Markup Language (SSML). SSML provides more control over how the Amazon Chime SDK generates speech by adding pauses, emphasizing certain words, or changing the speaking style.
The Amazon Chime SDK uses the Amazon Polly service to convert text to speech. Amazon Polly lets you choose between the standard and neural engines; the neural engine offers improved speech quality. Amazon Polly supports more than 20 languages and 60 voices, so you can customize your application's user experience. The Amazon Chime SDK provides speech features at no charge, but you do pay for using Amazon Polly. For details, see the Amazon Polly pricing page.
Important
Use of Amazon Polly is subject to the AWS Service Terms.
Using the Speak action
The following example shows a typical use of the Speak action.
{
    "SchemaVersion": "1.0",
    "Actions": [
        {
            "Type": "Speak",
            "Parameters": {
                "Text": "Hello, World!",     // required
                "CallId": "call-id-1",       // required
                "Engine": "neural",          // optional. Defaults to standard
                "LanguageCode": "en-US",     // optional
                "TextType": "text",          // optional
                "VoiceId": "Joanna"          // optional. Defaults to Joanna
            }
        }
    ]
}
- CallId
  Description – The CallId of a participant in the CallDetails of the Lambda function invocation
  Allowed values – A valid call ID
  Required – Yes
  Default value – None
- Text
  Description – Specifies the input text to synthesize into speech. If you specify ssml as the TextType, follow the SSML format for the input text.
  Allowed values – String
  Required – Yes
  Default value – None
- Engine
  Description – Specifies the engine (standard or neural) to use when processing text for speech synthesis.
  Allowed values – standard | neural
  Required – No
  Default value – standard
- LanguageCode
  Description – Specifies the language code. Only necessary if using a bilingual voice. If you use a bilingual voice without a language code, the bilingual voice's default language is used.
  Allowed values – Amazon Polly language codes
  Required – No
  Default value – None
- TextType
  Description – Specifies the type of input text, plain text or SSML. If an input type is not specified, plain text is used as the default. For more information about SSML, see Generating Speech from SSML Documents in the Amazon Polly Developer Guide.
  Allowed values – ssml | text
  Required – No
  Default value – None
- VoiceId
  Description – Specifies the ID of the voice you want to use.
  Allowed values – Amazon Polly voice IDs
  Required – No
  Default value – Joanna
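The parameter rules above can be sketched as a small helper that assembles a Speak action and rejects values outside the documented allowed sets. This is a minimal illustration rather than part of the SDK: the names `build_speak_action` and `build_response` are hypothetical, and the check only covers the Engine and TextType values listed above. Language codes and voice IDs are passed through unchecked, so Amazon Polly remains the authority on those.

```python
from typing import Any, Dict, Optional

# Allowed values taken from the parameter descriptions above.
ALLOWED_ENGINES = {"standard", "neural"}
ALLOWED_TEXT_TYPES = {"ssml", "text"}


def build_speak_action(
    call_id: str,
    text: str,
    engine: Optional[str] = None,
    language_code: Optional[str] = None,
    text_type: Optional[str] = None,
    voice_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return a Speak action with only the options that were set."""
    if not call_id:
        raise ValueError("CallId is required")
    if not text:
        raise ValueError("Text is required")
    if engine is not None and engine not in ALLOWED_ENGINES:
        raise ValueError(f"Engine must be one of {sorted(ALLOWED_ENGINES)}")
    if text_type is not None and text_type not in ALLOWED_TEXT_TYPES:
        raise ValueError(f"TextType must be one of {sorted(ALLOWED_TEXT_TYPES)}")

    parameters: Dict[str, Any] = {"Text": text, "CallId": call_id}
    optional = {
        "Engine": engine,
        "LanguageCode": language_code,
        "TextType": text_type,
        "VoiceId": voice_id,
    }
    # Omit unset options so the documented defaults apply.
    parameters.update({k: v for k, v in optional.items() if v is not None})
    return {"Type": "Speak", "Parameters": parameters}


def build_response(*actions: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: wrap actions in the response envelope shown earlier."""
    return {"SchemaVersion": "1.0", "Actions": list(actions)}


# Example: an SSML prompt with a one-second pause between the two words.
ssml_action = build_speak_action(
    "call-id-1",
    '<speak>Hello, <break time="1s"/> World!</speak>',
    engine="neural",
    text_type="ssml",
    voice_id="Joanna",
)
response = build_response(ssml_action)
```

Leaving unset options out of the dictionary, instead of sending empty values, keeps the request close to the JSON example above and lets the service apply its own defaults.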
Handling ACTION_SUCCESSFUL events
The following example shows a typical ACTION_SUCCESSFUL event for an action that synthesizes the text "Hello World" into speech, in English, using Amazon Polly's Joanna voice.
{
    "SchemaVersion": "1.0",
    "Sequence": 3,
    "InvocationEventType": "ACTION_SUCCESSFUL",
    "ActionData": {
        "Type": "Speak",
        "Parameters": {
            "CallId": "call-id-1",
            "Engine": "neural",
            "LanguageCode": "en-US",
            "Text": "Hello World",
            "TextType": "text",
            "VoiceId": "Joanna"
        }
    },
    "CallDetails": {
        ...
    }
}
Handling ACTION_FAILED events
The following example shows a typical ACTION_FAILED event for the same action used in the previous example.
{
    "SchemaVersion": "1.0",
    "Sequence": 2,
    "InvocationEventType": "ACTION_FAILED",
    "ActionData": {
        "Type": "Speak",
        "Parameters": {
            "CallId": "call-id-1",
            "Engine": "neural",
            "LanguageCode": "en-US",
            "Text": "Hello World",
            "TextType": "text",
            "VoiceId": "Joanna"
        },
        "ErrorType": "SystemException",
        "ErrorMessage": "System error while running action"
    },
    "CallDetails": {
        ...
    }
}
Error handling
This table lists and describes the error messages thrown by the Speak action.
Error | Message | Reason |
---|---|---|
 | The | The service-linked role used to make requests to Amazon Polly doesn't exist or is missing permissions. To resolve, see the steps in the Using the Amazon Chime SDK Voice Connector service-linked role section. |
 | | There was an error validating the action parameters. See the SynthesizeSpeech API in the Amazon Polly Developer Guide for more information about parameters. |
ActionExecutionThrottled | Amazon Polly is throttling the request to synthesize speech. | The request to Amazon Polly is returning a throttling exception. For more information about the Amazon Polly throttling limits, see https://docs.aws.amazon.com/polly/latest/dg/limits.html#limits-throttle . |
 | | The action parameters must have a |
 | | The text exceeded the character limit. |
SystemException | System error while running action. | A system error occurred while running the action. |
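One way to act on these errors is to branch on the ErrorType field of the ACTION_FAILED event. The sketch below assumes a simple policy that is not prescribed by the documentation: re-issue the same Speak action when Amazon Polly throttles the request, and hang up on any other error. The function name `actions_for_failed_speak` is hypothetical, and the Hangup parameters (including the `SipResponseCode` value) are an assumption about that action's shape, so check them against the Hangup action reference before relying on them.

```python
from typing import Any, Dict

# Assumption: only throttling is treated as worth retrying in this sketch.
RETRYABLE_ERRORS = {"ActionExecutionThrottled"}


def actions_for_failed_speak(event: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical policy: retry a throttled Speak action, otherwise hang up."""
    action_data = event["ActionData"]
    parameters = action_data["Parameters"]

    if action_data.get("ErrorType") in RETRYABLE_ERRORS:
        # Re-send the original parameters unchanged.
        # Note: there is no retry cap here; real code would need one.
        next_action = {"Type": "Speak", "Parameters": dict(parameters)}
    else:
        # Assumed Hangup shape; verify against the Hangup action documentation.
        next_action = {
            "Type": "Hangup",
            "Parameters": {"CallId": parameters["CallId"], "SipResponseCode": "0"},
        }
    return {"SchemaVersion": "1.0", "Actions": [next_action]}


# Example events, trimmed to the fields this sketch reads.
throttled_event = {
    "InvocationEventType": "ACTION_FAILED",
    "ActionData": {
        "Type": "Speak",
        "Parameters": {"CallId": "call-id-1", "Text": "Hello World"},
        "ErrorType": "ActionExecutionThrottled",
    },
}
system_error_event = {
    "InvocationEventType": "ACTION_FAILED",
    "ActionData": {
        "Type": "Speak",
        "Parameters": {"CallId": "call-id-1", "Text": "Hello World"},
        "ErrorType": "SystemException",
    },
}
retry_response = actions_for_failed_speak(throttled_event)
hangup_response = actions_for_failed_speak(system_error_event)
```

Because the retry branch has no limit, a persistently throttled call would loop; tracking an attempt count between invocations is left out of this sketch.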
Program flows
The following diagram shows the program flow that enables the Speak action for a caller. In this example, the caller hears text that
In the diagram
Using a soft phone, a caller enters a number registered to a SIP media application. The application uses the SIP INVITE method and sends the caller a Trying (100) response. That indicates that the next-hop server received the call request. The SIP application then uses INVITE to contact the endpoint. Once the connection is established, the application sends a Ringing (180) response to the caller, and alerting begins.
The SIP media application then sends a NEW_INBOUND_CALL event to the Lambda function, which responds with a Speak action that includes the caller's ID and the text that you want to convert into speech. The SIP application then sends a 200 (OK) response to indicate that the call was answered. The protocol also enables the media.
If the Speak action succeeds and converts the text to speech, the SIP media application sends an ACTION_SUCCESSFUL event to the Lambda function, which returns the next set of actions. If the action fails, the SIP media application sends an ACTION_FAILED event to the Lambda function, which responds with a set of Hangup actions. The application hangs up the caller and returns a HANGUP event to the Lambda function, which takes no further actions.
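The Lambda side of the caller flow described above can be sketched as a single handler that dispatches on InvocationEventType. Treat this as an illustration under assumptions: the greeting text is made up, the caller's CallId is read from the first entry of `CallDetails["Participants"]`, the Hangup parameters are an assumed shape, and ACTION_SUCCESSFUL simply returns an empty action list where a real application would return its next set of actions.

```python
from typing import Any, Dict, List


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Sketch of the caller flow: speak on a new call, hang up if Speak fails."""
    event_type = event.get("InvocationEventType")
    actions: List[Dict[str, Any]] = []

    if event_type == "NEW_INBOUND_CALL":
        # Assumption: the caller is the first participant in CallDetails.
        call_id = event["CallDetails"]["Participants"][0]["CallId"]
        actions.append({
            "Type": "Speak",
            "Parameters": {
                "Text": "Hello, World!",  # illustrative prompt
                "CallId": call_id,
            },
        })
    elif event_type == "ACTION_FAILED":
        call_id = event["ActionData"]["Parameters"]["CallId"]
        # Assumed Hangup shape; verify against the Hangup action documentation.
        actions.append({
            "Type": "Hangup",
            "Parameters": {"CallId": call_id, "SipResponseCode": "0"},
        })
    # ACTION_SUCCESSFUL, HANGUP, and anything else: no further actions here.

    return {"SchemaVersion": "1.0", "Actions": actions}


# Local example invocations with events trimmed to the fields used above.
new_call = {
    "InvocationEventType": "NEW_INBOUND_CALL",
    "CallDetails": {"Participants": [{"CallId": "call-id-1"}]},
}
failed = {
    "InvocationEventType": "ACTION_FAILED",
    "ActionData": {"Type": "Speak", "Parameters": {"CallId": "call-id-1"}},
}
hangup = {"InvocationEventType": "HANGUP", "CallDetails": {"Participants": []}}

speak_result = lambda_handler(new_call, None)
failed_result = lambda_handler(failed, None)
hangup_result = lambda_handler(hangup, None)
```

Returning an empty action list for HANGUP mirrors the last step of the flow, where the function takes no further action once the call has ended.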
The following diagram shows the program flow that enables the Speak action for a callee.
In the diagram
A caller enters a number registered to a SIP media application, and the application responds as described for the previous diagram. When the Lambda function receives the NEW_INBOUND_CALL event, it returns the CallAndBridge action to the SIP application. The application then uses the SIP INVITE method to send the Trying (100) and Ringing (180) responses to the callee.
If the callee answers, the SIP media application receives a 200 (OK) response, and it sends the same response to the caller. That establishes media, and the SIP application sends an ACTION_SUCCESSFUL event for the CallAndBridge action to the Lambda function. The function then returns the Speak action and data to the SIP application, which converts