
Speak

You can play speech on any call leg by providing text. You can use plain text or Speech Synthesis Markup Language (SSML). SSML provides more control over how the Amazon Chime SDK generates speech by adding pauses, emphasizing certain words, or changing the speaking style.

The Amazon Chime SDK uses the Amazon Polly service to convert text to speech. Amazon Polly lets you choose the standard engine or the neural engine, which provides improved speech quality. Amazon Polly supports more than 20 languages and 60 voices, so you can customize your application's user experience. The Amazon Chime SDK provides speech features at no charge, but you do pay for using Amazon Polly. See the Amazon Polly pricing page or your billing dashboard for pricing information.

Important

Use of Amazon Polly is subject to the AWS Service Terms, including the terms specific to the AWS Machine Learning and Artificial Intelligence Services.

Using the Speak action

The following example shows a typical use of the Speak action.


{
    "SchemaVersion": "1.0",
    "Actions":[
        {
            "Type": "Speak",
            "Parameters": {
                "Text": "Hello, World!",        // required
                "CallId": "call-id-1",          // required
                "Engine": "neural",             // optional. Defaults to standard
                "LanguageCode": "en-US",        // optional
                "TextType": "text",             // optional
                "VoiceId": "Joanna"             // optional. Defaults to Joanna
            }
        }
    ]
}
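JSON does not allow comments, so treat the `// required` and `// optional` annotations above as documentation only. The following minimal sketch assumes a Python Lambda function that returns its response as a dictionary. The helper names `speak_action` and `build_response` are hypothetical, not part of any AWS SDK. Optional parameters left as `None` are omitted so that the documented defaults apply.

```python
import json


def speak_action(call_id, text, engine=None, language_code=None,
                 text_type=None, voice_id=None):
    """Build a Speak action; optional parameters set to None are omitted."""
    params = {"CallId": call_id, "Text": text}
    optional = {
        "Engine": engine,
        "LanguageCode": language_code,
        "TextType": text_type,
        "VoiceId": voice_id,
    }
    # Include only the optional parameters that were actually set,
    # so the documented defaults apply to the rest.
    params.update({key: value for key, value in optional.items() if value is not None})
    return {"Type": "Speak", "Parameters": params}


def build_response(*actions):
    """Wrap one or more actions in the SchemaVersion/Actions envelope."""
    return {"SchemaVersion": "1.0", "Actions": list(actions)}


response = build_response(
    speak_action("call-id-1", "Hello, World!", engine="neural",
                 language_code="en-US", text_type="text", voice_id="Joanna")
)
print(json.dumps(response, indent=2))
```

Calling `speak_action("call-id-1", "Hello, World!")` with no optional arguments produces an action that carries only the two required parameters.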

CallId

Description – The CallId of a participant in the CallDetails of the Lambda function invocation.

Allowed values – A valid call ID

Required – Yes

Default value – None

Text

Description – Specifies the input text to synthesize into speech. If you specify ssml as the TextType, follow the SSML format for the input text.

Allowed values – String

Required – Yes

Default value – None

Engine

Description – Specifies the engine—standard or neural—to use when processing text for speech synthesis.

Allowed values – standard | neural

Required – No

Default value – standard

LanguageCode

Description – Specifies the language code. Necessary only if you use a bilingual voice. If you use a bilingual voice without a language code, the voice's default language is used.

Allowed values – Amazon Polly language codes

Required – No

Default value – None

TextType

Description – Specifies the type of input text, plain text or SSML. If an input type is not specified, plain text is used as the default. For more information about SSML, see Generating Speech from SSML Documents in the Amazon Polly Developer Guide.

Allowed values – ssml | text

Required – No

Default value – text
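When `TextType` is `ssml`, the `Text` value must be a well-formed SSML document. As a sketch, assuming you build prompts in Python, the hypothetical helper `ssml_with_pause` below wraps two sentences in `<speak>` and inserts a `<break>` pause between them. It XML-escapes the plain-text parts, because a character such as `&` would otherwise make the SSML invalid.

```python
from xml.sax.saxutils import escape


def ssml_with_pause(first, second, pause_ms=500):
    """Wrap two sentences in <speak> with a <break> pause between them.

    The plain-text parts are XML-escaped so characters such as '&' or '<'
    cannot break the SSML markup.
    """
    return (
        "<speak>"
        f"{escape(first)}"
        f'<break time="{pause_ms}ms"/>'
        f"{escape(second)}"
        "</speak>"
    )


text = ssml_with_pause("Thanks for calling Example & Co.", "Please hold.")
speak_action = {
    "Type": "Speak",
    "Parameters": {
        "CallId": "call-id-1",
        "Text": text,
        "TextType": "ssml",  # tells the Speak action to treat Text as SSML
    },
}
print(speak_action["Parameters"]["Text"])
```

Escaping only the text segments, rather than the whole string, keeps the `<speak>` and `<break>` tags intact while protecting the caller-supplied words.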

VoiceId

Description – Specifies the ID of the voice you want to use.

Allowed values – Amazon Polly voice IDs

Required – No

Default value – Joanna
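Before returning a Speak action, you might check its parameters against the constraints listed above and the 1,000-character `Text` limit reported in the error table. The following is a hypothetical client-side pre-check (`validate_speak_parameters` is an illustrative name), not the service's own validation, which remains the authority. The sketch compares values case-sensitively, which may be stricter than the service, and it counts every character of `Text`, including any SSML tags; whether the service counts tags is not stated here.

```python
REQUIRED_PARAMETERS = ("CallId", "Text")
ALLOWED_VALUES = {"Engine": {"standard", "neural"}, "TextType": {"ssml", "text"}}
MAX_TEXT_CHARACTERS = 1000  # from the "Text is limited to 1,000 characters" error


def validate_speak_parameters(params):
    """Return a list of problems found in a Speak Parameters dict (empty if none)."""
    problems = []
    # Required parameters must be present and non-empty.
    for name in REQUIRED_PARAMETERS:
        if not params.get(name):
            problems.append(f"{name} is required")
    # Optional parameters with a fixed set of values must use one of them.
    for name, allowed in ALLOWED_VALUES.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    # Enforce the documented length limit on Text.
    if len(params.get("Text") or "") > MAX_TEXT_CHARACTERS:
        problems.append(f"Text exceeds {MAX_TEXT_CHARACTERS} characters")
    return problems


print(validate_speak_parameters({"CallId": "call-id-1", "Text": "Hi", "Engine": "premium"}))
```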

Handling ACTION_SUCCESSFUL events

The following example shows a typical ACTION_SUCCESSFUL event for an action that synthesizes the text "Hello World" into English speech using the Amazon Polly Joanna voice.


{
    "SchemaVersion": "1.0",
    "Sequence": 3,
    "InvocationEventType": "ACTION_SUCCESSFUL",
    "ActionData": {
       "Type": "Speak",
       "Parameters": {
          "CallId": "call-id-1",          
          "Engine":  "neural",             
          "LanguageCode":  "en-US",        
          "Text": "Hello World",          
          "TextType":  "text",             
          "VoiceId":  "Joanna"        
       }
    },
    "CallDetails":{       
       ...
    }
}
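Because the ACTION_SUCCESSFUL event echoes the completed action's `Type` and `Parameters` in `ActionData`, your function can tell which action finished and on which call leg. A minimal sketch, assuming the event arrives as a dictionary shaped like the example above; `completed_speak` is a hypothetical helper name.

```python
SAMPLE_EVENT = {
    "SchemaVersion": "1.0",
    "Sequence": 3,
    "InvocationEventType": "ACTION_SUCCESSFUL",
    "ActionData": {
        "Type": "Speak",
        "Parameters": {"CallId": "call-id-1", "Text": "Hello World", "VoiceId": "Joanna"},
    },
    "CallDetails": {},
}


def completed_speak(event):
    """Return (CallId, Text) if the event reports a successful Speak action, else None."""
    if event.get("InvocationEventType") != "ACTION_SUCCESSFUL":
        return None
    action = event.get("ActionData", {})
    if action.get("Type") != "Speak":
        return None
    # The echoed parameters identify the call leg and the text that was spoken.
    params = action.get("Parameters", {})
    return params.get("CallId"), params.get("Text")


print(completed_speak(SAMPLE_EVENT))
```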

Handling ACTION_FAILED events

The following example shows a typical ACTION_FAILED event for the same action used in the previous example.


{
    "SchemaVersion": "1.0",
    "Sequence": 2,
    "InvocationEventType": "ACTION_FAILED",
    "ActionData": {
       "Type": "Speak",
       "Parameters": {
          "CallId": "call-id-1",
          "Engine": "neural",
          "LanguageCode": "en-US",
          "Text": "Hello World",
          "TextType": "text",
          "VoiceId": "Joanna"
       },
       "ErrorType": "SystemException",
       "ErrorMessage": "System error while running action"
    },
    "CallDetails": {
       ...
    }
}
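An ACTION_FAILED event adds `ErrorType` and `ErrorMessage` to `ActionData`, alongside the echoed parameters. A minimal sketch that turns them into a single log line, assuming the event shape shown above; `summarize_failure` is a hypothetical helper.

```python
def summarize_failure(event):
    """Return a one-line description of an ACTION_FAILED event for logging."""
    action = event["ActionData"]
    call_id = action.get("Parameters", {}).get("CallId", "unknown call")
    error_type = action.get("ErrorType", "UnknownError")
    message = action.get("ErrorMessage", "")
    return f"{action.get('Type', 'Action')} failed on {call_id}: {error_type} ({message})"


failed_event = {
    "InvocationEventType": "ACTION_FAILED",
    "ActionData": {
        "Type": "Speak",
        "Parameters": {"CallId": "call-id-1", "Text": "Hello World"},
        "ErrorType": "SystemException",
        "ErrorMessage": "System error while running action",
    },
}
print(summarize_failure(failed_event))
```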

Error handling

This table lists and describes the error messages returned by the Speak action.

Error	Message	Reason
`AccessDenied`	The `AWSServiceRoleForAmazonChimeVoiceConnector` service-linked role is not configured correctly.	The service-linked role used to make requests to Amazon Polly doesn't exist or is missing permissions. To resolve this, see the steps in the Using the Amazon Chime SDK Voice Connector service-linked role section.
`InvalidActionParameter`		There was an error validating the action parameters. For more information about the parameters, see the SynthesizeSpeech API in the Amazon Polly Developer Guide.
`ActionExecutionThrottled`	Amazon Polly is throttling the request to synthesize speech.	Amazon Polly returned a throttling exception for the request. For more information about the Amazon Polly throttling limits, see https://docs.aws.amazon.com/polly/latest/dg/limits.html#limits-throttle.
`MissingRequiredActionParameter`	`Text` is a required parameter.	The action parameters must include a `Text` value.
`MissingRequiredActionParameter`	`Text` is limited to 1,000 characters.	The text exceeded the character limit.
`SystemException`	System error while running action.	A system error occurred while running the action.
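Only some of these errors can succeed on a retry: throttling may clear, and a system error may be transient, while access and parameter errors need a configuration or request change. The following sketch encodes that reasoning as a hypothetical policy. The set of retryable errors and the attempt limit are assumptions, not AWS guidance, and your function must track the attempt count itself, for example in storage you manage.

```python
# Hypothetical retry policy for failed Speak actions (an assumption, not AWS guidance).
RETRYABLE_ERRORS = {"ActionExecutionThrottled", "SystemException"}


def next_step(error_type, attempts_so_far, max_attempts=3):
    """Return "retry" or "hangup" for a failed Speak action.

    AccessDenied, InvalidActionParameter, and MissingRequiredActionParameter
    fall through to "hangup" because resending the same request will not fix them.
    """
    if error_type in RETRYABLE_ERRORS and attempts_so_far < max_attempts:
        return "retry"
    return "hangup"


print(next_step("ActionExecutionThrottled", 1))
```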

Program flows

The following diagram shows the program flow that enables the Speak action for a caller. In this example, the caller hears text that

Diagram showing the program flow for enabling the Speak action for a caller.

In the diagram

Using a softphone, a caller enters a number registered to a SIP media application. The application receives the SIP INVITE and sends the caller a Trying (100) response, which indicates that the next-hop server received the call request. The SIP media application then uses INVITE to contact the endpoint. Once the connection is established, the application sends a Ringing (180) response to the caller, and alerting begins.

The SIP media application then sends a NEW_INBOUND_CALL event to the Lambda function, which responds with a Speak action that includes the caller's call ID and the text that you want to convert into speech. The SIP media application then sends a 200 (OK) response to indicate that the call was answered. This response also establishes media.

If the Speak action succeeds and converts the text to speech, the SIP media application sends an ACTION_SUCCESSFUL event to the Lambda function, which returns the next set of actions. If the action fails, the SIP media application sends an ACTION_FAILED event to the Lambda function, which responds with a Hangup action. The application hangs up the caller and sends a HANGUP event to the Lambda function, which takes no further action.
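The caller flow described above can be sketched as one Python handler that branches on `InvocationEventType`. This is a minimal sketch, not a complete application. It assumes that the caller's call ID is the first entry in `CallDetails.Participants` and that a Hangup action takes `CallId` and `SipResponseCode` parameters; check the Hangup action reference before relying on that shape. The helper names are hypothetical.

```python
def speak(call_id, text):
    """Build a Speak action with only the required parameters."""
    return {"Type": "Speak", "Parameters": {"CallId": call_id, "Text": text}}


def hangup(call_id):
    """Build a Hangup action (parameter shape assumed; see the Hangup reference)."""
    return {"Type": "Hangup", "Parameters": {"CallId": call_id, "SipResponseCode": "0"}}


def caller_call_id(event):
    """Assumes the caller is the first participant listed in CallDetails."""
    return event["CallDetails"]["Participants"][0]["CallId"]


def lambda_handler(event, context):
    event_type = event["InvocationEventType"]
    if event_type == "NEW_INBOUND_CALL":
        # Greet the caller as soon as the call arrives.
        actions = [speak(caller_call_id(event), "Hello, World!")]
    elif event_type == "ACTION_FAILED":
        # The Speak action failed: end the call.
        actions = [hangup(caller_call_id(event))]
    else:
        # ACTION_SUCCESSFUL: return your next actions here (none in this sketch).
        # HANGUP: the call has ended, so no further actions are returned.
        actions = []
    return {"SchemaVersion": "1.0", "Actions": actions}
```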

The following diagram shows the program flow that enables the Speak action for a callee.

Diagram showing the program flow for enabling the Speak action for a callee. You can do this on any bridged call.

In the diagram

A caller enters a number registered to a SIP media application, and the application responds as described for the previous diagram. When the Lambda function receives the NEW_INBOUND_CALL event, it returns the CallAndBridge action to the SIP media application. The application then sends a SIP INVITE to the callee and receives Trying (100) and Ringing (180) responses.

If the callee answers, the SIP media application receives a 200 (OK) response, and it sends the same response to the caller. That establishes media, and the SIP media application sends an ACTION_SUCCESSFUL event for the CallAndBridge action to the Lambda function. The function then returns the Speak action and data to the SIP media application, which converts
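For the callee flow, the Speak action must target the callee's call leg rather than the caller's. A sketch, assuming that after CallAndBridge succeeds the callee appears in `CallDetails.Participants` with the `ParticipantTag` value `LEG-B`, and that you build the CallAndBridge action elsewhere and pass it in (its parameters are outside this sketch). The function names are hypothetical.

```python
def participant_call_id(event, tag):
    """Return the CallId of the participant with the given ParticipantTag, or None."""
    for participant in event["CallDetails"].get("Participants", []):
        if participant.get("ParticipantTag") == tag:
            return participant["CallId"]
    return None


def handle_bridged_call(event, call_and_bridge_action):
    """Bridge the call, then speak to the callee once the bridge succeeds.

    call_and_bridge_action is a CallAndBridge action dict built elsewhere.
    """
    event_type = event["InvocationEventType"]
    completed = event.get("ActionData", {}).get("Type")
    actions = []
    if event_type == "NEW_INBOUND_CALL":
        actions = [call_and_bridge_action]
    elif event_type == "ACTION_SUCCESSFUL" and completed == "CallAndBridge":
        callee = participant_call_id(event, "LEG-B")  # assumed tag for the callee leg
        if callee is not None:
            actions = [{"Type": "Speak",
                        "Parameters": {"CallId": callee, "Text": "You are now connected."}}]
    return {"SchemaVersion": "1.0", "Actions": actions}
```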
