Speak


You can play speech on any call leg by providing text. You can use plain text or Speech Synthesis Markup Language (SSML). SSML provides more control over how the Amazon Chime SDK generates speech by adding pauses, emphasizing certain words, or changing the speaking style.
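As a minimal sketch of the SSML option described above, the following Python helper wraps dynamic text in SSML with a pause and an emphasized phrase. The helper name `build_ssml` and the sample strings are hypothetical, and the choice of the standard engine is an assumption: tag support can differ between Polly engines, so check the Amazon Polly SSML documentation for the engine and voice you use.

```python
from xml.sax.saxutils import escape


def build_ssml(greeting: str, name: str, pause_ms: int = 500) -> str:
    """Wrap dynamic text in SSML with a pause and an emphasized phrase."""
    # Escape &, <, and > so dynamic text cannot break the SSML markup.
    safe_greeting = escape(greeting)
    safe_name = escape(name)
    return (
        "<speak>"
        f"{safe_greeting}"
        f'<break time="{pause_ms}ms"/>'
        f'<emphasis level="strong">{safe_name}</emphasis>'
        "</speak>"
    )


# Parameters for a Speak action that uses SSML instead of plain text.
ssml_parameters = {
    "Text": build_ssml("Thanks for calling,", "Smith & Sons"),
    "CallId": "call-id-1",   # placeholder call ID
    "TextType": "ssml",      # tells the action to treat Text as SSML
    "Engine": "standard",    # assumption: standard engine chosen for this sketch
}
```

Escaping the dynamic parts keeps characters such as `&` from producing invalid SSML.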

The Amazon Chime SDK uses the Amazon Polly service to convert text to speech. Amazon Polly lets you choose between the standard and neural engines for improved speech quality. Amazon Polly supports more than 20 languages and 60 voices, so you can customize your application's user experience. The Amazon Chime SDK provides speech features at no charge, but you do pay for using Amazon Polly. See the Amazon Polly pricing page or your billing dashboard for pricing information.

Important

Use of Amazon Polly is subject to the AWS Service Terms, including the terms specific to the AWS Machine Learning and Artificial Intelligence Services.

Using the Speak action

The following example shows a typical use of the Speak action.

{
    "SchemaVersion": "1.0",
    "Actions": [
        {
            "Type": "Speak",
            "Parameters": {
                "Text": "Hello, World!",     // required
                "CallId": "call-id-1",       // required
                "Engine": "neural",          // optional. Defaults to standard
                "LanguageCode": "en-US",     // optional
                "TextType": "text",          // optional
                "VoiceId": "Joanna"          // optional. Defaults to Joanna
            }
        }
    ]
}

CallId

Description – The CallId of a participant in the CallDetails of the Lambda function invocation

Allowed values – A valid call ID

Required – Yes

Default value – None

Text

Description – Specifies the input text to synthesize into speech. If you specify ssml as the TextType, follow the SSML format for the input text.

Allowed values – String

Required – Yes

Default value – None

Engine

Description – Specifies the engine—standard or neural—to use when processing text for speech synthesis.

Allowed values – standard | neural

Required – No

Default value – standard

LanguageCode

Description – Specifies the language code. Only necessary if using a bilingual voice. If you use a bilingual voice without a language code, the bilingual voice's default language is used.

Allowed values – Amazon Polly language codes

Required – No

Default value – None

TextType

Description – Specifies the type of input text, plain text or SSML. If an input type is not specified, plain text is used as the default. For more information about SSML, see Generating Speech from SSML Documents in the Amazon Polly Developer Guide.

Allowed values – ssml | text

Required – No

Default value – text

VoiceId

Description – Specifies the ID of the voice you want to use.

Allowed values – Amazon Polly voice IDs

Required – No

Default value – Joanna
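The parameter rules above can be checked in the Lambda function before the action is returned. The following Python sketch is one way to do that, under stated assumptions: the function names `build_speak_action` and `build_response` are hypothetical, the 1,000-character limit is taken from the error table on this page, and the service performs its own validation regardless of what the function checks.

```python
from typing import Any, Dict, Optional

MAX_TEXT_LENGTH = 1000  # character limit listed in the error table on this page
ALLOWED_ENGINES = {"standard", "neural"}
ALLOWED_TEXT_TYPES = {"ssml", "text"}


def build_speak_action(
    call_id: str,
    text: str,
    engine: Optional[str] = None,
    language_code: Optional[str] = None,
    text_type: Optional[str] = None,
    voice_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a Speak action, validating the documented parameter rules."""
    if not call_id:
        raise ValueError("CallId is required")
    if not text:
        raise ValueError("Text is required")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"Text is limited to {MAX_TEXT_LENGTH} characters")
    if engine is not None and engine not in ALLOWED_ENGINES:
        raise ValueError(f"Engine must be one of {sorted(ALLOWED_ENGINES)}")
    if text_type is not None and text_type not in ALLOWED_TEXT_TYPES:
        raise ValueError(f"TextType must be one of {sorted(ALLOWED_TEXT_TYPES)}")

    parameters: Dict[str, Any] = {"CallId": call_id, "Text": text}
    optional = {
        "Engine": engine,
        "LanguageCode": language_code,
        "TextType": text_type,
        "VoiceId": voice_id,
    }
    # Omit unset optional parameters so the service applies its defaults.
    parameters.update({k: v for k, v in optional.items() if v is not None})
    return {"Type": "Speak", "Parameters": parameters}


def build_response(*actions: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap one or more actions in the response envelope shown earlier."""
    return {"SchemaVersion": "1.0", "Actions": list(actions)}


response = build_response(
    build_speak_action("call-id-1", "Hello, World!", engine="neural")
)
```

Failing early in the function gives a clearer error than waiting for an ACTION_FAILED event, though it does not replace handling that event.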

Handling ACTION_SUCCESSFUL events

The following example shows a typical ACTION_SUCCESSFUL event for an action that synthesizes the text "Hello World" into speech, in English, using Amazon Polly's Joanna voice.

{
    "SchemaVersion": "1.0",
    "Sequence": 3,
    "InvocationEventType": "ACTION_SUCCESSFUL",
    "ActionData": {
        "Type": "Speak",
        "Parameters": {
            "CallId": "call-id-1",
            "Engine": "neural",
            "LanguageCode": "en-US",
            "Text": "Hello World",
            "TextType": "text",
            "VoiceId": "Joanna"
        }
    },
    "CallDetails": { ... }
}

Handling ACTION_FAILED events

The following example shows a typical ACTION_FAILED event for the same action used in the previous example.

{
    "SchemaVersion": "1.0",
    "Sequence": 2,
    "InvocationEventType": "ACTION_FAILED",
    "ActionData": {
        "Type": "Speak",
        "Parameters": {
            "CallId": "call-id-1",
            "Engine": "neural",
            "LanguageCode": "en-US",
            "Text": "Hello World",
            "TextType": "text",
            "VoiceId": "Joanna"
        },
        "ErrorType": "SystemException",
        "ErrorMessage": "System error while running action"
    },
    "CallDetails": { ... }
}

Error handling

This table lists and describes the error messages thrown by the Speak action.

| Error | Message | Reason |
|---|---|---|
| AccessDenied | The AWSServiceRoleForAmazonChimeVoiceConnector service-linked role is not configured correctly. | The service-linked role used to make requests to Amazon Polly doesn't exist or is missing permissions. To resolve, see the steps in the Using the Amazon Chime SDK Voice Connector service-linked role section. |
| InvalidActionParameter | | There was an error validating the action parameters. See the SynthesizeSpeech API in the Amazon Polly Developer Guide for more information about parameters. |
| ActionExecutionThrottled | Amazon Polly is throttling the request to synthesize speech. | The request to Amazon Polly is returning a throttling exception. For more information about the Amazon Polly throttling limits, see https://docs.aws.amazon.com/polly/latest/dg/limits.html#limits-throttle. |
| MissingRequiredActionParameter | Text is a required parameter. | The action parameters must have a Text value. |
| MissingRequiredActionParameter | Text is limited to 1,000 characters. | The text exceeded the character limit. |
| SystemException | System error while running action. | A system error occurred while running the action. |
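A Lambda function can read the `ErrorType` and `ErrorMessage` fields from an ACTION_FAILED event to decide how to react. The sketch below summarizes a failure for logging. The function name `describe_failure` is hypothetical, and treating only `ActionExecutionThrottled` as transient is an assumption made for this example, not behavior prescribed by the service.

```python
from typing import Any, Dict

# Assumption: only throttling is treated as transient in this sketch.
RETRYABLE_ERRORS = {"ActionExecutionThrottled"}


def describe_failure(event: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize the failed action carried by an ACTION_FAILED event."""
    action_data = event.get("ActionData", {})
    error_type = action_data.get("ErrorType", "Unknown")
    return {
        "action": action_data.get("Type"),
        "error_type": error_type,
        "message": action_data.get("ErrorMessage", ""),
        "retryable": error_type in RETRYABLE_ERRORS,
    }


# Trimmed version of the ACTION_FAILED event shown earlier on this page.
failed_event = {
    "SchemaVersion": "1.0",
    "Sequence": 2,
    "InvocationEventType": "ACTION_FAILED",
    "ActionData": {
        "Type": "Speak",
        "Parameters": {"CallId": "call-id-1", "Text": "Hello World"},
        "ErrorType": "SystemException",
        "ErrorMessage": "System error while running action",
    },
}

summary = describe_failure(failed_event)
```

Errors such as `MissingRequiredActionParameter` point to a problem in the request itself, so repeating the same action without changes is unlikely to help.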

Program flows

The following diagram shows the program flow that enables the Speak action for a caller. In this example, the caller hears text that

Diagram showing the program flow for enabling the Speak action for a caller.
In the diagram

Using a soft phone, a caller enters a number registered to a SIP media application. The application uses the SIP INVITE method and sends the caller a Trying (100) response. That indicates that the next-hop server received the call request. The SIP application then uses INVITE to contact the endpoint. Once the connection is established, the application sends a Ringing (180) response to the caller, and alerting begins.

The SIP media application then sends a NEW_INBOUND_CALL event to the Lambda function, which responds with a Speak action that includes the caller's ID and the text that you want to convert into speech. The SIP application then sends a 200 (OK) response to indicate that the call was answered. The protocol also enables the media.

If the Speak action succeeds and converts the text to speech, it returns an ACTION_SUCCESSFUL event to the SIP media application, which returns the next set of actions. If the action fails, the SIP media application sends an ACTION_FAILED event to the Lambda function, which responds with a set of Hangup actions. The application hangs up the caller and returns a HANGUP event to the Lambda function, which takes no further actions.
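The caller flow described above can be sketched as a Lambda handler that dispatches on `InvocationEventType`. This is an illustration under assumptions rather than a reference implementation: it assumes that `CallDetails` contains a `Participants` list whose first entry carries the caller's `CallId`, and the `Hangup` parameters shown (`CallId`, `SipResponseCode`) should be checked against the Hangup action documentation.

```python
from typing import Any, Dict, List, Optional


def _response(actions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap actions in the response envelope used by the examples on this page."""
    return {"SchemaVersion": "1.0", "Actions": actions}


def _caller_call_id(event: Dict[str, Any]) -> Optional[str]:
    # Assumption: the caller is the first entry in CallDetails.Participants.
    participants = event.get("CallDetails", {}).get("Participants", [])
    return participants[0].get("CallId") if participants else None


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    event_type = event.get("InvocationEventType")
    call_id = _caller_call_id(event)

    if event_type == "NEW_INBOUND_CALL" and call_id:
        # Answer the new call by speaking to the caller.
        return _response([
            {"Type": "Speak",
             "Parameters": {"Text": "Hello, World!", "CallId": call_id}}
        ])

    if event_type == "ACTION_FAILED" and call_id:
        # Assumption: Hangup parameter names; verify against the Hangup docs.
        return _response([
            {"Type": "Hangup",
             "Parameters": {"CallId": call_id, "SipResponseCode": "0"}}
        ])

    # ACTION_SUCCESSFUL, HANGUP, and anything else: no further actions here.
    return _response([])
```

In a real application, the ACTION_SUCCESSFUL branch would return whatever actions come next in the call; this sketch returns an empty list to keep the flow easy to follow.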

The following diagram shows the program flow that enables the Speak action for a callee.

Diagram showing the program flow for enabling the Speak action for a callee. You can do this on any bridged call.
In the diagram

A caller enters a number registered to a SIP media application, and the application responds as described for the previous diagram. When the Lambda function receives the NEW_INBOUND_CALL event, it returns the CallAndBridge action to the SIP application. The application then uses the SIP INVITE method to send the Trying (100) and Ringing (180) responses to the callee.

If the callee answers, the SIP media application receives a 200 (OK) response, and it sends the same response to the caller. That establishes media, and the SIP application sends an ACTION_SUCCESSFUL event for the CallAndBridge action to the Lambda function. The function then returns the Speak action and data to the SIP application, which converts
