Amazon Sumerian Speech Component

The speech component assigns text to an entity for playback using a native integration with Amazon Polly. You assign text to an entity, and play the audio output from Amazon Polly with a state machine or script. The scene calls Amazon Polly at runtime to generate the audio.

To use Amazon Polly during playback, the scene needs AWS credentials from Amazon Cognito Identity. Create an identity pool for your scene, and configure it under AWS configuration in the scene settings.

To add a speech, click the + button.

      [Image: Adding a speech to the speech component]

This opens the Text Editor. Type your speech as plain text, or use SSML markup to customize it. Make sure to click Save before returning to the Sumerian editor.

        [Image: Using the Text Editor to create a speech]

By default, Sumerian uses the Standard voice engine from Amazon Polly. To select a different voice engine, open the Voice Engine drop-down menu and choose Neural. To learn more about Neural TTS and which Regions support it, see the Amazon Polly Neural TTS documentation. Also see the NTTS Newscaster Style documentation to find out which voices support the Newscaster speaking style.

        [Image: Selecting the Amazon Polly Neural voice engine]
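
To make the plain-text/SSML and Standard/Neural choices concrete, the following sketch builds request parameters for Amazon Polly's SynthesizeSpeech operation, the kind of call the scene makes at runtime. The parameter names are Polly's, but the helper name buildSynthesizeSpeechParams, the default voice Joanna, and mp3 output are assumptions for illustration; Sumerian does not expose this function, and the request it actually sends may differ.

```javascript
'use strict';

// Hypothetical helper: builds parameters for Amazon Polly's SynthesizeSpeech
// operation. The parameter names (Text, TextType, VoiceId, Engine, OutputFormat)
// are Polly's; the defaults below are illustrative choices only.
function buildSynthesizeSpeechParams(text, { voiceId = 'Joanna', engine = 'standard' } = {}) {
  // Polly accepts 'standard' or 'neural'; not every voice supports 'neural'.
  if (engine !== 'standard' && engine !== 'neural') {
    throw new Error(`Unsupported voice engine: ${engine}`);
  }
  // Treat input that starts with a <speak> root element as SSML, otherwise plain text.
  const isSsml = /^\s*<speak[\s>]/.test(text);
  return {
    Text: text,
    TextType: isSsml ? 'ssml' : 'text',
    VoiceId: voiceId,
    Engine: engine,
    OutputFormat: 'mp3',
  };
}

const params = buildSynthesizeSpeechParams(
  '<speak>Hello, <break time="300ms"/>welcome.</speak>',
  { engine: 'neural' }
);
console.log(params.TextType, params.Engine); // ssml neural

// With the AWS SDK for JavaScript v2 loaded and credentials configured, the
// request could be sent like this (not executed here):
// new AWS.Polly().synthesizeSpeech(params, (err, data) => { /* data.AudioStream */ });
```

In a Sumerian scene you do not build this request yourself; the speech component does it when you play a speech. The sketch only shows which settings travel with each request.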

To use these features, your scene must be updated and re-published with AWS SDK for JavaScript version 2.503 or higher. Scenes created before Sumerian Release 0.26 must be updated manually; newly created scenes don't require these steps.

To update the AWS SDK for JavaScript:

  1. In the Entities panel, select the scene node at the top of the entity hierarchy.

                [Image: The Sumerian Entities panel with the root entity selected]
  2. In the Inspector panel, expand the AWS Configuration component.

                        [Image: The Sumerian Inspector panel with the AWS Configuration component]
  3. Ensure that the version listed is 2.503 or higher (e.g., ).

  4. Select the scene menu and save your scene.

  5. Refresh your browser and verify that the new SDK is displayed.

  6. Re-publish your scene.
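
If you check the SDK version from a script rather than by eye, compare the parts numerically: as plain strings, "2.1001.0" sorts before "2.503.0". The following is a minimal sketch; compareVersions and supportsNeuralEngine are hypothetical helpers, and reading the loaded version from AWS.VERSION assumes the AWS SDK for JavaScript v2 is available as the global AWS object in your scene.

```javascript
'use strict';

// Minimum AWS SDK for JavaScript version required for the Neural voice engine.
const MIN_SDK_VERSION = '2.503.0';

// Compare dotted version strings part by part as numbers.
// Returns -1 if a < b, 1 if a > b, 0 if equal; missing parts count as 0.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = pa[i] || 0;
    const y = pb[i] || 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// True if the given SDK version is at least MIN_SDK_VERSION.
function supportsNeuralEngine(sdkVersion) {
  return compareVersions(sdkVersion, MIN_SDK_VERSION) >= 0;
}

console.log(supportsNeuralEngine('2.1001.0')); // true
console.log(supportsNeuralEngine('2.502.0'));  // false

// Assumption: in a scene script, the loaded SDK version is exposed as AWS.VERSION.
// console.log(supportsNeuralEngine(AWS.VERSION));
```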

The speech component has the following properties:

  • 3D audio – Adjusts the volume of the speech audio based on the entity's distance from the camera.

  • Voice – An Amazon Polly voice.

  • Voice Engine – Choose between Amazon Polly's Standard and Neural voice engines.

  • Volume – Volume of the speech audio.

  • Speech files – Drop text files here to add them to the component. Click to mark up a speech file with gestures.

  • Gesture map – A document that maps gestures to words. When you mark up a speech file, the editor uses this mapping to determine where to add gestures. You can modify the gesture map using the Text Editor.

To trigger a speech during playback, use a state machine behavior or script component on the same entity.

Gestures

Gestures are available when you use a Sumerian Host. They let you automate hand and body gestures based on speech.

To add gestures, expand the Gesture Map property and click the + button. This adds a DefaultGestureMap to your scene, which lets your speech files reference gestures, and automatically opens the file in the Text Editor. The DefaultGestureMap lists all of the available Host gestures, along with the words that trigger each gesture during automatic SSML generation.

Return to the Sumerian editor. The DefaultGestureMap now appears in the Gesture Map property. You can go back to the Text Editor and add SSML markup manually (see the Amazon Polly SSML documentation), or you can add gestures and SSML markup automatically by clicking the Auto-generate gesture marks button next to a speech you previously created.

Return to the Text Editor and open your speech file. Your speech is now marked up with SSML, and gesture tags have been added next to the words that match the trigger words in the DefaultGestureMap. The wave gesture was added next to the word "Hello", and the self gesture was added next to the words "my", "self", and "I". Make sure to click Save to keep these changes in your speech.
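
Auto-generation can be pictured as a word lookup: every word in the speech that appears in the gesture map gets a gesture mark in front of it, and the result is wrapped in speak tags. The following sketch is a hypothetical reimplementation, not Sumerian's code. It assumes a gesture map shaped as gesture name to trigger words, and a mark format of <mark name="gesture:NAME"/> built on the SSML mark tag; check the markup the editor actually writes into your speech file before relying on either format.

```javascript
'use strict';

// Hypothetical gesture map shape: gesture name -> list of trigger words.
// The real DefaultGestureMap file format may differ.
const exampleGestureMap = {
  wave: ['hello'],
  self: ['my', 'self', 'I'],
};

// Escape characters that are reserved in SSML (XML) text content.
function escapeXml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Build a case-insensitive lookup from trigger word to gesture name.
function buildTriggerLookup(gestureMap) {
  const lookup = new Map();
  for (const [gesture, words] of Object.entries(gestureMap)) {
    for (const word of words) {
      const key = word.toLowerCase();
      if (!lookup.has(key)) lookup.set(key, gesture); // first mapping wins
    }
  }
  return lookup;
}

// Wrap plain text in <speak> and insert a gesture mark before each trigger word.
function autoMarkGestures(text, gestureMap) {
  const lookup = buildTriggerLookup(gestureMap);
  // Splitting with a capture group keeps the words at odd indices.
  const parts = text.split(/([A-Za-z']+)/);
  const body = parts.map((part, i) => {
    const escaped = escapeXml(part);
    if (i % 2 === 0) return escaped; // punctuation and whitespace between words
    const gesture = lookup.get(part.toLowerCase());
    return gesture ? `<mark name="gesture:${gesture}"/>${escaped}` : escaped;
  }).join('');
  return `<speak>${body}</speak>`;
}

console.log(autoMarkGestures('Hello, my name is Cristine.', exampleGestureMap));
// <speak><mark name="gesture:wave"/>Hello, <mark name="gesture:self"/>my name is Cristine.</speak>
```

Matching whole words (rather than substrings) keeps a trigger such as "my" from firing inside words like "myth"; the real editor may use different matching rules.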

The scene below shows a sample of the available gestures.

In the next sections, you will learn how to start your speech using both the State Machine and Script components.

State Machine

To play a speech, add a state machine component to the entity with the speech component. Add a state with the AWS SDK ready and Start Speech actions.

Script

To play a speech using a script component, get a reference to the speech component from the context object. The component has a speeches array that contains the speeches attached to the component. Call play on a speech.

Sumerian calls Amazon Polly when you play a speech, so you must use the aws.sdkReady listener to ensure that your scene's AWS credentials are loaded before the call. Note that the following script uses the Legacy API.

Example script – play a random speech

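```javascript
'use strict';

var setup = function(args, ctx) {
  // Wait until the scene's AWS credentials are loaded before calling Amazon Polly.
  sumerian.SystemBus.addListener('aws.sdkReady', () => {
    var speechComponent = ctx.entity.getComponent("speechComponent");
    var speeches = speechComponent.speeches;
    // Pick a random speech from the component and play it.
    var speech = speeches[Math.floor(Math.random() * speeches.length)];
    speech.play();
  }, true);
};
```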
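
The third argument, true, matters when the SDK finishes loading before the script's setup function runs. Assuming it asks the bus to deliver an event that was already emitted (a "retained" event), the following simplified bus sketches that behavior. RetainingBus is a hypothetical class for illustration, not Sumerian's SystemBus.

```javascript
'use strict';

// Hypothetical, simplified event bus that remembers the last event per channel.
class RetainingBus {
  constructor() {
    this.listeners = new Map(); // channel -> array of callbacks
    this.retained = new Map();  // channel -> data from the last emit
  }

  addListener(channel, callback, retainEvent = false) {
    if (!this.listeners.has(channel)) this.listeners.set(channel, []);
    this.listeners.get(channel).push(callback);
    // Late subscriber: replay the stored event immediately if requested.
    if (retainEvent && this.retained.has(channel)) {
      callback(this.retained.get(channel));
    }
  }

  emit(channel, data) {
    this.retained.set(channel, data);
    for (const cb of this.listeners.get(channel) || []) cb(data);
  }
}

// The SDK becomes ready before the script registers its listeners.
const bus = new RetainingBus();
bus.emit('aws.sdkReady');
let played = 0;
bus.addListener('aws.sdkReady', () => { played += 1; }, true); // fires immediately
bus.addListener('aws.sdkReady', () => { played += 10; });      // waits for the next emit
console.log(played); // 1
```

Under that assumption, omitting true would leave a late-registered listener waiting for an aws.sdkReady event that has already happened, so the speech would never play.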

For more information, see the Using the Host and Speech Components tutorial.