Creating Long Audio Files (CLI) - Amazon Polly

Creating Long Audio Files (CLI)

Amazon Polly's asynchronous synthesis functionality uses three SpeechSynthesisTask APIs to work with large amounts of text:

  • StartSpeechSynthesisTask: starts a new synthesis task.

  • GetSpeechSynthesisTask: returns details about a previously submitted synthesis task.

  • ListSpeechSynthesisTasks: lists all submitted synthesis tasks.

Synthesizing large amounts of text (StartSpeechSynthesisTask)

When you want to create an audio file larger than one you can create with the real-time SynthesizeSpeech operation, use the asynchronous StartSpeechSynthesisTask operation. In addition to the arguments required by SynthesizeSpeech, StartSpeechSynthesisTask also requires the name of an Amazon S3 bucket. Two optional arguments are also available: a key prefix for the output file, and the ARN of an Amazon SNS topic if you want to receive status notifications about the task.

  • OutputS3BucketName: The name of the Amazon S3 bucket where the synthesized speech file is uploaded. This bucket should be in the same Region as the Amazon Polly service you call. Additionally, the IAM user making the call should have access to the bucket. [Required]

  • OutputS3KeyPrefix: Key prefix for the output file. Use this parameter if you want to save the output speech file in a custom directory-like key in your bucket. [Optional]

  • SnsTopicArn: The ARN of the Amazon SNS topic to use if you want to receive notifications about the status of the task. This topic should be in the same Region as the Amazon Polly service you call. Additionally, the IAM user making the call should have access to the topic. [Optional]
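As a sketch of how the required and optional parameters above fit together, the following POSIX shell function (the name start_task is hypothetical) builds the argument list, adds --output-s3-key-prefix and --sns-topic-arn only when a value is supplied, and uses the AWS CLI's global --query and --output text options to print just the TaskId. The Region, voice, and output format are assumptions copied from the example on this page, not requirements.

```shell
# Minimal sketch: submit a StartSpeechSynthesisTask request and print only the
# TaskId from the response. Hypothetical helper; the Region, voice, and output
# format are assumptions copied from the example on this page.
start_task() {
  st_bucket="$1"     # OutputS3BucketName (required)
  st_prefix="$2"     # OutputS3KeyPrefix (optional; pass "" to omit)
  st_topic="$3"      # SnsTopicArn (optional; pass "" to omit)
  st_text_file="$4"  # local file containing the input text

  # Start with the arguments every request needs.
  set -- --region us-east-2 \
    --output-format mp3 \
    --voice-id Joanna \
    --output-s3-bucket-name "$st_bucket" \
    --text "file://$st_text_file"

  # Append the optional arguments only when a value was given.
  if [ -n "$st_prefix" ]; then
    set -- "$@" --output-s3-key-prefix "$st_prefix"
  fi
  if [ -n "$st_topic" ]; then
    set -- "$@" --sns-topic-arn "$st_topic"
  fi

  # --query and --output text reduce the JSON response to the bare TaskId.
  aws polly start-speech-synthesis-task "$@" \
    --query 'SynthesisTask.TaskId' \
    --output text
}

# Usage (bucket, prefix, topic ARN, and file name are placeholders):
#   task_id=$(start_task your-bucket-name optional/prefix/path/file \
#     arn:aws:sns:us-east-2:123456789012:your-topic text_file.txt)
```

Capturing the TaskId directly in a shell variable saves parsing the JSON response by hand when you later call get-speech-synthesis-task.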

The following example runs the start-speech-synthesis-task AWS CLI command in the US East (Ohio) Region:

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^), enclose the input text in double quotation marks ("), and use single quotation marks (') for any interior tags.

aws polly start-speech-synthesis-task \
  --region us-east-2 \
  --endpoint-url "https://polly.us-east-2.amazonaws.com/" \
  --output-format mp3 \
  --output-s3-bucket-name your-bucket-name \
  --output-s3-key-prefix optional/prefix/path/file \
  --voice-id Joanna \
  --text file://text_file.txt

The command returns a response similar to the following:

{
    "SynthesisTask": {
        "OutputFormat": "mp3",
        "OutputUri": "https://s3.us-east-2.amazonaws.com/your-bucket-name/optional/prefix/path/file.<task_id>.mp3",
        "TextType": "text",
        "CreationTime": [..],
        "RequestCharacters": [..],
        "TaskStatus": "scheduled",
        "TaskId": [task_id],
        "VoiceId": "Joanna"
    }
}

The response from start-speech-synthesis-task includes several fields that were not part of the request:

  • OutputUri: the location of your output speech file.

  • TaskId: a unique identifier for the speech synthesis task generated by Amazon Polly.

  • CreationTime: a timestamp for when the task was initially submitted.

  • RequestCharacters: the number of billable characters in the task.

  • TaskStatus: provides information on the status of the submitted task.

    When your task is submitted, its initial status is scheduled. When Amazon Polly starts processing the task, the status changes to inProgress and later to completed or failed. If the task fails, an error message (in the TaskStatusReason field) is returned when you call either the GetSpeechSynthesisTask or ListSpeechSynthesisTasks operation.

When the task is completed, the speech file is available at the location specified in OutputUri.
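Once the task is completed, the file can be downloaded like any other S3 object. The sketch below (the helper names output_uri_to_s3 and download_output are hypothetical) converts an OutputUri into an s3:// URI and passes it to aws s3 cp. It assumes the path-style form https://<s3-host>/<bucket>/<key> shown in the sample response above; other URI styles would need different parsing.

```shell
# Minimal sketch: turn a path-style OutputUri into an s3:// URI and download
# the finished file with "aws s3 cp". Hypothetical helpers; they assume the
# form https://<s3-host>/<bucket>/<key> and do not handle other URI styles.
output_uri_to_s3() {
  # Remove the shortest prefix matching "https://<host>/", leaving "<bucket>/<key>".
  printf 's3://%s\n' "${1#https://*/}"
}

download_output() {
  # $1 = OutputUri from the task, $2 = local destination file
  aws s3 cp "$(output_uri_to_s3 "$1")" "$2"
}

# The OutputUri of an existing task can be read with, for example:
#   aws polly get-speech-synthesis-task --region us-east-2 --task-id your-task-id \
#     --query 'SynthesisTask.OutputUri' --output text
```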

Retrieving information on your speech synthesis task

You can get information about a task, such as its status and any errors, using the GetSpeechSynthesisTask operation. To do this, you need the TaskId returned by StartSpeechSynthesisTask.

The following example runs the get-speech-synthesis-task AWS CLI command:

aws polly get-speech-synthesis-task \
  --region us-east-2 \
  --endpoint-url "https://polly.us-east-2.amazonaws.com/" \
  --task-id your-task-id
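Because the task runs asynchronously, a script usually has to wait until the status leaves scheduled and inProgress. The following is a minimal polling sketch, not an official tool: the helper names task_field and wait_for_task and the POLL_SECONDS setting (default 15 seconds) are hypothetical, and the Region matches the examples on this page. On failure it prints TaskStatusReason to standard error.

```shell
# Minimal sketch: poll GetSpeechSynthesisTask until the task reaches a
# terminal state. Hypothetical helpers; POLL_SECONDS is an assumed setting.

# Print one field of the SynthesisTask object for a task.
# $1 = task ID, $2 = field name (for example TaskStatus or OutputUri)
task_field() {
  aws polly get-speech-synthesis-task \
    --region us-east-2 \
    --task-id "$1" \
    --query "SynthesisTask.$2" \
    --output text
}

# Return 0 when the task is completed, 1 when it failed or the call errored.
wait_for_task() {
  while :; do
    wt_status=$(task_field "$1" TaskStatus) || return 1
    case "$wt_status" in
      completed)
        return 0 ;;
      failed)
        # TaskStatusReason describes why the task failed.
        task_field "$1" TaskStatusReason >&2
        return 1 ;;
      *)
        # scheduled or inProgress: wait before asking again.
        sleep "${POLL_SECONDS:-15}" ;;
    esac
  done
}

# Usage: wait_for_task your-task-id && echo "speech file is ready"
```

If you supplied an SnsTopicArn when starting the task, subscribing to that topic is an alternative to polling.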

You can also list all speech synthesis tasks that you've run in the current region using the ListSpeechSynthesisTasks operation.

The following example runs the list-speech-synthesis-tasks AWS CLI command:

aws polly list-speech-synthesis-tasks \
  --region us-east-2 \
  --endpoint-url "https://polly.us-east-2.amazonaws.com/"
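The ListSpeechSynthesisTasks operation also accepts a status filter. The sketch below (the helper name list_tasks is hypothetical, and the Region is copied from the examples on this page) passes --status only when a value is given and uses --query to print one tab-separated TaskId and TaskStatus pair per task, which is convenient for finding failed tasks.

```shell
# Minimal sketch: list synthesis tasks as TaskId/TaskStatus pairs, optionally
# filtered by status. Hypothetical helper; the Region is an assumption.
# $1 (optional) = scheduled, inProgress, completed, or failed
list_tasks() {
  if [ -n "${1:-}" ]; then
    set -- --status "$1"
  else
    set --
  fi
  aws polly list-speech-synthesis-tasks \
    --region us-east-2 \
    "$@" \
    --query 'SynthesisTasks[].[TaskId,TaskStatus]' \
    --output text
}

# Usage: list_tasks failed
```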