What is Amazon Transcribe?

Amazon Transcribe uses machine learning to recognize speech in audio and video files and transcribe that speech into text. Practical use cases for Amazon Transcribe include transcriptions of customer-agent calls and closed captions for videos.

Amazon Transcribe features

The following list highlights the Amazon Transcribe features that are available with all supported languages. Some features are supported only with specific languages; refer to Supported languages and language-specific features for more information.

  • Channel identification: Create a separate transcript for each audio channel (a single stream of recorded sound) in an audio file. For example, a phone conversation between two people can be recorded with each person on a separate audio channel. With channel identification, Amazon Transcribe returns a combined transcription of all of the audio channels and a transcription of each individual channel.

  • Custom vocabularies: Use a list of specific words you want Amazon Transcribe to recognize in your audio input. Custom vocabularies are often used for domain-specific terms or proper nouns that Amazon Transcribe isn't rendering correctly in your transcription output.

    Use custom vocabularies to:

    • Recognize industry-specific terms

    • Display acronyms correctly

    • Improve the accuracy of your transcription output

    See also: Custom language models

  • Language identification: Amazon Transcribe can automatically identify the predominant language in a media file without you having to specify a language code. You can also select several language codes to help Amazon Transcribe narrow down the predominant language for improved transcription accuracy.

    Amazon Transcribe can also transcribe accented speech from non-native speakers of a language. For example, you can transcribe US English (en-US) audio spoken with a German (de-DE) accent.

  • Speaker diarization: Identify individual speakers in an audio clip—a technique known as speaker diarization. When you activate speaker diarization, Amazon Transcribe includes an attribute that identifies each speaker in the audio clip.

    Use speaker diarization to:

    • Identify the customer and the support representative in a recorded customer support call

    • Identify characters for closed captions

    • Identify the speaker and questioners in a recorded press conference or lecture

  • Subtitles: Create subtitles for your video files. You can use content redaction (only in US English) and vocabulary filters when generating subtitles.

    Use subtitles to:

    • Create closed captions for your video files

    • Filter out inappropriate content, such as profanity, from your subtitles. Filtered or redacted content appears as whitespace, ***, or [PII] in your transcript and subtitle files; the audio itself is not altered.

  • Vocabulary filtering: Mask, remove, or tag words you don't want to appear in your transcription. Vocabulary filtering helps you filter out any word you consider profane, obscene, offensive, or otherwise unsuitable for display in your transcripts.

    Use vocabulary filtering to:

    • Generate family-friendly captions of a TV show

    • Remove proprietary terms from transcripts of conference proceedings
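
As a rough illustration of how several of the features above map onto a batch transcription request, the following Python sketch assembles parameters for the StartTranscriptionJob API operation. The helper name build_transcription_request, the job name, the S3 URI, and the vocabulary and filter names are hypothetical. The sketch assumes that custom vocabularies and vocabulary filters are attached only when a language code is given explicitly; with language identification they are configured per language through LanguageIdSettings, which is left out here. The request is built as a plain dictionary, so the sketch runs without AWS credentials.

```python
from typing import Any, Dict, List, Optional

VALID_FILTER_METHODS = {"mask", "remove", "tag"}


def build_transcription_request(
    job_name: str,
    media_uri: str,
    language_code: Optional[str] = None,
    language_options: Optional[List[str]] = None,
    vocabulary_name: Optional[str] = None,
    vocabulary_filter_name: Optional[str] = None,
    vocabulary_filter_method: str = "mask",
    max_speakers: Optional[int] = None,
    channel_identification: bool = False,
    subtitle_formats: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a StartTranscriptionJob request (hypothetical helper)."""
    request: Dict[str, Any] = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
    }
    settings: Dict[str, Any] = {}

    if language_code is not None:
        request["LanguageCode"] = language_code
    else:
        # Language identification: let the service pick the predominant language,
        # optionally narrowed down by a list of candidate language codes.
        if vocabulary_name or vocabulary_filter_name:
            raise ValueError(
                "This sketch attaches vocabularies and filters only when "
                "language_code is given; LanguageIdSettings is not covered."
            )
        request["IdentifyLanguage"] = True
        if language_options:
            request["LanguageOptions"] = list(language_options)

    if vocabulary_name:
        # Custom vocabulary (must already exist in the account).
        settings["VocabularyName"] = vocabulary_name

    if vocabulary_filter_name:
        # Vocabulary filtering: mask, remove, or tag the listed words.
        if vocabulary_filter_method not in VALID_FILTER_METHODS:
            raise ValueError(f"Unknown filter method: {vocabulary_filter_method}")
        settings["VocabularyFilterName"] = vocabulary_filter_name
        settings["VocabularyFilterMethod"] = vocabulary_filter_method

    if max_speakers is not None:
        # Speaker diarization: label each speaker in the transcript.
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers

    if channel_identification:
        # Channel identification: transcribe each audio channel separately.
        settings["ChannelIdentification"] = True

    if subtitle_formats:
        # Subtitles: request subtitle files alongside the transcript.
        request["Subtitles"] = {"Formats": list(subtitle_formats)}

    if settings:
        request["Settings"] = settings
    return request


# Example: a two-speaker support call in US English with masked words and subtitles.
request = build_transcription_request(
    job_name="example-support-call",
    media_uri="s3://amzn-s3-demo-bucket/support-call.wav",
    language_code="en-US",
    vocabulary_name="example-product-terms",
    vocabulary_filter_name="example-profanity-filter",
    max_speakers=2,
    subtitle_formats=["srt", "vtt"],
)
```

With the AWS SDK for Python (Boto3) installed and credentials configured, a dictionary like this could be passed as boto3.client("transcribe").start_transcription_job(**request). The vocabulary and vocabulary filter must already exist in the account.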


Not all Amazon Transcribe features are available in all languages; please review the Supported languages and language-specific features table before getting started.

Cross-service applications

You can use Amazon Transcribe with other AWS services to create applications. For example, you can:

  • Translate your audio into another language. Use Amazon Transcribe to convert voice to text, Amazon Translate to translate your text into another language, and Amazon Polly to generate audio from the translated text.

  • Use Amazon Transcribe to transcribe recordings of customer service calls for analysis. After transcribing a recording, send the transcription to Amazon Comprehend to identify keywords, topics, or sentiments.

  • Use Amazon Transcribe to transcribe live broadcasts, such as television, to provide real-time subtitles. Broadcast-grade applications might require additional customization, such as a custom language model or manual transcript correction.
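
The first application above (speech to text, translation, then speech synthesis) could be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names transcript_text and translate_and_speak are hypothetical, the language codes and the Polly voice are example choices, and the clients are passed in as arguments so the flow can be exercised without AWS credentials. It assumes the transcript JSON produced by a completed transcription job has already been downloaded.

```python
import json
from typing import Any


def transcript_text(transcript_json: str) -> str:
    """Extract the full transcript string from Amazon Transcribe's JSON output."""
    document = json.loads(transcript_json)
    return document["results"]["transcripts"][0]["transcript"]


def translate_and_speak(
    transcript_json: str,
    translate_client: Any,
    polly_client: Any,
    source_lang: str = "en",
    target_lang: str = "es",
    voice_id: str = "Lucia",  # example voice; pick one that matches target_lang
) -> bytes:
    """Translate a transcript with Amazon Translate and synthesize it with Amazon Polly.

    translate_client and polly_client are expected to behave like
    boto3.client("translate") and boto3.client("polly").
    """
    text = transcript_text(transcript_json)

    # Amazon Translate: convert the transcript into the target language.
    translated = translate_client.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
    )["TranslatedText"]

    # Amazon Polly: generate audio from the translated text.
    speech = polly_client.synthesize_speech(
        Text=translated,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    return speech["AudioStream"].read()
```

Both Amazon Translate and Amazon Polly limit the amount of text accepted per request, so a long transcript would need to be split into smaller pieces before these calls. For the call-analysis application, the same extracted text could instead be sent to Amazon Comprehend.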