StartStreamTranscription
Starts a bidirectional HTTP/2 or WebSocket stream where audio is streamed to Amazon Transcribe and the transcription results are streamed to your application.
The following parameters are encoded as headers:
- language-code
- media-encoding
- sample-rate
- session-id
For more information on streaming with Amazon Transcribe, see Transcribing streaming audio.
Request Syntax
POST /stream-transcription HTTP/2
x-amzn-transcribe-language-code: LanguageCode
x-amzn-transcribe-sample-rate: MediaSampleRateHertz
x-amzn-transcribe-media-encoding: MediaEncoding
x-amzn-transcribe-vocabulary-name: VocabularyName
x-amzn-transcribe-session-id: SessionId
x-amzn-transcribe-vocabulary-filter-name: VocabularyFilterName
x-amzn-transcribe-vocabulary-filter-method: VocabularyFilterMethod
x-amzn-transcribe-show-speaker-label: ShowSpeakerLabel
x-amzn-transcribe-enable-channel-identification: EnableChannelIdentification
x-amzn-transcribe-number-of-channels: NumberOfChannels
x-amzn-transcribe-enable-partial-results-stabilization: EnablePartialResultsStabilization
x-amzn-transcribe-partial-results-stability: PartialResultsStability
x-amzn-transcribe-content-identification-type: ContentIdentificationType
x-amzn-transcribe-content-redaction-type: ContentRedactionType
x-amzn-transcribe-pii-entity-types: PiiEntityTypes
x-amzn-transcribe-language-model-name: LanguageModelName
x-amzn-transcribe-identify-language: IdentifyLanguage
x-amzn-transcribe-language-options: LanguageOptions
x-amzn-transcribe-preferred-language: PreferredLanguage
x-amzn-transcribe-vocabulary-names: VocabularyNames
x-amzn-transcribe-vocabulary-filter-names: VocabularyFilterNames
Content-type: application/json
{
"AudioStream": {
"AudioEvent": {
"AudioChunk": blob
}
}
}
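The following Python sketch shows how the request parameters map onto the x-amzn-transcribe-* headers in the request syntax above. It builds a header dictionary from keyword arguments. The names `HEADER_NAMES` and `build_headers` are hypothetical, and rendering booleans as "true"/"false" is an assumption. In practice the AWS SDKs build these headers and also handle Signature Version 4 signing, which this sketch does not do.

```python
# Hypothetical mapping from StartStreamTranscription parameter names to the
# header names listed in the request syntax above.
HEADER_NAMES = {
    "LanguageCode": "x-amzn-transcribe-language-code",
    "MediaSampleRateHertz": "x-amzn-transcribe-sample-rate",
    "MediaEncoding": "x-amzn-transcribe-media-encoding",
    "VocabularyName": "x-amzn-transcribe-vocabulary-name",
    "SessionId": "x-amzn-transcribe-session-id",
    "VocabularyFilterName": "x-amzn-transcribe-vocabulary-filter-name",
    "VocabularyFilterMethod": "x-amzn-transcribe-vocabulary-filter-method",
    "ShowSpeakerLabel": "x-amzn-transcribe-show-speaker-label",
    "EnableChannelIdentification": "x-amzn-transcribe-enable-channel-identification",
    "NumberOfChannels": "x-amzn-transcribe-number-of-channels",
    "EnablePartialResultsStabilization": "x-amzn-transcribe-enable-partial-results-stabilization",
    "PartialResultsStability": "x-amzn-transcribe-partial-results-stability",
    "ContentIdentificationType": "x-amzn-transcribe-content-identification-type",
    "ContentRedactionType": "x-amzn-transcribe-content-redaction-type",
    "PiiEntityTypes": "x-amzn-transcribe-pii-entity-types",
    "LanguageModelName": "x-amzn-transcribe-language-model-name",
    "IdentifyLanguage": "x-amzn-transcribe-identify-language",
    "LanguageOptions": "x-amzn-transcribe-language-options",
    "PreferredLanguage": "x-amzn-transcribe-preferred-language",
    "VocabularyNames": "x-amzn-transcribe-vocabulary-names",
    "VocabularyFilterNames": "x-amzn-transcribe-vocabulary-filter-names",
}


def build_headers(**params):
    """Return {header-name: value} for every parameter that is not None.

    Rendering booleans as "true"/"false" is an assumption of this sketch.
    """
    headers = {}
    for name, value in params.items():
        if value is None:
            continue  # omitted parameters produce no header
        if name not in HEADER_NAMES:
            raise ValueError(f"unknown parameter: {name}")
        if isinstance(value, bool):
            value = "true" if value else "false"
        headers[HEADER_NAMES[name]] = str(value)
    return headers
```

For example, `build_headers(LanguageCode="en-US", MediaSampleRateHertz=16000, MediaEncoding="pcm")` produces the three headers for the required language, sample rate, and encoding settings.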
URI Request Parameters
The request uses the following parameters, which are sent as the x-amzn-transcribe-* headers shown in the request syntax above.
- ContentIdentificationType
Labels all personally identifiable information (PII) identified in your transcript.
Content identification is performed at the segment level; PII specified in PiiEntityTypes is flagged upon complete transcription of an audio segment.
You can't set both ContentIdentificationType and ContentRedactionType in the same request. If you set both, your request returns a BadRequestException.
For more information, see Redacting or identifying personally identifiable information.
Valid Values:
PII
- ContentRedactionType
Redacts all personally identifiable information (PII) identified in your transcript.
Content redaction is performed at the segment level; PII specified in PiiEntityTypes is redacted upon complete transcription of an audio segment.
You can't set both ContentRedactionType and ContentIdentificationType in the same request. If you set both, your request returns a BadRequestException.
For more information, see Redacting or identifying personally identifiable information.
Valid Values:
PII
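You can check the rules above on the client before you open a stream. These rules are that ContentIdentificationType and ContentRedactionType are mutually exclusive, that PII is their only valid value, and that PiiEntityTypes needs one of them (as stated under PiiEntityTypes). The Python sketch below is a hypothetical pre-flight check. The function name is illustrative, and it raises `ValueError` locally rather than waiting for the service to reject the request.

```python
def check_pii_params(content_identification_type=None,
                     content_redaction_type=None,
                     pii_entity_types=None):
    """Hypothetical pre-flight check mirroring the documented PII rules.

    Raises ValueError for combinations the documentation disallows.
    """
    if content_identification_type and content_redaction_type:
        # The service returns a BadRequestException for this combination.
        raise ValueError("set ContentIdentificationType or ContentRedactionType, not both")
    for value in (content_identification_type, content_redaction_type):
        if value is not None and value != "PII":
            raise ValueError(f"invalid value {value!r}; the only valid value is PII")
    if pii_entity_types and not (content_identification_type or content_redaction_type):
        raise ValueError("PiiEntityTypes requires ContentIdentificationType "
                         "or ContentRedactionType")
```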
- EnableChannelIdentification
Enables channel identification in multi-channel audio.
Channel identification transcribes the audio on each channel independently, then appends the output for each channel into one transcript.
If you have multi-channel audio and do not enable channel identification, your audio is transcribed in a continuous manner and your transcript is not separated by channel.
You can't set both ShowSpeakerLabel and EnableChannelIdentification in the same request. If you set both, your request returns a BadRequestException.
For more information, see Transcribing multi-channel audio.
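The speaker-label and channel-identification rules can also be checked locally. The Python sketch below is hypothetical. It rejects requests that set both ShowSpeakerLabel and EnableChannelIdentification. It also rejects any NumberOfChannels other than 2, because NumberOfChannels has a minimum value of 2 and at most two channels are supported.

```python
def check_speaker_and_channel(show_speaker_label=False,
                              enable_channel_identification=False,
                              number_of_channels=None):
    """Hypothetical pre-flight check for speaker labels and channel identification."""
    if show_speaker_label and enable_channel_identification:
        # The service returns a BadRequestException for this combination.
        raise ValueError("ShowSpeakerLabel and EnableChannelIdentification "
                         "are mutually exclusive")
    if number_of_channels is not None and number_of_channels != 2:
        # Minimum value is 2 and up to 2 channels are supported, so only 2 is valid.
        raise ValueError("NumberOfChannels must be 2")
```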
- EnablePartialResultsStabilization
Enables partial result stabilization for your transcription.
Partial result stabilization returns transcription results as soon as your audio is transcribed; this differs from a standard streaming transcription, which transcribes your stream at a segment level. Note that partial results stabilization can impact transcription accuracy.
For more information, see Partial-result stabilization.
- IdentifyLanguage
Enables automatic language identification for your transcription.
If you include IdentifyLanguage, you can optionally include a list of language codes, using LanguageOptions, that you think may be present in your audio stream. Including language options can improve transcription accuracy.
You can also include a preferred language using PreferredLanguage. Adding a preferred language can help Amazon Transcribe identify the language faster than if you omit this parameter.
If you have multi-channel audio that contains different languages on each channel, and you've enabled channel identification, automatic language identification identifies the dominant language on each audio channel.
Note that you must include either LanguageCode or IdentifyLanguage in your request. If you include both parameters, your request fails.
Streaming language identification can't be combined with custom language models or redaction.
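The language-selection rules on this page can be collected into one local check. The rules are: exactly one of LanguageCode or IdentifyLanguage; LanguageOptions only with IdentifyLanguage; PreferredLanguage only with both, and only as one of the listed options; and no custom language model or redaction alongside language identification. The Python sketch below is hypothetical, and its parameter names are illustrative snake_case versions of the API names.

```python
def check_language_params(language_code=None, identify_language=False,
                          language_options=None, preferred_language=None,
                          language_model_name=None, content_redaction_type=None):
    """Hypothetical pre-flight check for the language-selection rules."""
    if bool(language_code) == bool(identify_language):
        raise ValueError("include exactly one of LanguageCode or IdentifyLanguage")
    if language_options and not identify_language:
        raise ValueError("LanguageOptions requires IdentifyLanguage")
    if preferred_language:
        if not (identify_language and language_options):
            raise ValueError("PreferredLanguage requires IdentifyLanguage and LanguageOptions")
        if preferred_language not in language_options.split(","):
            raise ValueError("PreferredLanguage must be one of the LanguageOptions codes")
    if identify_language and (language_model_name or content_redaction_type):
        raise ValueError("streaming language identification can't be combined "
                         "with custom language models or redaction")
```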
- LanguageCode
Specify the language code that represents the language spoken in your audio.
If you're unsure of the language spoken in your audio, consider using IdentifyLanguage to enable automatic language identification.
For a list of languages supported with Amazon Transcribe streaming, refer to the Supported languages table.
Valid Values:
en-US | en-GB | es-US | fr-CA | fr-FR | en-AU | it-IT | de-DE | pt-BR | ja-JP | ko-KR | zh-CN
- LanguageModelName
Specify the name of the custom language model you want to use when processing your transcription. Note that language model names are case sensitive.
The language of the specified language model must match the language code you specify in your transcription request. If the languages don't match, the language model isn't applied. There are no errors or warnings associated with a language mismatch.
For more information, see Custom language models.
Length Constraints: Minimum length of 1. Maximum length of 200.
Pattern:
^[0-9a-zA-Z._-]+
- LanguageOptions
Specify two or more language codes that represent the languages you think may be present in your media; including more than five is not recommended. If you're unsure what languages are present, do not include this parameter.
Including language options can improve the accuracy of language identification.
If you include LanguageOptions in your request, you must also include IdentifyLanguage.
For a list of languages supported with Amazon Transcribe streaming, refer to the Supported languages table.
Important: You can only include one language dialect per language per stream. For example, you cannot include both en-US and en-AU in the same request.
Length Constraints: Minimum length of 1. Maximum length of 200.
Pattern:
^[a-zA-Z-,]+
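The following Python sketch validates a LanguageOptions value against the constraints above. It checks for at least two codes, warns above five, and enforces the length, pattern, and one-dialect-per-language rules. The function name is hypothetical. Treating the part before the hyphen (for example, "en" in "en-US") as the language, and applying the documented pattern to the whole string, are assumptions of the sketch.

```python
import re
import warnings


def check_language_options(language_options):
    """Hypothetical check of a comma-separated LanguageOptions value."""
    # Documented constraints: length 1-200, pattern ^[a-zA-Z-,]+ (whole string here).
    if not 1 <= len(language_options) <= 200 or not re.fullmatch(r"[a-zA-Z,-]+", language_options):
        raise ValueError("LanguageOptions violates the length or pattern constraints")
    codes = language_options.split(",")
    if len(codes) < 2:
        raise ValueError("specify two or more language codes")
    if len(codes) > 5:
        warnings.warn("including more than five language codes is not recommended")
    seen = {}
    for code in codes:
        language = code.split("-")[0].lower()  # assumption: "en" for "en-US"
        if language in seen:
            raise ValueError(f"{seen[language]} and {code} are dialects of the same language")
        seen[language] = code
    return codes
```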
- MediaEncoding
Specify the encoding used for the input audio. Supported formats are:
- FLAC
- OPUS-encoded audio in an Ogg container
- PCM (only signed 16-bit little-endian audio formats, which do not include WAV)
For more information, see Media formats.
Valid Values:
pcm | ogg-opus | flac
Required: Yes
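The pcm encoding expects raw signed 16-bit little-endian samples, not a WAV file with a header. The Python sketch below shows one way to produce such bytes from floating-point samples in the range -1.0 to 1.0. The function name and the float input format are assumptions, and it handles a single channel only.

```python
import struct


def floats_to_pcm_s16le(samples):
    """Convert float samples in [-1.0, 1.0] to signed 16-bit little-endian PCM bytes."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))         # clamp out-of-range values
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)
```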
- MediaSampleRateHertz
The sample rate of the input audio (in Hertz). Low-quality audio, such as telephone audio, is typically around 8,000 Hz. High-quality audio typically ranges from 16,000 Hz to 48,000 Hz. Note that the sample rate you specify must match that of your audio.
Valid Range: Minimum value of 8000. Maximum value of 48000.
Required: Yes
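The sample rate determines how many bytes of PCM audio make up a given duration. The formula is bytes = sample rate × duration × 2 bytes per sample × channels. For example, 100 ms of mono 16,000 Hz audio is 16,000 × 0.1 × 2 = 3,200 bytes. The Python sketch below computes this and splits a buffer into pieces of that size for successive AudioChunk values. The function names and the 100 ms chunk duration in the example are illustrative assumptions; this page does not prescribe a chunk size.

```python
def pcm_chunk_size(sample_rate_hz, chunk_ms, channels=1, bytes_per_sample=2):
    """Bytes of 16-bit PCM needed for chunk_ms milliseconds of audio."""
    if not 8000 <= sample_rate_hz <= 48000:
        raise ValueError("MediaSampleRateHertz must be between 8000 and 48000")
    samples_per_channel = sample_rate_hz * chunk_ms // 1000
    return samples_per_channel * channels * bytes_per_sample


def iter_chunks(pcm_bytes, chunk_size):
    """Split raw PCM into consecutive pieces of chunk_size bytes (the last may be shorter)."""
    for start in range(0, len(pcm_bytes), chunk_size):
        yield pcm_bytes[start:start + chunk_size]
```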
- NumberOfChannels
Specify the number of channels in your audio stream. Up to 2 channels are supported.
Valid Range: Minimum value of 2.
- PartialResultsStability
Specify the level of stability to use when you enable partial results stabilization (EnablePartialResultsStabilization).
Low stability provides the highest accuracy. High stability transcribes faster, but with slightly lower accuracy.
For more information, see Partial-result stabilization.
Valid Values:
high | medium | low
- PiiEntityTypes
Specify which types of personally identifiable information (PII) you want to redact or identify in your transcript. You can include as many types as you'd like, or you can select ALL.
To include PiiEntityTypes in your request, you must also include either ContentIdentificationType or ContentRedactionType.
Values must be comma-separated and can include: BANK_ACCOUNT_NUMBER, BANK_ROUTING, CREDIT_DEBIT_NUMBER, CREDIT_DEBIT_CVV, CREDIT_DEBIT_EXPIRY, PIN, EMAIL, ADDRESS, NAME, PHONE, SSN, or ALL.
Length Constraints: Minimum length of 1. Maximum length of 300.
Pattern:
^[A-Z_, ]+
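The following Python sketch builds the comma-separated PiiEntityTypes value and checks it against the listed entity types and the documented length and pattern. The function and constant names are hypothetical, and applying the pattern to the whole string is an assumption.

```python
import re

# Entity types listed in the PiiEntityTypes description above.
PII_ENTITY_TYPES = {
    "BANK_ACCOUNT_NUMBER", "BANK_ROUTING", "CREDIT_DEBIT_NUMBER",
    "CREDIT_DEBIT_CVV", "CREDIT_DEBIT_EXPIRY", "PIN", "EMAIL",
    "ADDRESS", "NAME", "PHONE", "SSN", "ALL",
}


def format_pii_entity_types(types):
    """Join entity types into the comma-separated header value and validate it."""
    unknown = [t for t in types if t not in PII_ENTITY_TYPES]
    if unknown:
        raise ValueError(f"unknown PII entity types: {unknown}")
    value = ",".join(types)
    # Documented constraints: length 1-300 and pattern ^[A-Z_, ]+ (whole string here).
    if not 1 <= len(value) <= 300 or not re.fullmatch(r"[A-Z_, ]+", value):
        raise ValueError("PiiEntityTypes value violates the length or pattern constraints")
    return value
```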
- PreferredLanguage
Specify a preferred language from the subset of language codes you specified in LanguageOptions.
You can only use this parameter if you've included both IdentifyLanguage and LanguageOptions in your request.
Valid Values:
en-US | en-GB | es-US | fr-CA | fr-FR | en-AU | it-IT | de-DE | pt-BR | ja-JP | ko-KR | zh-CN
- SessionId
Specify a name for your transcription session, formatted as a 36-character UUID (see the pattern below). If you don't include this parameter in your request, Amazon Transcribe generates an ID and returns it in the response.
You can use a session ID to retry a streaming session.
Length Constraints: Fixed length of 36.
Pattern:
[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}
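A random UUID from Python's standard `uuid` module has the 8-4-4-4-12 hexadecimal form this pattern describes. The sketch below generates one and checks a candidate value. The function names are hypothetical. The Errors section describes a ConflictException that terminates the current stream when a new stream starts with the same session ID. For that reason, this sketch assumes you reuse an ID only to retry a session that has already ended.

```python
import re
import uuid

# Documented SessionId pattern: 8-4-4-4-12 hexadecimal groups, 36 characters.
SESSION_ID_PATTERN = re.compile(
    r"[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}")


def new_session_id():
    """Generate a random UUID string, which matches the documented pattern."""
    return str(uuid.uuid4())


def is_valid_session_id(value):
    """Check the documented fixed length and pattern."""
    return len(value) == 36 and SESSION_ID_PATTERN.fullmatch(value) is not None
```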
- ShowSpeakerLabel
Enables speaker identification (diarization) in your transcription output. Speaker identification labels the speech from individual speakers in your media file.
For more information, see Identifying speakers (diarization).
- VocabularyFilterMethod
Specify how you want your vocabulary filter applied to your transcript.
To replace words with ***, choose mask.
To delete words, choose remove.
To flag words without changing them, choose tag.
Valid Values:
remove | mask | tag
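To make the three methods concrete, the following Python sketch simulates them on a list of words. This is a local illustration, not how Amazon Transcribe implements filtering. It returns (content, matched) pairs, where `matched` is the sketch's own marker; in actual responses, each item carries a VocabularyFilterMatch field. The function name and the case-insensitive matching are assumptions.

```python
def apply_filter_locally(words, filtered_words, method):
    """Simulate remove/mask/tag on a list of words (illustration only).

    Returns (content, matched) pairs; matching is case-insensitive by assumption.
    """
    if method not in ("remove", "mask", "tag"):
        raise ValueError("VocabularyFilterMethod must be remove, mask, or tag")
    filtered = {w.lower() for w in filtered_words}
    result = []
    for word in words:
        matched = word.lower() in filtered
        if matched and method == "remove":
            continue  # the word is deleted from the transcript
        content = "***" if matched and method == "mask" else word
        result.append((content, matched))
    return result
```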
- VocabularyFilterName
Specify the name of the custom vocabulary filter you want to use when processing your transcription. Note that vocabulary filter names are case sensitive.
If the language of the specified custom vocabulary filter doesn't match the language identified in your media, your job fails.
Important: This parameter is not intended for use in conjunction with the IdentifyLanguage parameter. If you're including IdentifyLanguage in your request and want to use one or more vocabulary filters with your transcription, use the VocabularyFilterNames parameter instead.
For more information, see Using vocabulary filtering with unwanted words.
Length Constraints: Minimum length of 1. Maximum length of 200.
Pattern:
^[0-9a-zA-Z._-]+
- VocabularyFilterNames
Specify the names of the custom vocabulary filters you want to use when processing your transcription. Note that vocabulary filter names are case sensitive.
If none of the languages of the specified custom vocabulary filters match the language identified in your media, your job fails.
Important: This parameter is only intended for use in conjunction with the IdentifyLanguage parameter. If you're not including IdentifyLanguage in your request and want to use a custom vocabulary filter with your transcription, use the VocabularyFilterName parameter instead.
For more information, see Using vocabulary filtering with unwanted words.
Length Constraints: Minimum length of 1. Maximum length of 3000.
Pattern:
^[a-zA-Z0-9,-._]+
- VocabularyName
Specify the name of the custom vocabulary you want to use when processing your transcription. Note that vocabulary names are case sensitive.
If the language of the specified custom vocabulary doesn't match the language identified in your media, your job fails.
Important: This parameter is not intended for use in conjunction with the IdentifyLanguage parameter. If you're including IdentifyLanguage in your request and want to use one or more custom vocabularies with your transcription, use the VocabularyNames parameter instead.
For more information, see Custom vocabularies.
Length Constraints: Minimum length of 1. Maximum length of 200.
Pattern:
^[0-9a-zA-Z._-]+
- VocabularyNames
Specify the names of the custom vocabularies you want to use when processing your transcription. Note that vocabulary names are case sensitive.
If none of the languages of the specified custom vocabularies match the language identified in your media, your job fails.
Important: This parameter is only intended for use in conjunction with the IdentifyLanguage parameter. If you're not including IdentifyLanguage in your request and want to use a custom vocabulary with your transcription, use the VocabularyName parameter instead.
For more information, see Custom vocabularies.
Length Constraints: Minimum length of 1. Maximum length of 3000.
Pattern:
^[a-zA-Z0-9,-._]+
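Whether to use the singular or plural vocabulary parameters depends on IdentifyLanguage. The Python sketch below chooses between VocabularyName/VocabularyFilterName and the comma-separated VocabularyNames/VocabularyFilterNames accordingly. It also checks each name against the single-name constraints (length 1 to 200, pattern ^[0-9a-zA-Z._-]+) and each joined value against the 3000-character limit. The function name and its argument layout are hypothetical.

```python
import re

NAME_PATTERN = re.compile(r"[0-9a-zA-Z._-]+")  # documented pattern for a single name


def vocabulary_params(identify_language, vocabularies=(), filters=()):
    """Pick the singular or plural vocabulary parameters, per the rules above."""
    for name in (*vocabularies, *filters):
        if not 1 <= len(name) <= 200 or not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"invalid vocabulary or filter name: {name!r}")
    params = {}
    if identify_language:
        # With IdentifyLanguage, use the comma-separated plural parameters.
        if vocabularies:
            params["VocabularyNames"] = ",".join(vocabularies)
        if filters:
            params["VocabularyFilterNames"] = ",".join(filters)
    else:
        # Without IdentifyLanguage, each singular parameter takes one name.
        if len(vocabularies) > 1 or len(filters) > 1:
            raise ValueError("without IdentifyLanguage, specify at most one "
                             "vocabulary and one filter")
        if vocabularies:
            params["VocabularyName"] = vocabularies[0]
        if filters:
            params["VocabularyFilterName"] = filters[0]
    for key in ("VocabularyNames", "VocabularyFilterNames"):
        if len(params.get(key, "")) > 3000:
            raise ValueError(f"{key} exceeds 3000 characters")
    return params
```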
Request Body
The request accepts the following data in JSON format.
- AudioStream
An encoded stream of audio blobs. Audio streams are encoded as either HTTP/2 or WebSocket data frames.
For more information, see Transcribing streaming audio.
Type: AudioStream object
Required: Yes
Response Syntax
HTTP/2 200
x-amzn-request-id: RequestId
x-amzn-transcribe-language-code: LanguageCode
x-amzn-transcribe-sample-rate: MediaSampleRateHertz
x-amzn-transcribe-media-encoding: MediaEncoding
x-amzn-transcribe-vocabulary-name: VocabularyName
x-amzn-transcribe-session-id: SessionId
x-amzn-transcribe-vocabulary-filter-name: VocabularyFilterName
x-amzn-transcribe-vocabulary-filter-method: VocabularyFilterMethod
x-amzn-transcribe-show-speaker-label: ShowSpeakerLabel
x-amzn-transcribe-enable-channel-identification: EnableChannelIdentification
x-amzn-transcribe-number-of-channels: NumberOfChannels
x-amzn-transcribe-enable-partial-results-stabilization: EnablePartialResultsStabilization
x-amzn-transcribe-partial-results-stability: PartialResultsStability
x-amzn-transcribe-content-identification-type: ContentIdentificationType
x-amzn-transcribe-content-redaction-type: ContentRedactionType
x-amzn-transcribe-pii-entity-types: PiiEntityTypes
x-amzn-transcribe-language-model-name: LanguageModelName
x-amzn-transcribe-identify-language: IdentifyLanguage
x-amzn-transcribe-language-options: LanguageOptions
x-amzn-transcribe-preferred-language: PreferredLanguage
x-amzn-transcribe-vocabulary-names: VocabularyNames
x-amzn-transcribe-vocabulary-filter-names: VocabularyFilterNames
Content-type: application/json
{
"TranscriptResultStream": {
"BadRequestException": {
},
"ConflictException": {
},
"InternalFailureException": {
},
"LimitExceededException": {
},
"ServiceUnavailableException": {
},
"TranscriptEvent": {
"Transcript": {
"Results": [
{
"Alternatives": [
{
"Entities": [
{
"Category": "string",
"Confidence": number,
"Content": "string",
"EndTime": number,
"StartTime": number,
"Type": "string"
}
],
"Items": [
{
"Confidence": number,
"Content": "string",
"EndTime": number,
"Speaker": "string",
"Stable": boolean,
"StartTime": number,
"Type": "string",
"VocabularyFilterMatch": boolean
}
],
"Transcript": "string"
}
],
"ChannelId": "string",
"EndTime": number,
"IsPartial": boolean,
"LanguageCode": "string",
"LanguageIdentification": [
{
"LanguageCode": "string",
"Score": number
}
],
"ResultId": "string",
"StartTime": number
}
]
}
}
}
}
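The response syntax above shows that each TranscriptEvent carries Results with an IsPartial flag, and that items can carry a Stable flag when partial-results stabilization is enabled. The Python sketch below extracts the final (non-partial) transcript per channel and the stable words from partial results. The function names are hypothetical. Reading only the first entry of Alternatives is an assumption, and the sample event, including its ChannelId value, is invented for illustration.

```python
import json


def final_transcripts(transcript_event):
    """Return (ChannelId, Transcript) for each non-partial result.

    Taking the first entry of Alternatives is an assumption of this sketch.
    """
    pairs = []
    for result in transcript_event.get("Transcript", {}).get("Results", []):
        if result.get("IsPartial"):
            continue  # partial results can still change
        alternatives = result.get("Alternatives", [])
        if alternatives:
            pairs.append((result.get("ChannelId"), alternatives[0].get("Transcript", "")))
    return pairs


def stable_partial_words(transcript_event):
    """Collect words marked Stable inside partial results."""
    words = []
    for result in transcript_event.get("Transcript", {}).get("Results", []):
        if not result.get("IsPartial"):
            continue
        for alternative in result.get("Alternatives", [])[:1]:
            words.extend(item["Content"] for item in alternative.get("Items", [])
                         if item.get("Stable"))
    return words


# Invented sample shaped like the TranscriptEvent in the response syntax above.
SAMPLE_EVENT = json.loads("""
{"Transcript": {"Results": [
  {"ResultId": "r1", "IsPartial": true, "ChannelId": "ch_0",
   "Alternatives": [{"Transcript": "hello wor",
     "Items": [{"Content": "hello", "Stable": true, "Type": "pronunciation"},
               {"Content": "wor", "Stable": false, "Type": "pronunciation"}]}]},
  {"ResultId": "r2", "IsPartial": false, "ChannelId": "ch_0",
   "Alternatives": [{"Transcript": "Hello world.", "Items": []}]}
]}}
""")
```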
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The response returns the following HTTP headers.
- ContentIdentificationType
Shows whether content identification was enabled for your transcription.
Valid Values:
PII
- ContentRedactionType
Shows whether content redaction was enabled for your transcription.
Valid Values:
PII
- EnableChannelIdentification
Shows whether channel identification was enabled for your transcription.
- EnablePartialResultsStabilization
Shows whether partial results stabilization was enabled for your transcription.
- IdentifyLanguage
Shows whether automatic language identification was enabled for your transcription.
- LanguageCode
Provides the language code you specified in your request.
Valid Values:
en-US | en-GB | es-US | fr-CA | fr-FR | en-AU | it-IT | de-DE | pt-BR | ja-JP | ko-KR | zh-CN
- LanguageModelName
Provides the name of the custom language model you specified in your request.
Length Constraints: Minimum length of 1. Maximum length of 200.
Pattern:
^[0-9a-zA-Z._-]+
- LanguageOptions
Provides the language codes you specified in your request.
Length Constraints: Minimum length of 1. Maximum length of 200.
Pattern:
^[a-zA-Z-,]+
- MediaEncoding
Provides the media encoding you specified in your request.
Valid Values:
pcm | ogg-opus | flac
- MediaSampleRateHertz
Provides the sample rate you specified in your request.
Valid Range: Minimum value of 8000. Maximum value of 48000.
- NumberOfChannels
Provides the number of channels you specified in your request.
Valid Range: Minimum value of 2.
- PartialResultsStability
Provides the stabilization level used for your transcription.
Valid Values:
high | medium | low
- PiiEntityTypes
Lists the PII entity types you specified in your request.
Length Constraints: Minimum length of 1. Maximum length of 300.
Pattern:
^[A-Z_, ]+
- PreferredLanguage
Provides the preferred language you specified in your request.
Valid Values:
en-US | en-GB | es-US | fr-CA | fr-FR | en-AU | it-IT | de-DE | pt-BR | ja-JP | ko-KR | zh-CN
- RequestId
Provides the identifier for your streaming request.
- SessionId
Provides the identifier for your transcription session.
Length Constraints: Fixed length of 36.
Pattern:
[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}
- ShowSpeakerLabel
Shows whether speaker identification was enabled for your transcription.
- VocabularyFilterMethod
Provides the vocabulary filtering method used in your transcription.
Valid Values:
remove | mask | tag
- VocabularyFilterName
Provides the name of the custom vocabulary filter you specified in your request.
Length Constraints: Minimum length of 1. Maximum length of 200.
Pattern:
^[0-9a-zA-Z._-]+
- VocabularyFilterNames
Provides the names of the custom vocabulary filters you specified in your request.
Length Constraints: Minimum length of 1. Maximum length of 3000.
Pattern:
^[a-zA-Z0-9,-._]+
- VocabularyName
Provides the name of the custom vocabulary you specified in your request.
Length Constraints: Minimum length of 1. Maximum length of 200.
Pattern:
^[0-9a-zA-Z._-]+
- VocabularyNames
Provides the names of the custom vocabularies you specified in your request.
Length Constraints: Minimum length of 1. Maximum length of 3000.
Pattern:
^[a-zA-Z0-9,-._]+
The following data is returned in JSON format by the service.
- TranscriptResultStream
Provides detailed information about your streaming session.
Type: TranscriptResultStream object
Errors
For information about the errors that are common to all actions, see Common Errors.
- BadRequestException
One or more arguments to the StartStreamTranscription or StartMedicalStreamTranscription operation were invalid. For example, MediaEncoding or LanguageCode used invalid values. Check the specified parameters and try your request again.
HTTP Status Code: 400
- ConflictException
A new stream started with the same session ID. The current stream has been terminated.
HTTP Status Code: 409
- InternalFailureException
A problem occurred while processing the audio. Amazon Transcribe terminated processing.
HTTP Status Code: 500
- LimitExceededException
Your client has exceeded one of the Amazon Transcribe limits. This is typically the audio length limit. Break your audio stream into smaller chunks and try your request again.
HTTP Status Code: 429
- ServiceUnavailableException
The service is currently unavailable. Try your request again later.
HTTP Status Code: 503
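The Python sketch below turns the errors above into a client-side retry policy. It records each documented HTTP status code and returns a wait time for errors it treats as retryable unchanged. The errors this page says need a changed request (fix parameters, break the audio into smaller chunks, or avoid starting a second stream with the same session ID) get no retry. Capped exponential backoff is a common general technique, not one this page prescribes. The retryable set, which treats InternalFailureException as retryable, and the base and cap values are all assumptions.

```python
# HTTP status codes from the Errors list above.
ERROR_STATUS = {
    "BadRequestException": 400,
    "ConflictException": 409,
    "InternalFailureException": 500,
    "LimitExceededException": 429,
    "ServiceUnavailableException": 503,
}

# Assumption: only these errors are retried unchanged; the others need a changed
# request (fix parameters, split the audio, or check for a duplicate session ID).
RETRYABLE = {"ServiceUnavailableException", "InternalFailureException"}


def retry_delay(error_name, attempt, base=0.5, cap=8.0):
    """Seconds to wait before retry number `attempt` (0-based), or None if not retryable.

    Uses capped exponential backoff without jitter: base * 2**attempt, at most cap.
    """
    if error_name not in ERROR_STATUS:
        raise KeyError(error_name)
    if error_name not in RETRYABLE:
        return None
    return min(cap, base * 2 ** attempt)
```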
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: