Setting up a WebSocket stream - Amazon Transcribe

Setting up a WebSocket stream

The key components for a WebSocket protocol for streaming transcription requests with Amazon Transcribe are:

  • The upgrade request. This contains the query parameters for your request, and a signature that Amazon Transcribe uses as a seed signature to sign the data frames.

  • One or more message frames in event stream encoding that contain metadata and raw audio bytes.

  • An end frame. This is a signed message in event stream encoding with an empty body.

Note

Amazon Transcribe only supports one stream per WebSocket session. If you attempt to use multiple streams, your transcription request fails.

  1. Attach the following policy to the IAM role that makes the request. See Adding IAM policies for more information.

    { "Version": "2012-10-17", "Statement": [ { "Sid": "transcribestreaming", "Effect": "Allow", "Action": "transcribe:StartStreamTranscriptionWebSocket", "Resource": "*" } ] }
  2. To start the session, create a pre-signed URL in the following format. Line breaks have been added for readability.

    GET wss://transcribestreaming.us-west-2.amazonaws.com:8443/stream-transcription-websocket? &X-Amz-Algorithm=AWS4-HMAC-SHA256 &X-Amz-Credential=access-key%2FYYYYMMDD%2Fus-west-2%2Ftranscribe%2Faws4_request &X-Amz-Date=YYYYMMDDTHHMMSSZ &X-Amz-Expires=250 &X-Amz-Security-Token=security-token &X-Amz-Signature=string &X-Amz-SignedHeaders=content-type%3Bhost%3Bx-amz-date &language-code=en-US &media-encoding=flac &sample-rate=16000

    Additional operations and parameters are listed in the API Reference; parameters common to all AWS API operations are listed in the Common Parameters section.

    To construct the URL for your request and create the Signature Version 4 signature, refer to the following steps. Examples are in pseudocode.

    1. Create a canonical request. A canonical request is a string that includes information from your request in a standardized format. This ensures that when AWS receives the request, it can calculate the same signature you created for your URL. For more information, see Create a Canonical Request for Signature Version 4.

      # HTTP verb method = "GET" # Service name service = "transcribe" # Region region = "us-west-2" # Amazon Transcribe streaming endpoint endpoint = "wss://transcribestreaming.us-west-2.amazonaws.com:8443" # Host host = "transcribestreaming.us-west-2.amazonaws.com:8443" # Date and time of request amz-date = YYYYMMDDTHHMMSSZ # Date without time for credential scope datestamp = YYYYMMDD
    2. Create a canonical URI, which is the part of the URI between the domain and the query string.

      canonical_uri = "/stream-transcription-websocket"
    3. Create the canonical headers and signed headers. Note the trailing \n in the canonical headers.

      • Append the lowercase header name followed by a colon ( : ).

      • Append a comma-separated list of values for that header. Do not sort values in headers that have multiple values.

      • Append a new line (\n).

      canonical_headers = "host:" + host + "\n" signed_headers = "host"
    4. Match the algorithm to the hashing algorithm. You must use SHA-256.

      algorithm = "AWS4-HMAC-SHA256"
    5. Create the credential scope, which scopes the derived key to the date, AWS Region, and service. For example, 20220127/us-west-2/transcribe/aws4_request.

      credential_scope = datestamp + "/" + region + "/" + service + "/" + "aws4_request"
    6. Create the canonical query string. Query string values must be URI-encoded and sorted by name.

      • Sort the parameter names by character code point in ascending order. Parameters with duplicate names should be sorted by value. For example, a parameter name that begins with the uppercase letter F precedes a parameter name that begins with the lowercase letter b.

      • Do not URI-encode any of the unreserved characters that RFC 3986 defines: A-Z, a-z, 0-9, hyphen ( - ), underscore ( _ ), period ( . ), and tilde ( ~ ).

      • Percent-encode all other characters with %XY, where X and Y are hexadecimal characters (0-9 and uppercase A-F). For example, the space character must be encoded as %20 (don't include '+', as some encoding schemes do); extended UTF-8 characters must be in the form %XY%ZA%BC.

      • Double-encode any equals ( = ) characters in parameter values.

      canonical_querystring = "X-Amz-Algorithm=" + algorithm canonical_querystring += "&X-Amz-Credential="+ URI-encode(access key + "/" + credential_scope) canonical_querystring += "&X-Amz-Date=" + amz_date canonical_querystring += "&X-Amz-Expires=250" canonical_querystring += "&X-Amz-Security-Token=" + token canonical_querystring += "&X-Amz-SignedHeaders=" + signed_headers canonical_querystring += "&language-code=en-US&media-encoding=flac&sample-rate=16000"
    7. Create a hash of the payload. For a GET request, the payload is an empty string.

      payload_hash = HashSHA256(("").Encode("utf-8")).HexDigest()
    8. Combine the following elements to create the canonical request.

      canonical_request = method + '\n' + canonical_uri + '\n' + canonical_querystring + '\n' + canonical_headers + '\n' + signed_headers + '\n' + payload_hash
  3. Create the string to sign, which contains meta information about your request. You use the string to sign in the next step when you calculate the request signature. For more information, see Create a String to Sign for Signature Version 4.

    string_to_sign=algorithm + "\n" + amz_date + "\n" + credential_scope + "\n" + HashSHA256(canonical_request.Encode("utf-8")).HexDigest()
  4. Calculate the signature. To do this, derive a signing key from your AWS secret access key. For a greater degree of protection, the derived key is specific to the date, service, and AWS Region. Use this derived key to sign the request. For more information, see Calculate the Signature for AWS Signature Version 4.

    Make sure you implement the GetSignatureKey function to derive your signing key. If you have not yet derived a signing key, refer to Examples of how to derive a signing key for Signature Version 4.

    #Create the signing key signing_key = GetSignatureKey(secret_key, datestamp, region, service) # Sign the string_to_sign using the signing key signature = HMAC.new(signing_key, (string_to_sign).Encode("utf-8"), Sha256()).HexDigest

    The function HMAC(key, data) represents an HMAC-SHA256 function that returns results in binary format.

  5. Add signing information to the request and create the request URL.

    After you calculate the signature, add it to the query string. For more information, see Add the Signature to the Request.

    First, add the authentication information to the query string.

    canonical_querystring += "&X-Amz-Signature=" + signature

    Second, create the URL for the request.

    request_url = endpoint + canonical_uri + "?" + canonical_querystring

    Use the request URL with your WebSocket library to make the request to Amazon Transcribe.

  6. The request to Amazon Transcribe must include the following headers. Typically these headers are managed by your WebSocket client library.

    Host: transcribestreaming.us-west-2.amazonaws.com:8443 Connection: Upgrade Upgrade: websocket Origin: URI-of-WebSocket-client Sec-WebSocket-Version: 13 Sec-WebSocket-Key: randomly-generated-string
  7. When Amazon Transcribe receives your WebSocket request, it responds with a WebSocket upgrade response. Typically your WebSocket library manages this response and sets up a socket for communications with Amazon Transcribe.

    The following is the response from Amazon Transcribe. Line breaks have been added for readability.

    HTTP/1.1 101 WebSocket Protocol Handshake Connection: upgrade Upgrade: websocket websocket-origin: wss://transcribestreaming.us-west-2.amazonaws.com:8443 websocket-location: transcribestreaming.us-west-2.amazonaws.com:8443/stream-transcription-websocket? &X-Amz-Algorithm=AWS4-HMAC-SHA256 &X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20220208%2Fus-west-2%2Ftranscribe%2Faws4_request &X-Amz-Date=20220208T235959Z &X-Amz-Expires=250 &X-Amz-Signature=Signature Version 4 signature &X-Amz-SignedHeaders=host &language-code=en-US &session-id=String &media-encoding=flac &sample-rate=16000 x-amzn-RequestId: RequestId Strict-Transport-Security: max-age=31536000 sec-websocket-accept: hash-of-the-Sec-WebSocket-Key-header
  8. Make your WebSocket streaming request.

    After the WebSocket connection is established, the client can start sending a sequence of audio frames, each encoded using event stream encoding.

    Each data frame contains three headers combined with a chunk of raw audio bytes; the following table describes these headers.

    Header name byte length Header name (string) Header value type Value string byte length Value string (UTF-8)
    13 :content-type 7 24 application/octet-stream
    11 :event-type 7 10 AudioEvent
    13 :message-type 7 5 event
  9. To end the data stream, send an empty audio chunk in an event stream encoded message.

    The response contains event stream encoded raw bytes in the payload. It contains the standard prelude and the following headers.

    Header name byte length Header name (string) Header value type Value string byte length Value string (UTF-8)
    13 :content-type 7 16 application/json
    11 :event-type 7 15 TranscriptEvent
    13 :message-type 7 5 event

    When you decode the binary response, you end up with a JSON structure containing the transcription results.

Handling WebSocket streaming errors

If an exception occurs while processing your request, Amazon Transcribe responds with a terminal WebSocket frame containing an event stream encoded response. This response contains the headers described in the following table; the body of the response contains a descriptive error message. After sending the exception response, Amazon Transcribe sends a close frame.

Header name byte length Header name (string) Header value type Value string byte length Value string (UTF-8)
13 :content-type 7 16 application/json
15 :exception-type 7 varies varies, see below
13 :message-type 7 9 exception

The exception-type header contains one of the following values:

  • BadRequestException: There was a client error when the stream was created, or an error occurred while streaming data. Make sure your client is ready to accept data and try your request again.

  • InternalFailureException: Amazon Transcribe had a problem during the handshake with the client. Try your request again.

  • LimitExceededException: The client exceeded the concurrent stream limit. For more information, see Amazon Transcribe Limits. Reduce the number of streams you're transcribing.

  • UnrecognizedClientException: The WebSocket upgrade request was signed with an incorrect access key or secret key. Make sure you're correctly creating the access key and try your request again.

Amazon Transcribe can also return any of the common service errors. For a list, see Common Errors.