Redacting or identifying PII in a real-time stream
When redacting personally identifiable information (PII) from a streaming transcription, Amazon Transcribe replaces each identified instance of PII with [PII] in your transcript.

An additional option available for streaming transcriptions is PII identification. When you activate PII identification, Amazon Transcribe labels the PII in your transcription results under an Entities object. For output samples, see Example redacted streaming output and Example PII identification output.
Redaction and identification of PII in streaming transcriptions are available with these English dialects: Australian (en-AU), British (en-GB), and US (en-US). They are also available with US Spanish (es-US).
PII identification and redaction for streaming jobs are performed only after an audio segment has been completely transcribed.
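Because PII is only labeled on completed segments, a client that consumes identification output would typically skip partial results before reading the Entities object. The following is a minimal sketch of that filtering, not an official client. It assumes each transcript event has already been decoded into a Python dictionary, and that the field names (`Transcript`, `Results`, `IsPartial`, `Alternatives`, `Entities`, `Type`, `Content`) match the StartStreamTranscription response; verify them against the API Reference. The sample event is hand-written for illustration.

```python
from typing import Any, Dict, List


def collect_pii_entities(transcript_event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the PII entities found in final (non-partial) results only."""
    entities: List[Dict[str, Any]] = []
    results = transcript_event.get("Transcript", {}).get("Results", [])
    for result in results:
        # PII is labeled only once a segment is fully transcribed,
        # so partial results are skipped.
        if result.get("IsPartial", False):
            continue
        for alternative in result.get("Alternatives", []):
            entities.extend(alternative.get("Entities", []))
    return entities


# Hypothetical event, hand-written for illustration (not real service output).
sample_event = {
    "Transcript": {
        "Results": [
            {"IsPartial": True, "Alternatives": [{"Transcript": "my name is"}]},
            {
                "IsPartial": False,
                "Alternatives": [
                    {
                        "Transcript": "my name is Jane Doe",
                        "Entities": [
                            {"Category": "PII", "Type": "NAME", "Content": "Jane Doe"}
                        ],
                    }
                ],
            },
        ]
    }
}

found = collect_pii_entities(sample_event)
print([(entity["Type"], entity["Content"]) for entity in found])
```

Events with no final results, or with no Entities object, yield an empty list.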
Types of PII Amazon Transcribe can recognize for streaming transcriptions

| PII type | Description |
|---|---|
| ADDRESS | A physical address, such as 100 Main Street, Anytown, USA or Suite #12, Building 123. An address can include a street, building, location, city, state, country, county, zip, precinct, neighborhood, and more. |
| ALL | Redact or identify all PII types listed in this table. |
| BANK_ACCOUNT_NUMBER | A US bank account number. These are typically between 10 and 12 digits long, but Amazon Transcribe also recognizes bank account numbers when only the last 4 digits are present. |
| BANK_ROUTING | A US bank account routing number. These are typically 9 digits long, but Amazon Transcribe also recognizes routing numbers when only the last 4 digits are present. |
| CREDIT_DEBIT_CVV | A 3-digit card verification code (CVV) that is present on VISA, MasterCard, and Discover credit and debit cards. In American Express credit or debit cards, it is a 4-digit numeric code. |
| CREDIT_DEBIT_EXPIRY | The expiration date for a credit or debit card. This number is usually 4 digits long and formatted as month/year or MM/YY. For example, Amazon Transcribe can recognize expiration dates such as 01/21, 01/2021, and Jan 2021. |
| CREDIT_DEBIT_NUMBER | The number for a credit or debit card. These numbers can vary from 13 to 16 digits in length, but Amazon Transcribe also recognizes credit or debit card numbers when only the last 4 digits are present. |
| EMAIL | An email address, such as efua.owusu@email.com. |
| NAME | An individual's name. This entity type does not include titles, such as Mr., Mrs., Miss, or Dr. Amazon Transcribe does not apply this entity type to names that are part of organizations or addresses. For example, Amazon Transcribe recognizes the John Doe Organization as an organization, and Jane Doe Street as an address. |
| PHONE | A phone number. This entity type also includes fax and pager numbers. |
| PIN | A 4-digit personal identification number (PIN) that allows someone to access their bank account information. |
| SSN | A Social Security Number (SSN) is a 9-digit number that is issued to US citizens, permanent residents, and temporary working residents. Amazon Transcribe also recognizes Social Security Numbers when only the last 4 digits are present. |

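The PII types in the table are passed to a streaming request as a comma-separated list (for example, NAME,ADDRESS). The sketch below shows one way a client might validate that list before sending it. The function name and the decision to collapse any list containing ALL down to just ALL are assumptions made for illustration, not documented service behavior.

```python
from typing import Iterable

# PII types listed in the table for streaming transcriptions.
STREAMING_PII_TYPES = frozenset({
    "ADDRESS", "ALL", "BANK_ACCOUNT_NUMBER", "BANK_ROUTING",
    "CREDIT_DEBIT_CVV", "CREDIT_DEBIT_EXPIRY", "CREDIT_DEBIT_NUMBER",
    "EMAIL", "NAME", "PHONE", "PIN", "SSN",
})


def build_pii_entity_types(requested: Iterable[str]) -> str:
    """Validate PII type names and join them into a comma-separated value."""
    normalized = [name.strip().upper() for name in requested]
    if not normalized:
        raise ValueError("At least one PII type is required")
    unknown = sorted(set(normalized) - STREAMING_PII_TYPES)
    if unknown:
        raise ValueError(f"Unsupported PII types: {', '.join(unknown)}")
    # Assumption: ALL already covers every other type, so send it alone.
    if "ALL" in normalized:
        return "ALL"
    # dict.fromkeys removes duplicates while preserving the original order.
    return ",".join(dict.fromkeys(normalized))


print(build_pii_entity_types(["name", "ADDRESS", "NAME"]))
```

Validating on the client side gives a clear error message before a stream is opened, rather than a rejected request.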
You can start a streaming transcription using the AWS Management Console, WebSocket, or HTTP/2.
- Sign in to the AWS Management Console.
- In the navigation pane, choose Real-time transcription. Scroll down to Content removal settings and expand this field if it is minimized.
- Toggle on PII Identification & redaction.
- Select Identification only or Identification & redaction, then select the PII entity types you want to identify or redact in your transcript.
- You're now ready to transcribe your stream. Select Start streaming and begin speaking. To end your dictation, select Stop streaming.
This example creates a presigned URL that uses PII redaction (or PII identification) in a WebSocket stream. Line breaks have been added for readability. For more information on using WebSocket streams with Amazon Transcribe, see Setting up a WebSocket stream. For more detail on parameters, see StartStreamTranscription.

    GET wss://transcribestreaming.us-west-2.amazonaws.com:8443/stream-transcription-websocket?
    &X-Amz-Algorithm=AWS4-HMAC-SHA256
    &X-Amz-Credential=AKIAIOSFODNN7EXAMPLE%2F20220208%2Fus-west-2%2Ftranscribe%2Faws4_request
    &X-Amz-Date=20220208T235959Z
    &X-Amz-Expires=300
    &X-Amz-Security-Token=security-token
    &X-Amz-Signature=string
    &X-Amz-SignedHeaders=content-type%3Bhost%3Bx-amz-date
    &language-code=en-US
    &media-encoding=flac
    &sample-rate=16000
    &pii-entity-types=NAME,ADDRESS
    &content-redaction-type=PII (or &content-identification-type=PII)

You cannot use both content-identification-type and content-redaction-type in the same request.
Parameter definitions can be found in the API Reference; parameters common to all AWS API operations are listed in the Common Parameters section.
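As a rough illustration of the Amazon Transcribe portion of the WebSocket query string, the sketch below assembles the language, encoding, sample rate, and PII parameters, and chooses exactly one of the redaction or identification parameters. It deliberately does not perform Signature Version 4 signing: the X-Amz-* parameters must still be computed by a signer, and these parameters must be part of the signed query string. The function name and the `mode` argument are hypothetical; the comma is left unencoded only to mirror the example request.

```python
from urllib.parse import urlencode


def build_stream_query(language_code: str, media_encoding: str, sample_rate: int,
                       pii_entity_types: str, mode: str) -> str:
    """Build the unsigned, Transcribe-specific query parameters."""
    if mode not in ("redaction", "identification"):
        raise ValueError("mode must be 'redaction' or 'identification'")
    params = {
        "language-code": language_code,
        "media-encoding": media_encoding,
        "sample-rate": str(sample_rate),
        "pii-entity-types": pii_entity_types,
    }
    # Only one of the two content parameters may appear in a request.
    if mode == "redaction":
        params["content-redaction-type"] = "PII"
    else:
        params["content-identification-type"] = "PII"
    # safe="," keeps the entity list readable, as in the example request.
    return urlencode(params, safe=",")


print(build_stream_query("en-US", "flac", 16000, "NAME,ADDRESS", "redaction"))
```

Taking a single `mode` argument, rather than two independent flags, makes it impossible to request both redaction and identification by accident.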
This example creates an HTTP/2 request with PII identification or PII redaction enabled. For more information on using HTTP/2 streaming with Amazon Transcribe, see Setting up an HTTP/2 stream. For more detail on parameters and headers specific to Amazon Transcribe, see StartStreamTranscription.

    POST /stream-transcription HTTP/2
    host: transcribestreaming.us-west-2.amazonaws.com
    X-Amz-Target: com.amazonaws.transcribe.Transcribe.StartStreamTranscription
    Content-Type: application/vnd.amazon.eventstream
    X-Amz-Content-Sha256: string
    X-Amz-Date: 20220208T235959Z
    Authorization: AWS4-HMAC-SHA256 Credential=access-key/20220208/us-west-2/transcribe/aws4_request, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date;x-amz-target;x-amz-security-token, Signature=string
    x-amzn-transcribe-language-code: en-US
    x-amzn-transcribe-media-encoding: flac
    x-amzn-transcribe-sample-rate: 16000
    x-amzn-transcribe-content-identification-type: PII (or x-amzn-transcribe-content-redaction-type: PII)
    x-amzn-transcribe-pii-entity-types: NAME,ADDRESS
    transfer-encoding: chunked

You cannot use both content-identification-type and content-redaction-type in the same request.
Parameter definitions can be found in the API Reference; parameters common to all AWS API operations are listed in the Common Parameters section.
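Since the identification and redaction headers are mutually exclusive, a client can check a header set before opening the HTTP/2 stream. The sketch below is one possible pre-flight check under that assumption. The function name and its return values are hypothetical, and header names are compared case-insensitively because HTTP header names are not case-sensitive.

```python
from typing import Mapping

IDENTIFICATION_HEADER = "x-amzn-transcribe-content-identification-type"
REDACTION_HEADER = "x-amzn-transcribe-content-redaction-type"


def pii_mode(headers: Mapping[str, str]) -> str:
    """Return 'identification', 'redaction', or 'none' for a header set.

    Raises ValueError if both PII headers are present, because a single
    request cannot use both.
    """
    lowered = {name.lower() for name in headers}
    has_identification = IDENTIFICATION_HEADER in lowered
    has_redaction = REDACTION_HEADER in lowered
    if has_identification and has_redaction:
        raise ValueError(
            "Use either content-identification-type or "
            "content-redaction-type, not both"
        )
    if has_identification:
        return "identification"
    if has_redaction:
        return "redaction"
    return "none"


request_headers = {
    "x-amzn-transcribe-language-code": "en-US",
    "x-amzn-transcribe-content-identification-type": "PII",
    "x-amzn-transcribe-pii-entity-types": "NAME,ADDRESS",
}
print(pii_mode(request_headers))
```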
Note
PII redaction for streaming is only supported in these AWS Regions: Asia Pacific (Seoul), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), EU (Frankfurt), EU (Ireland), EU (London), US East (N. Virginia), US East (Ohio), and US West (Oregon).