Transcribing an audio file using a medical custom vocabulary
Use the StartMedicalTranscriptionJob API or the AWS Management Console to start a transcription job that uses a custom vocabulary to improve transcription accuracy.
- Sign in to the AWS Management Console.
- In the navigation pane, under Amazon Transcribe Medical, choose Transcription jobs.
- Choose Create job.
- On the Specify job details page, provide information about your transcription job.
- Choose Next.
- Under Customization, enable Custom vocabulary.
- Under Vocabulary selection, choose a custom vocabulary.
- Choose Create.
To transcribe an audio file using a custom vocabulary in a batch transcription job (API)
- For the StartMedicalTranscriptionJob API, specify the following.
  - For MedicalTranscriptionJobName, specify a name that is unique in your AWS account.
  - For LanguageCode, specify the language code that corresponds to the language spoken in your audio file and the language of your custom vocabulary.
  - For the MediaFileUri parameter of the Media object, specify the Amazon S3 location (URI) of the audio file that you want to transcribe.
  - For Specialty, specify the medical specialty of the clinician speaking in the audio file.
  - For Type, specify whether the audio file is a conversation or a dictation.
  - For OutputBucketName, specify the Amazon S3 bucket to store the transcription results.
  - For the Settings object, specify the following.
    - VocabularyName – the name of your custom vocabulary.
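The parameters listed above can be gathered into a single dictionary before the call is made. The following is a minimal sketch, not part of the service API: the helper name build_medical_job_request and its argument names are hypothetical, and the two checks it performs (an s3:// prefix on the media URI, and a Type of CONVERSATION or DICTATION) are illustrative assumptions about what is worth validating locally.

```python
def build_medical_job_request(job_name, media_uri, output_bucket,
                              vocabulary_name, job_type="CONVERSATION",
                              specialty="PRIMARYCARE", language_code="en-US"):
    """Assemble keyword arguments for start_medical_transcription_job.

    Hypothetical helper: it only builds and sanity-checks a dictionary,
    and it never contacts AWS.
    """
    if not media_uri.startswith("s3://"):
        raise ValueError("media_uri must be an Amazon S3 URI (s3://...)")
    if job_type not in ("CONVERSATION", "DICTATION"):
        raise ValueError("job_type must be CONVERSATION or DICTATION")
    return {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "Media": {"MediaFileUri": media_uri},
        "Specialty": specialty,
        "Type": job_type,
        "OutputBucketName": output_bucket,
        # The custom vocabulary is named inside the Settings object.
        "Settings": {"VocabularyName": vocabulary_name},
    }


# Placeholder values; replace them with your own resources.
request = build_medical_job_request(
    job_name="my-first-med-transcription-job",
    media_uri="s3://DOC-EXAMPLE-BUCKET/my-input-files/my-media-file.flac",
    output_bucket="DOC-EXAMPLE-BUCKET",
    vocabulary_name="example-med-custom-vocab",
)
print(request)
```

With a Boto3 Amazon Transcribe client named transcribe, the dictionary could then be passed as transcribe.start_medical_transcription_job(**request).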
The following request uses the AWS SDK for Python (Boto3) to start a batch transcription job with a custom vocabulary.
    import time
    import boto3

    # Create an Amazon Transcribe client; 'us-west-2' is a placeholder Region.
    transcribe = boto3.client('transcribe', 'us-west-2')

    job_name = "my-first-med-transcription-job"
    job_uri = "s3://DOC-EXAMPLE-BUCKET/my-input-files/my-media-file.flac"

    # Start the batch job; Settings names the custom vocabulary to apply.
    transcribe.start_medical_transcription_job(
        MedicalTranscriptionJobName=job_name,
        Media={'MediaFileUri': job_uri},
        OutputBucketName='DOC-EXAMPLE-BUCKET',
        OutputKey='my-output-files/',
        LanguageCode='en-US',
        Specialty='PRIMARYCARE',
        Type='CONVERSATION',
        Settings={'VocabularyName': 'example-med-custom-vocab'}
    )

    # Poll every 5 seconds until the job either completes or fails.
    while True:
        status = transcribe.get_medical_transcription_job(MedicalTranscriptionJobName=job_name)
        if status['MedicalTranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        print("Not ready yet...")
        time.sleep(5)
    print(status)