Custom vocabularies - Amazon Transcribe

Use custom vocabularies to improve transcription accuracy for one or more specific words. These are generally domain-specific words and phrases, words that Amazon Transcribe isn't recognizing, or proper nouns. We recommend creating separate, small vocabularies tailored to specific audio recordings instead of creating a single, large vocabulary for use with all of your recordings. If you require an extensive list of corrections, consider creating a custom language model instead.

Important

You are responsible for the integrity of your own data when you use Amazon Transcribe. Do not enter confidential information, personally identifiable information (PII), or protected health information (PHI) into a custom vocabulary.

Considerations when creating a custom vocabulary:

  • You can have up to 100 vocabularies in your account.

  • The size limit for each custom vocabulary is 50 KB.

  • Each entry must contain fewer than 256 characters, including hyphens.

  • Only use characters from the allowed character set for your language (see Character sets for custom vocabularies).

  • Each vocabulary file can be in either table or list format; table format is strongly recommended.

  • If you use the table format, your vocabulary file must be stored in an S3 bucket. If you use a list, you can upload a text file using the console or include your list of words within an API call.
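The entry-length and size limits above can be checked locally before you upload anything. The following is a minimal sketch, not an official validator: the function name check_vocabulary is hypothetical, and it assumes that the size limit is measured as the UTF-8 byte length of the newline-joined entries and that 50 KB means 50 * 1024 bytes. The service may measure size differently, and this sketch does not check the allowed character set.

```python
# Limits taken from the considerations above.
MAX_ENTRY_CHARS = 255               # "fewer than 256 characters, including hyphens"
MAX_VOCABULARY_BYTES = 50 * 1024    # assumption: 50 KB == 50 * 1024 bytes


def check_vocabulary(entries):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: size is approximated as the UTF-8 byte length of
    the entries joined by newlines, which is an assumption.
    """
    problems = []
    for entry in entries:
        if len(entry) > MAX_ENTRY_CHARS:
            problems.append(
                f"entry too long ({len(entry)} characters): {entry[:20]}..."
            )
    size = len("\n".join(entries).encode("utf-8"))
    if size > MAX_VOCABULARY_BYTES:
        problems.append(f"vocabulary too large ({size} bytes)")
    return problems
```

Calling check_vocabulary(["Los-Angeles", "F.B.I."]) returns an empty list, because both entries are well under the limits.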

Should I use a table or a list for my custom vocabulary?

The table format gives you more options for, and more control over, the input and output of words within your custom vocabulary. With tables, you can specify multiple categories (Phrase, IPA, SoundsLike, and DisplayAs), allowing you to fine-tune your output. Lists don't have additional options, so you can only type in entries as you want them to appear in your transcript.
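To make the four categories concrete, here is a short sketch that assembles the text of a vocabulary table. It assumes a tab-separated layout with a header row and treats Phrase as the only required column; confirm both points in Creating a custom vocabulary using a table before relying on them. The function name build_table and the sample row are hypothetical.

```python
# Column names come from the text above; the tab-separated layout with a
# header row is an assumption to verify against the table-format page.
COLUMNS = ("Phrase", "IPA", "SoundsLike", "DisplayAs")


def build_table(rows):
    """Build vocabulary table text from a list of dicts keyed by column name.

    Hypothetical helper: columns missing from a row are left empty, and
    Phrase is assumed to be required.
    """
    lines = ["\t".join(COLUMNS)]
    for row in rows:
        unknown = set(row) - set(COLUMNS)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        if not row.get("Phrase"):
            raise ValueError("every row needs a Phrase")
        if any("\t" in value for value in row.values()):
            raise ValueError("values must not contain tab characters")
        lines.append("\t".join(row.get(col, "") for col in COLUMNS))
    return "\n".join(lines) + "\n"


# Sample row: transcribe the phrase "Los-Angeles" but display "Los Angeles".
table_text = build_table([{"Phrase": "Los-Angeles", "DisplayAs": "Los Angeles"}])
```

A list, by contrast, would carry only the entries themselves, with no way to separate how a word sounds from how it should be displayed.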

The console, CLI, and SDKs all use custom vocabulary tables in the same way. Lists are handled differently by each method, so a list may need additional formatting before it works across the console, CLI, and SDKs.

For more information, see Creating a custom vocabulary using a table and Creating a custom vocabulary using a list.
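As an illustration of how the two formats differ when you call the API, the sketch below builds the request parameters for a CreateVocabulary call: a list is passed inline through Phrases, while a table is referenced by its S3 location through VocabularyFileUri. The helper name create_vocabulary_params, the vocabulary names, and the bucket path are hypothetical, and the s3:// prefix check is a simplification rather than a full URI validation.

```python
def create_vocabulary_params(name, language_code, phrases=None, file_uri=None):
    """Build keyword arguments for a CreateVocabulary request.

    Hypothetical helper: pass `phrases` for a list-format vocabulary or
    `file_uri` for a table stored in S3, but not both.
    """
    if (phrases is None) == (file_uri is None):
        raise ValueError("provide exactly one of phrases or file_uri")
    params = {"VocabularyName": name, "LanguageCode": language_code}
    if phrases is not None:
        # List format: entries are sent inline with the request.
        params["Phrases"] = list(phrases)
    else:
        # Table format: the file must already be in an S3 bucket.
        if not file_uri.startswith("s3://"):
            raise ValueError("file_uri must be an s3:// location")
        params["VocabularyFileUri"] = file_uri
    return params


# Placeholder names and paths, for illustration only.
list_params = create_vocabulary_params(
    "my-list-vocabulary", "en-US", phrases=["Los-Angeles", "F.B.I."]
)
table_params = create_vocabulary_params(
    "my-table-vocabulary", "en-US",
    file_uri="s3://amzn-s3-demo-bucket/my-vocabulary-table.txt",
)
```

With the AWS SDK for Python installed and credentials configured, either dictionary could then be passed as boto3.client("transcribe").create_vocabulary(**params).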

API operations specific to custom vocabularies