Amazon Comprehend
Developer Guide

Guidelines and Limits

Keep in mind the following information when using Amazon Comprehend.

Supported Regions

For a list of AWS Regions where Amazon Comprehend is available, see AWS Regions and Endpoints in the Amazon Web Services General Reference.

Throttling

For information about throttling for Amazon Comprehend and to request a limit increase, see Amazon Comprehend Limits in the Amazon Web Services General Reference.

You may be able to avoid throttling by using the batch operations instead of the single transaction operations. For more information, see Batch Operations.

Overall Limits

All operations except topic modeling operations have the following limits:

Description                         Limit
Character encoding                  UTF-8
Document size (UTF-8 characters)    5,000 bytes
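
Because the document size limit is stated in bytes, a string with fewer than 5,000 characters can still exceed it when it contains multi-byte characters. The following is a minimal sketch of a client-side check. The helper names (utf8_size, fits_document_limit, truncate_to_limit) are hypothetical and are not part of any AWS SDK, and truncation is only one possible way to handle an oversized document.

```python
MAX_DOCUMENT_BYTES = 5_000  # document size limit from the table above


def utf8_size(text: str) -> int:
    """Return the size of text in bytes once it is encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_document_limit(text: str, limit: int = MAX_DOCUMENT_BYTES) -> bool:
    """Return True if the UTF-8 encoding of text is within the byte limit."""
    return utf8_size(text) <= limit


def truncate_to_limit(text: str, limit: int = MAX_DOCUMENT_BYTES) -> str:
    """Cut text so that its UTF-8 encoding fits within the byte limit.

    A multi-byte character split by the cut is dropped rather than
    left as an invalid partial sequence.
    """
    encoded = text.encode("utf-8")[:limit]
    return encoded.decode("utf-8", errors="ignore")
```

For example, a string of 2,501 two-byte characters has only 2,501 characters but occupies 5,002 bytes, so it fails the check.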

Amazon Comprehend may store your content to continuously improve the quality of its analysis models. See the Amazon Comprehend FAQ to learn more. To request that we delete content that may have been stored by Amazon Comprehend, open a case with AWS Support.

Batch Operations

The BatchDetectDominantLanguage, BatchDetectEntities, BatchDetectKeyPhrases, and BatchDetectSentiment operations have the following limits:

Description              Limit
Documents per request    25

If you plan to send more than 20 requests per second, consider using the batch operations. Batch operations let you send more documents in each request, which may result in higher throughput. For example, when you use the DetectDominantLanguage operation, you can send up to 20 documents per second. If you use the BatchDetectDominantLanguage operation instead, you can send up to 250 documents per second, but processing speed may be lower. For more information about throttling limits, see Amazon Comprehend Limits in the Amazon Web Services General Reference. For more information about using the batch APIs, see Batch Processing Documents.
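
To use the batch operations with a larger collection, the documents have to be split into groups of at most 25. The following is a minimal sketch of that grouping. The function names (batches, requests_needed) are hypothetical helpers, not SDK calls, and the sketch does not send any requests itself.

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SIZE = 25  # documents per request, from the table above


def batches(documents: Sequence[str],
            size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(documents), size):
        yield list(documents[start:start + size])


def requests_needed(document_count: int, size: int = MAX_BATCH_SIZE) -> int:
    """Return how many batch requests cover document_count documents."""
    return -(-document_count // size)  # ceiling division
```

Each group could then be passed to a batch operation, for example as the TextList argument of batch_detect_dominant_language if you use the AWS SDK for Python. Check the SDK reference for the exact request shape before relying on it.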

Language Detection

The BatchDetectDominantLanguage and DetectDominantLanguage operations have the following limitations:

  • They don't support phonetic language detection. For example, they will not detect "arigato" as Japanese or "nihao" as Chinese.

  • They may have trouble distinguishing close language pairs, such as Indonesian and Malay, or Bosnian, Croatian, and Serbian.

  • For best results, the input text should be at least 20 characters long.
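
The length recommendation above can be checked before a document is submitted. The following is a minimal sketch. The helper name is_long_enough is hypothetical, and ignoring leading and trailing whitespace when counting characters is an assumption of this sketch, not documented service behavior.

```python
MIN_RECOMMENDED_CHARS = 20  # recommended minimum input length from the list above


def is_long_enough(text: str, minimum: int = MIN_RECOMMENDED_CHARS) -> bool:
    """Return True if text meets the recommended minimum length.

    Assumption: surrounding whitespace is not counted.
    """
    return len(text.strip()) >= minimum
```

Short inputs are not rejected by this check. It only flags text for which language detection results may be less reliable.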

Topic Modeling

Topic detection jobs created with the StartTopicsDetectionJob operation have the following limits:

Description                                             Limit
Character encoding                                      UTF-8
Maximum number of topics to return                      100
Total size of all files in request                      5 GB
Maximum file size for one file, one document per file   100 MB
Maximum number of files, one document per file          1,000,000
Maximum number of lines, one document per line          1,000,000

For best results, you should include at least 1,000 input documents.
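
The topic modeling limits can be checked locally before a job is started. The following is a minimal sketch for the one-document-per-file case. The function check_topic_input is a hypothetical helper, and treating the size limits as binary units (1 GB = 1024^3 bytes, 1 MB = 1024^2 bytes) is an assumption, since the table does not say which unit definition applies.

```python
from typing import List, Sequence

# Limits from the table above. Binary units are an assumption.
MAX_TOTAL_BYTES = 5 * 1024 ** 3
MAX_FILE_BYTES = 100 * 1024 ** 2
MAX_FILES = 1_000_000
MIN_RECOMMENDED_DOCUMENTS = 1_000


def check_topic_input(file_sizes: Sequence[int]) -> List[str]:
    """Return a list of problems for a one-document-per-file input set.

    file_sizes holds the size of each input file in bytes.
    An empty result means no limit or recommendation was violated.
    """
    problems = []
    if len(file_sizes) > MAX_FILES:
        problems.append("too many files")
    oversized = sum(1 for size in file_sizes if size > MAX_FILE_BYTES)
    if oversized:
        problems.append(f"{oversized} file(s) exceed the per-file size limit")
    if sum(file_sizes) > MAX_TOTAL_BYTES:
        problems.append("total input size exceeds the request limit")
    if len(file_sizes) < MIN_RECOMMENDED_DOCUMENTS:
        problems.append("fewer than the recommended number of documents")
    return problems
```

The last check reflects a recommendation rather than a hard limit, so callers may want to treat it as a warning.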