Guidelines and Limits
Keep in mind the following information when using Amazon Comprehend.
Supported Regions
For a list of AWS Regions where Amazon Comprehend is available, see AWS Regions and Endpoints in the Amazon Web Services General Reference.
Throttling
For information about throttling for Amazon Comprehend and to request a limit increase, see Amazon Comprehend Limits in the Amazon Web Services General Reference.
You may be able to avoid throttling by using the batch operations instead of the single transaction operations. For more information, see Multiple Document Operations.
Overall Limits
All operations except asynchronous operations and topic modeling operations have the following limits:
Description | Limit |
---|---|
Character encoding | UTF-8 |
Maximum document size (UTF-8 encoded) | 5,000 bytes |
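Because the limit is stated in bytes, a document can exceed it with fewer than 5,000 characters when it contains multi-byte characters. The following is a minimal sketch of a client-side check; the helper names (`utf8_size`, `fits_sync_limit`, `truncate_to_limit`) are hypothetical and are not part of any AWS SDK.

```python
MAX_SYNC_DOCUMENT_BYTES = 5000  # limit taken from the table above


def utf8_size(text: str) -> int:
    """Return the size of the text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_sync_limit(text: str) -> bool:
    """True if the document is within the 5,000-byte limit."""
    return utf8_size(text) <= MAX_SYNC_DOCUMENT_BYTES


def truncate_to_limit(text: str, limit: int = MAX_SYNC_DOCUMENT_BYTES) -> str:
    """Cut the text so its UTF-8 encoding is at most `limit` bytes.

    Decoding with errors="ignore" drops a trailing partial character
    instead of raising, so the result is always valid text.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    return encoded[:limit].decode("utf-8", errors="ignore")
```

Truncation discards content, so it is only appropriate when analyzing the start of a document is acceptable. Otherwise, consider splitting the document or using the asynchronous operations.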
Amazon Comprehend may store your content to continuously improve the quality of its analysis models. See the Amazon Comprehend FAQ to learn more. To request that we delete content that may have been stored by Amazon Comprehend, open a case with AWS Support.
Multiple Document Operations
The BatchDetectDominantLanguage, BatchDetectEntities, BatchDetectKeyPhrases, and BatchDetectSentiment operations have the following limits:
Description | Limit |
---|---|
Documents per request | 25 |
If you plan to send more than 20 requests per second, consider using the batch operations. Batch operations let you send more documents in each request, which may result in higher throughput. For example, when you use the DetectDominantLanguage operation, you can send up to 20 documents per second. However, if you use the BatchDetectDominantLanguage operation, you can send up to 250 documents per second, although processing speed may be lower. For more information about throttling limits, see Amazon Comprehend Limits in the Amazon Web Services General Reference. For more information about using the multiple document APIs, see Multiple Document Synchronous Processing.
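To stay within the limit of 25 documents per request, a larger collection can be split into groups before it is sent. The following is a minimal sketch of that grouping; `batched` is a hypothetical helper, and the call to Amazon Comprehend itself is left out.

```python
from typing import Iterator, List, Sequence

MAX_DOCUMENTS_PER_BATCH = 25  # limit taken from the table above


def batched(
    documents: Sequence[str], size: int = MAX_DOCUMENTS_PER_BATCH
) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(documents), size):
        yield list(documents[start:start + size])


if __name__ == "__main__":
    docs = [f"document {i}" for i in range(60)]
    for group in batched(docs):
        # Each group would be sent in one batch request.
        print(len(group))
```

Each group can then be passed to one of the batch operations in a single request. Every document in the group must still meet the per-document size limit.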
Asynchronous Operations
Asynchronous batch jobs started with the StartDominantLanguageDetectionJob, StartEntitiesDetectionJob, StartKeyPhrasesDetectionJob, and StartSentimentDetectionJob operations have the following limits:
Description | Limit |
---|---|
Maximum size (UTF-8 characters) for one document, entity and key phrase detection | 100 KB |
Maximum size (UTF-8 characters) for one document, language detection | 1 MB |
Maximum size (UTF-8 characters) for one document, sentiment detection | 5 KB |
Total size of all files in batch | 5 GB |
Maximum number of files, one document per file | 1,000,000 |
Maximum number of lines, one document per line | 1,000,000 |
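Because the per-document limit differs by job type, it can help to check documents before starting a job. The following is a minimal sketch under two assumptions: the job type keys are hypothetical labels chosen for this example, and 1 KB is treated as 1,024 bytes, which the table does not specify.

```python
from typing import Dict, List, Sequence

KB = 1024  # assumption: the table does not say whether 1 KB is 1,000 or 1,024 bytes

# Per-document limits from the table above, keyed by hypothetical job type labels.
ASYNC_MAX_DOCUMENT_BYTES: Dict[str, int] = {
    "entities": 100 * KB,
    "key_phrases": 100 * KB,
    "dominant_language": 1024 * KB,  # 1 MB
    "sentiment": 5 * KB,
}


def oversized_documents(documents: Sequence[str], job_type: str) -> List[int]:
    """Return the indexes of documents that exceed the limit for the job type."""
    limit = ASYNC_MAX_DOCUMENT_BYTES[job_type]
    return [
        index
        for index, text in enumerate(documents)
        if len(text.encode("utf-8")) > limit
    ]
```

A document that passes for entity detection may still be too large for sentiment detection, so run the check once for each job type that you plan to start.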
You should use the asynchronous operations:
- To analyze more than 25 documents at a time
- To analyze documents larger than 5,000 bytes for key phrases and entities
For more information, see Asynchronous Batch Processing.
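The guidance above can be expressed as a small decision function. The following is a minimal sketch for entity and key phrase detection; `choose_mode` and the strings it returns are hypothetical labels, not API values.

```python
from typing import Sequence

MAX_SYNC_DOCUMENT_BYTES = 5000   # single and batch operation document limit
MAX_DOCUMENTS_PER_BATCH = 25     # batch operation request limit


def choose_mode(documents: Sequence[str]) -> str:
    """Suggest "single", "batch", or "asynchronous" for a set of documents."""
    largest = max((len(d.encode("utf-8")) for d in documents), default=0)
    if len(documents) > MAX_DOCUMENTS_PER_BATCH or largest > MAX_SYNC_DOCUMENT_BYTES:
        return "asynchronous"
    if len(documents) > 1:
        return "batch"
    return "single"
```

This sketch does not apply to sentiment detection, where the asynchronous per-document limit is 5 KB.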
Document Classification
Document classifier training jobs started with the CreateDocumentClassifier operation and document classification jobs started with the StartDocumentClassificationJob operation have the following limits:
Description | Limit |
---|---|
Character encoding | UTF-8 |
Maximum number of labels | 1,000 |
Maximum length of labels | 5,000 characters |
Total size of all files in request | 5 GB |
Maximum file size for one file, one document per file | 100 MB |
Maximum number of files, one document per file | 1,000,000 |
Maximum number of lines, one document per line | 1,000,000 |
Language Detection
The BatchDetectDominantLanguage and DetectDominantLanguage operations, and asynchronous jobs started with the StartDominantLanguageDetectionJob operation, have the following limitations:
- They don't support phonetic language detection. For example, they will not detect "arigato" as Japanese or "nihao" as Chinese.
- They may have trouble distinguishing close language pairs, such as Indonesian and Malay, or Bosnian, Croatian, and Serbian.
- For best results, the input text should be at least 20 characters long.
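Very short inputs can be set aside before language detection so that less reliable results are easy to identify. The following is a minimal sketch; `split_by_length` is a hypothetical helper, and ignoring surrounding whitespace when counting characters is an assumption of this example.

```python
from typing import List, Sequence, Tuple

MIN_RECOMMENDED_CHARACTERS = 20  # recommendation from the list above


def split_by_length(
    texts: Sequence[str], minimum: int = MIN_RECOMMENDED_CHARACTERS
) -> Tuple[List[str], List[str]]:
    """Return (long_enough, too_short), ignoring surrounding whitespace."""
    long_enough: List[str] = []
    too_short: List[str] = []
    for text in texts:
        target = long_enough if len(text.strip()) >= minimum else too_short
        target.append(text)
    return long_enough, too_short
```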
Topic Modeling
Topic detection jobs created with the StartTopicsDetectionJob operation have the following limits:
Description | Limit |
---|---|
Character encoding | UTF-8 |
Maximum number of topics to return | 100 |
Total size of all files in request | 5 GB |
Maximum file size for one file, one document per file | 100 MB |
Maximum number of files, one document per file | 1,000,000 |
Maximum number of lines, one document per line | 1,000,000 |
For best results, you should include at least 1,000 input documents.
Custom Entity Recognition
Custom entity recognizer training jobs started with the CreateEntityRecognizer operation have the following limits:
General
Description | Minimum | Maximum |
---|---|---|
Number of entities per model/custom entity recognizer | -- | 1 |
Document size (UTF-8) | 1 byte | 5,000 bytes |
Number of documents | 2,000 | 120,000 |
Document corpus size (all docs in plain text combined) | 5 KB | 100 MB |
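A training corpus can be checked against these general limits before it is uploaded. The following is a minimal sketch; `check_training_corpus` is a hypothetical helper, and treating 1 KB as 1,024 bytes is an assumption because the table does not define the unit.

```python
from typing import List, Sequence

KB = 1024        # assumption: 1 KB = 1,024 bytes
MB = 1024 * KB   # assumption: 1 MB = 1,024 KB


def check_training_corpus(documents: Sequence[str]) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems: List[str] = []
    sizes = [len(d.encode("utf-8")) for d in documents]

    if not 2000 <= len(documents) <= 120000:
        problems.append(
            f"document count {len(documents)} is outside 2,000 to 120,000"
        )

    out_of_range = sum(1 for size in sizes if not 1 <= size <= 5000)
    if out_of_range:
        problems.append(
            f"{out_of_range} document(s) are outside 1 to 5,000 bytes"
        )

    total = sum(sizes)
    if not 5 * KB <= total <= 100 * MB:
        problems.append(f"corpus size {total} bytes is outside 5 KB to 100 MB")

    return problems
```

The function reports every limit that is exceeded rather than stopping at the first one, so a corpus can be corrected in one pass.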
Annotations
Description | Minimum | Maximum |
---|---|---|
Number of annotations | 1,000 | n/a |
Entity Lists
Description | Minimum | Maximum |
---|---|---|
Number of items in entity list | 1 | 1,000,000 |
Length of individual entry (post-strip) in entity list | 1 | 5,000 |
Entity list corpus size (all docs in plain text combined) | 5 KB | 100 MB |