Guidelines and quotas - Amazon Comprehend

Guidelines and quotas

Unless otherwise specified, the Amazon Comprehend quotas are per region. You can request an increase to adjustable quotas if needed for your applications. For information about quotas and to request a quota increase, see AWS Service Quotas.

Supported Regions

Amazon Comprehend is available in the following AWS Regions:

  • US East (Ohio)

  • US East (N. Virginia)

  • US West (Oregon)

  • Asia Pacific (Mumbai)

  • Asia Pacific (Seoul)

  • Asia Pacific (Singapore)

  • Asia Pacific (Sydney)

  • Asia Pacific (Tokyo)

  • Canada (Central)

  • Europe (Frankfurt)

  • Europe (Ireland)

  • Europe (London)

  • AWS GovCloud (US-West)

By default, Amazon Comprehend provides all API operations in each of the supported regions. For exceptions, see Document processing.

For information about API endpoints, see Amazon Comprehend Regions and Endpoints in the Amazon Web Services General Reference.

To review current quotas in a region, or to request quota increases for adjustable quotas, open the Service Quotas console.

Quotas for built-in models

Amazon Comprehend provides built-in models for analyzing UTF-8 text documents, along with synchronous and asynchronous operations that use these models.

Real-time (synchronous) analysis

This section describes quotas related to real-time analysis using the built-in models.

Single document operations

The Amazon Comprehend API provides operations that take a single document as input. The following quotas apply to these operations.

General quotas for single document operations

The following quotas apply to real-time analysis for detecting entities, key phrases, or dominant language. For entity detection, these quotas apply to detection with the built-in models. For custom entity detection, see the quotas in Custom entity recognition.

Description Quota/Guideline
Maximum document size 100 KB

Operation-specific quotas for single document operations

The following quotas apply to real-time analysis for detecting sentiment, targeted sentiment, and syntax.

Description Quota/Guideline
Maximum document size 5 KB
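A client can check a document against these limits before it calls a real-time operation. The following is a minimal sketch rather than part of the Amazon Comprehend API: the helper name `fits_realtime_quota`, the operation labels, and the reading of 1 KB as 1,000 bytes of UTF-8 text are all assumptions made for illustration.

```python
# Hypothetical pre-flight check. The names below and the 1 KB = 1,000 bytes
# reading are assumptions, not values confirmed by the service.
KB = 1000

# Limits taken from the two single-document tables above.
REALTIME_MAX_BYTES = {
    "entities": 100 * KB,
    "key_phrases": 100 * KB,
    "dominant_language": 100 * KB,
    "sentiment": 5 * KB,
    "targeted_sentiment": 5 * KB,
    "syntax": 5 * KB,
}


def fits_realtime_quota(text: str, operation: str) -> bool:
    """Return True if the UTF-8 encoding of text is within the limit."""
    size = len(text.encode("utf-8"))  # measure bytes, not characters
    return size <= REALTIME_MAX_BYTES[operation]
```

Measuring the encoded length matters for non-ASCII text, where one character can occupy several bytes.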

Multiple document operations

The Amazon Comprehend API provides batch operations that process multiple documents with a single API request. The following quotas apply to the batch operations.

Description Quota/Guideline
Maximum document size 5 KB
Maximum documents per request 25

For more information about using the batch document operations, see Multiple document synchronous processing.
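To stay within the batch quotas, an application can split its documents into groups before sending them. The sketch below assumes a hypothetical helper named `batches` and again reads 1 KB as 1,000 bytes; it only prepares the groups and does not call any Amazon Comprehend operation.

```python
from typing import Iterator, List

MAX_DOCS_PER_REQUEST = 25
MAX_DOC_BYTES = 5 * 1000  # assumption: 1 KB read as 1,000 bytes


def batches(documents: List[str]) -> Iterator[List[str]]:
    """Yield lists of at most 25 documents, rejecting documents over 5 KB."""
    # Validate every document first so that no partial work is produced.
    for index, doc in enumerate(documents):
        if len(doc.encode("utf-8")) > MAX_DOC_BYTES:
            raise ValueError(f"document {index} exceeds {MAX_DOC_BYTES} bytes")
    # Slice the input into consecutive groups of up to 25 documents.
    for start in range(0, len(documents), MAX_DOCS_PER_REQUEST):
        yield documents[start:start + MAX_DOCS_PER_REQUEST]
```

Each yielded list can then be passed as the document list of one batch request.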

Request throttling for real-time (synchronous) requests

Amazon Comprehend applies dynamic throttling to synchronous requests. If system processing bandwidth is available, Amazon Comprehend gradually increases the number of your requests that it processes. To control your application's usage of the synchronous API operations, we recommend that you turn on billing alerts or implement rate-limiting in your application.
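One way to implement rate limiting in an application is to space requests evenly on the client side. The following sketch is illustrative only: the class name `RateLimiter` and its parameters are hypothetical, and the rate you choose is your own budget, not a figure published in this section.

```python
import time
from typing import Callable


class RateLimiter:
    """Allow at most max_per_second calls by spacing them evenly."""

    def __init__(self, max_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._interval = 1.0 / max_per_second
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following call one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self._interval
        return delay
```

Calling `limiter.wait()` immediately before each synchronous request keeps the application at or below the chosen rate.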

Asynchronous analysis

This section describes quotas related to asynchronous analysis using the built-in models.

Asynchronous API operations each support a maximum of 10 active jobs. To view the quotas for each API operation, see the Service Quotas table in Amazon Comprehend endpoints and quotas in the Amazon Web Services General Reference.

For adjustable quotas, you can request a quota increase using the Service Quotas console.

General quotas for asynchronous operations

You can run asynchronous analysis jobs using the console or any of the API Start* operations. For information about when to use asynchronous operations, see Asynchronous batch processing. The following quotas apply to most of the API Start* operations for built-in models. For the exceptions, see Operation-specific quotas for asynchronous jobs.

Description Quota/Guideline
Maximum size of each document in jobs that detect entities, key phrases, PII, and languages 1 MB
Maximum total size of all files in a request 5 GB
Minimum total size of all files in a request 500 bytes
Maximum number of files, one document per file 1,000,000
Maximum total number of lines, one document per line 1,000,000
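The general quotas above can be checked before a job is started. This is a sketch under stated assumptions: the function name `async_job_problems` is hypothetical, the input uses one document per file (so file size equals document size), and MB and GB are read as decimal units.

```python
from typing import List

MB = 1000 ** 2  # assumption: decimal units
GB = 1000 ** 3

MAX_DOC_BYTES = 1 * MB
MIN_TOTAL_BYTES = 500
MAX_TOTAL_BYTES = 5 * GB
MAX_FILES = 1_000_000


def async_job_problems(file_sizes: List[int]) -> List[str]:
    """Return quota violations for a job that uses one document per file."""
    problems = []
    total = sum(file_sizes)
    if len(file_sizes) > MAX_FILES:
        problems.append("too many files")
    if total < MIN_TOTAL_BYTES:
        problems.append("total size below 500 bytes")
    if total > MAX_TOTAL_BYTES:
        problems.append("total size above 5 GB")
    oversized = sum(1 for size in file_sizes if size > MAX_DOC_BYTES)
    if oversized:
        problems.append(f"{oversized} document(s) above 1 MB")
    return problems
```

An empty list means that none of the general quotas in the table is exceeded; operation-specific quotas still need a separate check.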

Operation-specific quotas for asynchronous jobs

This section describes quotas for specific asynchronous operations. If a quota isn't specified in the following tables, the general quota value applies.

Sentiment

Asynchronous sentiment jobs, which you create with the StartSentimentDetectionJob operation, have the following quotas.

Description Quota/Guideline
Maximum size of each input document 5 KB

Targeted sentiment

Asynchronous targeted sentiment jobs, which you create with the StartTargetedSentimentDetectionJob operation, have the following quotas.

Description Quota/Guideline
Supported document formats UTF-8
Maximum size of each document in a job 10 KB
Maximum size of all documents in a job 300 MB
Maximum number of files, one document per file 30,000
Maximum total number of lines, one document per line (for all files in a request) 30,000

Events

Asynchronous events detection jobs, which you create with the StartEventsDetectionJob operation, have the following quotas.

Description Quota/Guideline
Character encoding UTF-8
Total size of all files in a job 50 MB
Maximum size of each document in a job 10 KB
Maximum number of files, one document per file 5,000
Maximum total number of lines, one document per line (for all files in request) 5,000

Topic modeling

Asynchronous topic modeling jobs, which you create with the StartTopicsDetectionJob operation, have the following quotas.

Description Quota/Guideline
Character encoding UTF-8
Maximum number of topics to return 100
Maximum file size for one file, one document per file 100 MB

For more information, see Topic modeling.
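The rule that the general quota applies whenever an operation-specific table omits a value can be expressed as a lookup with a fallback. The sketch below is illustrative: the dictionary keys and the function name `quota` are hypothetical, sizes are read as decimal units, and only the sentiment, targeted sentiment, and events tables are encoded.

```python
# General quotas for asynchronous operations (decimal units assumed).
GENERAL = {
    "max_document_bytes": 1_000_000,     # 1 MB
    "max_total_bytes": 5_000_000_000,    # 5 GB
    "max_files": 1_000_000,
}

# Only the values that the operation-specific tables override.
SPECIFIC = {
    "sentiment": {"max_document_bytes": 5_000},
    "targeted_sentiment": {
        "max_document_bytes": 10_000,
        "max_total_bytes": 300_000_000,
        "max_files": 30_000,
    },
    "events": {
        "max_document_bytes": 10_000,
        "max_total_bytes": 50_000_000,
        "max_files": 5_000,
    },
}


def quota(operation: str, name: str) -> int:
    """Return the operation-specific quota, or the general one if unset."""
    return SPECIFIC.get(operation, {}).get(name, GENERAL[name])
```

For example, `quota("sentiment", "max_files")` falls back to the general value because the sentiment table lists only a document size.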

Request throttling for asynchronous requests

Each asynchronous API operation supports a maximum number of requests per second (per region, per account), and also a maximum of 10 active jobs. To view the quotas for each API operation, see the Service Quotas table in Amazon Comprehend endpoints and quotas in the Amazon Web Services General Reference.

For adjustable quotas, you can request a quota increase using the Service Quotas console.

Quotas for custom models

You can use Amazon Comprehend to build your own custom models for custom classification and custom entity recognition. This section provides the guidelines and quotas related to training and using custom models. For more information about custom models, see Amazon Comprehend Custom.

General quotas

Amazon Comprehend sets general size quotas for each type of input document that you can analyze with custom models. For real-time analysis quotas, see Maximum document sizes for real-time analysis. For asynchronous analysis quotas, see Inputs for asynchronous custom analysis.

Each asynchronous API operation supports a maximum number of requests per second (per region, per account), and also a maximum of 10 active jobs. To view the quotas for each API operation, see the Service Quotas table in Amazon Comprehend endpoints and quotas in the Amazon Web Services General Reference.

For adjustable quotas, you can request a quota increase using the Service Quotas console.

Quotas for endpoints

You create an endpoint to run real-time analysis with a custom model. For information about endpoints, see Managing Amazon Comprehend endpoints.

The following quotas apply to the endpoints. For information about how to request a quota increase, see AWS Service Quotas.

Description Quota/Guideline
Maximum number of active endpoints per Region for each account 20
Maximum number of inference units per Region for each account 200
Maximum number of inference units per endpoint per Region 50
Maximum throughput per inference unit (characters) 100/second
Maximum throughput per inference unit (documents) 2/second
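The throughput figures above can be used to estimate how many inference units an endpoint needs. The following sketch assumes that throughput scales linearly with the number of inference units, which this section does not state explicitly; the function name `inference_units_needed` is hypothetical.

```python
import math

CHARS_PER_UNIT_PER_SECOND = 100
DOCS_PER_UNIT_PER_SECOND = 2
MAX_UNITS_PER_ENDPOINT = 50


def inference_units_needed(chars_per_second: float,
                           docs_per_second: float) -> int:
    """Estimate inference units for one endpoint (linear scaling assumed)."""
    by_chars = math.ceil(chars_per_second / CHARS_PER_UNIT_PER_SECOND)
    by_docs = math.ceil(docs_per_second / DOCS_PER_UNIT_PER_SECOND)
    # The tighter of the two throughput limits decides the unit count.
    units = max(1, by_chars, by_docs)
    if units > MAX_UNITS_PER_ENDPOINT:
        raise ValueError(
            f"{units} units needed, above the per-endpoint maximum of "
            f"{MAX_UNITS_PER_ENDPOINT}"
        )
    return units
```

A workload of 250 characters per second and 1 document per second would need 3 units under this estimate, because the character limit is the tighter constraint.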

Document classification

This section describes the guidelines and quotas for document classification operations.

General quotas for document classification

The following table describes general quotas related to training custom classifiers.

Description Quota/Guideline
Maximum length of class name 5,000 characters
Number of classes (multi-class mode) 2–1,000
Number of classes (multi-label mode) 2–100
Annotations format
Minimum number of annotations per class (multi-class mode) 10
Minimum number of annotations per class (multi-label mode) 10
Minimum number of annotations (multi-label mode) 50
CSV file format
Minimum number of training documents per class (multi-class mode) 50
Minimum number of training documents per class (multi-label mode) 10
Minimum number of training documents (multi-label mode) 50
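Before training, you can count the labels in a CSV training set and compare them with the multi-class quotas in the table. This is a minimal sketch: the function name `multiclass_csv_problems` is hypothetical, and the input is assumed to be one label per training document, already read from the CSV file.

```python
from collections import Counter
from typing import Iterable, List

MIN_CLASSES, MAX_CLASSES = 2, 1000
MIN_DOCS_PER_CLASS = 50        # CSV file format, multi-class mode
MAX_CLASS_NAME_LENGTH = 5000


def multiclass_csv_problems(labels: Iterable[str]) -> List[str]:
    """Return quota violations for a multi-class CSV training set."""
    counts = Counter(labels)
    problems = []
    if not MIN_CLASSES <= len(counts) <= MAX_CLASSES:
        problems.append(f"found {len(counts)} classes, expected 2 to 1,000")
    for name, count in sorted(counts.items()):
        if len(name) > MAX_CLASS_NAME_LENGTH:
            problems.append(f"class name of {len(name)} characters is too long")
        if count < MIN_DOCS_PER_CLASS:
            problems.append(
                f"class {name!r} has {count} documents, needs at least 50"
            )
    return problems
```

An empty result means that the class count, class name lengths, and per-class document counts are within the listed quotas.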

Classification for plain text documents

You create and train a plain-text model using plain-text input documents. Amazon Comprehend provides real-time and asynchronous operations to classify plain text documents using a plain-text model.

Training

The following table describes quotas related to training a custom classifier with plain text documents.

Description Quota/Guideline
Total size of all files in training job 5 GB
Maximum number of augmented manifest files for training a custom classifier 5
Maximum number of attribute names for each augmented manifest file 5
Maximum length of attribute name 63 characters

Real-time (synchronous) analysis

The following table describes quotas related to real-time classification of plain text documents.

Description Quota/Guideline
Maximum number of documents per synchronous request 1
Maximum text document size (UTF-8 encoded) 10 KB

Asynchronous analysis

The following table describes quotas related to asynchronous classification of plain text documents.

Description Quota/Guideline
Total size of all files in asynchronous job 5 GB
Maximum file size for one file, one document per file 10 MB
Maximum number of files, one document per file 1,000,000
Maximum total number of lines, one document per line (for all files in request) 1,000,000

Classification for semi-structured documents

This section describes the guidelines and quotas for document classification of semi-structured documents. To classify semi-structured documents, use a native document model that you trained with native input documents.

Training a native document model with semi-structured documents

The following table describes quotas related to training a custom classifier with semi-structured documents, such as PDF documents, Word documents, and image files.

Description Quota/Guideline
Maximum number of pages across all documents 10,000
Maximum annotations file size (all CSV file sizes combined) 5 MB
Document corpus size (training and test documents) 10 GB
File sizes for training and testing files
Image file size (JPG, PNG, TIFF) 1 byte–10 MB (TIFF files: one page maximum)
Page size for PDF documents 1 byte–10 MB
Page size for Word documents 1 byte–10 MB
Amazon Textract API output JSON size 1 byte–1 MB

Real-time (synchronous) analysis

This section describes quotas related to real-time classification of semi-structured documents.

The following table shows the maximum file sizes for input documents. For all input document types, the input file maximum is one page, with no more than 10,000 characters.

File type Maximum size (API) Maximum size (console)
UTF-8 text documents 10 KB 10 KB
PDF documents 10 MB 5 MB
Word documents 10 MB 5 MB
Image files 10 MB 5 MB
Amazon Textract API output size 1 MB n/a

Asynchronous analysis

The following table describes quotas related to asynchronous classification of semi-structured documents.

Description Quota/Guideline
Maximum number of pages across all input documents for a job 25,000
Document corpus size 25 GB
Image file size (JPG, PNG, or TIFF) 1 byte–10 MB (TIFF files: one page maximum)
Page size for PDF documents 1 byte–10 MB
Page size for Word documents 1 byte–10 MB
Amazon Textract API output JSON size 1 byte–1 MB

Custom entity recognition

This section describes the guidelines and quotas for custom entity recognition operations.

Custom entity recognition for plain text documents

Amazon Comprehend provides asynchronous and synchronous operations to analyze plain text documents with a custom entity recognizer.

Training

This section describes quotas related to training a custom entity recognizer to analyze plain text documents. To train the model, you can provide an entity list or a set of annotated text documents.

The following table describes quotas related to training the model with an entity list.

Description Quota/Guideline
Number of entities per model 1–25
Document size (UTF-8) 1–5,000 bytes
Number of items in entity list 1–1 million
Length of individual entry (post-strip) in entity list 1–5,000
Entity list corpus size (all docs in plaintext combined) 5 KB–200 MB

The following table describes quotas related to training the model with annotated text documents.

Description Quota/Guideline
Number of entities per model/custom entity recognizer 1–25
Document size (UTF-8) 1–5,000 bytes
Number of documents (see Plain-text annotations) 3–200,000
Document corpus size (all docs in plaintext combined) 5 KB–200 MB
Minimum number of annotations per entity 25

Real-time (synchronous) analysis

The following table describes quotas related to real-time analysis of plain text documents.

Description Quota/Guideline
Maximum number of documents per synchronous request 1
Maximum text document size (UTF-8 encoded) 5 KB

Asynchronous analysis

The following table describes quotas related to asynchronous entity recognition of plain text documents.

Description Quota/Guideline
Document size (UTF-8) 1 byte–1 MB
Maximum number of files, one document per file 1,000,000
Maximum total number of lines, one document per line (for all files in request) 1,000,000
Document corpus size (all docs in plaintext combined) 1 byte–5 GB

Custom entity recognition for semi-structured documents

Amazon Comprehend provides asynchronous and synchronous operations to analyze semi-structured documents with a custom entity recognizer. You must train the model using annotated PDF documents.

Training

The following table describes quotas related to training a custom entity recognizer (CreateEntityRecognizer) to analyze semi-structured documents.

Description Quota/Guideline
Number of entities per model/custom entity recognizer 1–25
Maximum annotation file size (UTF-8 JSON) 5 MB
Number of documents 250–10,000
Document corpus size (all docs in plaintext combined) 5 KB–1 GB
Minimum number of annotations per entity 100
Maximum number of augmented manifest files for training a custom entity recognizer 5
Maximum number of attribute names for each augmented manifest file 5
Maximum length of attribute name 63 characters

Real-time (synchronous) analysis

This section describes quotas related to real-time analysis of semi-structured documents.

The following table shows the maximum file sizes for input documents. For all input document types, the input file maximum is one page, with no more than 10,000 characters.

File type Maximum size (API) Maximum size (console)
UTF-8 text documents 10 KB 10 KB
PDF documents 10 MB 5 MB
Word documents 10 MB 5 MB
Image files 10 MB 5 MB
Textract output files 1 MB n/a

Asynchronous analysis

This section describes quotas for asynchronous analysis of semi-structured documents.

Description Quota/Guideline
Image size (JPG or PNG) 1 byte–10 MB
Image size (TIFF) 1 byte–10 MB. Maximum one page.
Document size (PDF) 1 byte–50 MB
Document size (Docx) 1 byte–5 MB
Document size (UTF-8) 1 byte–1 MB
Maximum number of files, one document per file (one document per line not allowed for image files or PDF/Word documents) 500
Maximum number of pages for a PDF or Docx file 100
Document corpus size after text extraction (plaintext, all files combined) 1 byte–5 GB

For more information about limits for images, see Hard Limits in Amazon Textract.

Quotas for flywheels

Use flywheels to manage training and tracking of custom model versions for custom classification and custom entity recognition. For more information about flywheels, see Flywheels.

General quotas for flywheels

The following quotas apply to flywheels and flywheel iterations.

Description Quota/Guideline
Maximum number of flywheels 50
Maximum number of flywheels in CREATING state 10
Maximum number of training datasets per flywheel 50
Maximum number of test datasets per flywheel 50
Maximum number of datasets with INGESTING status 10
Maximum number of in-progress flywheel iterations per account 10

Dataset quotas for custom classification models

When you ingest a dataset for a flywheel associated with a custom classification model, the following quotas apply.

Description Quota/Guideline
Minimum number of training documents per class (multi-label mode) 50
Maximum number of training documents 1,000,000
Minimum dataset size 500 bytes
Maximum dataset size 5 GB
Maximum file size for one file, one document per file 10 MB

Dataset quotas for custom entity recognition models

When you ingest a dataset for a flywheel associated with a custom entity recognition model, the following quotas apply.

Description Quota/Guideline
Maximum document size 5 KB
Minimum number of training documents 3
Maximum number of training documents 200,000
Minimum number of annotations per entity 25
Maximum dataset size 200 MB