Running asynchronous jobs - Amazon Comprehend

Running asynchronous jobs

After you train a custom classifier, you can use asynchronous jobs to analyze large documents or multiple documents in one batch.
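The following is a minimal sketch of submitting one asynchronous classification job and polling until it finishes. It assumes a boto3 Comprehend client (for example, `boto3.client("comprehend")`) and uses its `start_document_classification_job` and `describe_document_classification_job` operations. The helper names `start_classification_job` and `wait_for_job`, the default job name, and the ARNs and S3 URIs you pass in are all hypothetical. The client is passed in as a parameter, so you can run the functions against a stub without calling AWS.

```python
import time
from typing import Any


def start_classification_job(client: Any, classifier_arn: str, input_s3_uri: str,
                             output_s3_uri: str, role_arn: str,
                             job_name: str = "my-classification-job") -> str:
    """Submit an asynchronous custom classification job and return its JobId.

    Hypothetical helper; every argument value is a placeholder you supply.
    """
    response = client.start_document_classification_job(
        JobName=job_name,
        DocumentClassifierArn=classifier_arn,
        InputDataConfig={
            "S3Uri": input_s3_uri,
            # Each line of an input file is treated as one document.
            # Use "ONE_DOC_PER_FILE" when every file holds a single document.
            "InputFormat": "ONE_DOC_PER_LINE",
        },
        OutputDataConfig={"S3Uri": output_s3_uri},
        # IAM role that Amazon Comprehend assumes to read input and write output.
        DataAccessRoleArn=role_arn,
    )
    return response["JobId"]


def wait_for_job(client: Any, job_id: str, poll_seconds: float = 30.0) -> str:
    """Poll the job until it reaches a terminal status, then return that status."""
    terminal = {"COMPLETED", "FAILED", "STOPPED"}
    while True:
        props = client.describe_document_classification_job(JobId=job_id)[
            "DocumentClassificationJobProperties"
        ]
        status = props["JobStatus"]
        if status in terminal:
            return status
        # Still SUBMITTED, IN_PROGRESS, or STOP_REQUESTED: wait and check again.
        time.sleep(poll_seconds)
```

Polling with a fixed interval keeps the sketch short. For many concurrent jobs, you may prefer to list jobs in bulk or react to job-completion events instead of polling each job individually.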

Custom classification accepts a variety of input document types. For details, see Inputs for asynchronous custom analysis.

If you plan to analyze image files or scanned PDF documents, your IAM policy must grant permissions to use two Amazon Textract API methods (DetectDocumentText and AnalyzeDocument). Amazon Comprehend invokes these methods during text extraction. For an example policy, see Permissions required to perform document analysis actions.
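The following is a rough sketch of a policy statement that allows the two Amazon Textract actions. The helper `textract_policy` is hypothetical, and using `"Resource": "*"` is an assumption. Treat the example policy referenced above as authoritative, and merge this statement into whatever policy already grants your job its Amazon S3 access.

```python
import json


def textract_policy() -> dict:
    """Build a minimal IAM policy allowing the Textract calls used for text extraction.

    Hypothetical helper; "Resource": "*" is an assumption, not taken from the guide.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "textract:DetectDocumentText",
                    "textract:AnalyzeDocument",
                ],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy document as JSON, ready to paste into the IAM console.
    print(json.dumps(textract_policy(), indent=2))
```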

For classification of semi-structured documents (image, PDF, or Docx files) using a plain-text model, use the one document per file input format. Also, include the DocumentReaderConfig parameter in your StartDocumentClassificationJob request.
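The following sketch shows what such a request might look like through a boto3 Comprehend client. It sets `InputFormat` to `ONE_DOC_PER_FILE` and adds a `DocumentReaderConfig` block. The helper names, the default job name, and every ARN and S3 URI are hypothetical. The choice between `TEXTRACT_DETECT_DOCUMENT_TEXT` and `TEXTRACT_ANALYZE_DOCUMENT` (with `FeatureTypes`) is shown as an illustrative option; check the `DocumentReaderConfig` reference for the values that suit your documents.

```python
from typing import Any


def semi_structured_input_config(input_s3_uri: str,
                                 extract_tables_and_forms: bool = False) -> dict:
    """Build InputDataConfig for image, PDF, or Docx input to a plain-text model.

    Hypothetical helper; the S3 URI is a placeholder.
    """
    reader_config = {
        # Extract plain text with Amazon Textract DetectDocumentText.
        "DocumentReadAction": "TEXTRACT_DETECT_DOCUMENT_TEXT",
        # Use the service defaults for reading PDF files.
        "DocumentReadMode": "SERVICE_DEFAULT",
    }
    if extract_tables_and_forms:
        # AnalyzeDocument accepts FeatureTypes such as TABLES and FORMS.
        reader_config["DocumentReadAction"] = "TEXTRACT_ANALYZE_DOCUMENT"
        reader_config["FeatureTypes"] = ["TABLES", "FORMS"]
    return {
        "S3Uri": input_s3_uri,
        # Semi-structured files require the one-document-per-file format.
        "InputFormat": "ONE_DOC_PER_FILE",
        "DocumentReaderConfig": reader_config,
    }


def start_semi_structured_job(client: Any, classifier_arn: str, input_s3_uri: str,
                              output_s3_uri: str, role_arn: str,
                              job_name: str = "semi-structured-job") -> str:
    """Submit the classification job and return its JobId (hypothetical helper)."""
    response = client.start_document_classification_job(
        JobName=job_name,
        DocumentClassifierArn=classifier_arn,
        InputDataConfig=semi_structured_input_config(input_s3_uri),
        OutputDataConfig={"S3Uri": output_s3_uri},
        DataAccessRoleArn=role_arn,
    )
    return response["JobId"]
```

The input configuration is built in a separate function so that you can inspect or unit-test the request shape before you submit a job.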