Training a Custom Classifier - Amazon Comprehend

To train a custom classifier, follow these steps:

  1. Prepare training data that is labeled with the custom classes that you want the classifier to recognize.

  2. Put your training data in an Amazon Simple Storage Service (Amazon S3) bucket. Amazon Comprehend must have AWS Identity and Access Management (IAM) permissions that allow it to read from the bucket. For more information, see Role-Based Permissions Required for Asynchronous Operations. Create another S3 bucket for your output, with permissions that likewise allow Amazon Comprehend access.

  3. Submit a training job using the CreateDocumentClassifier operation.
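The steps above can be sketched in Python. This is a minimal illustration, not the definitive way to call the service: it assumes the boto3 Comprehend client and its create_document_classifier method, and the helper names (build_training_request, submit_training_job), the bucket URIs, and the role ARN are hypothetical placeholders.

```python
def build_training_request(name, role_arn, train_s3_uri, output_s3_uri,
                           language_code="en", mode="MULTI_CLASS"):
    """Assemble keyword arguments for CreateDocumentClassifier.

    All argument values are supplied by the caller; nothing here
    contacts AWS.
    """
    if mode not in ("MULTI_CLASS", "MULTI_LABEL"):
        raise ValueError(f"unsupported mode: {mode}")
    for uri in (train_s3_uri, output_s3_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {uri}")
    return {
        "DocumentClassifierName": name,
        # IAM role that grants Amazon Comprehend access to both buckets.
        "DataAccessRoleArn": role_arn,
        "InputDataConfig": {"S3Uri": train_s3_uri},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        "LanguageCode": language_code,
        "Mode": mode,
    }


def submit_training_job(client, request):
    """Submit the training job and return the new classifier's ARN."""
    response = client.create_document_classifier(**request)
    return response["DocumentClassifierArn"]


# Example use (assumes boto3 is installed and credentials are configured):
#   import boto3
#   client = boto3.client("comprehend")
#   request = build_training_request(
#       "example-classifier",
#       "arn:aws:iam::123456789012:role/ExampleComprehendRole",  # placeholder
#       "s3://example-training-bucket/train.csv",                # placeholder
#       "s3://example-output-bucket/",                           # placeholder
#   )
#   classifier_arn = submit_training_job(client, request)
```

Passing the client in as an argument keeps the request-building logic separate from the network call, so the request can be inspected or tested without AWS access.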

Train your custom classifier model in either multi-class or multi-label mode. Both modes use the concept of a class: a custom category that applies to the document being analyzed. The modes differ in how they assign classes. Multi-class mode associates exactly one class with each document. Multi-label mode can associate more than one class with a document. The training data format also differs between the two modes.
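To make the difference between the two modes concrete, the following sketch writes training rows for each. It assumes a two-column CSV layout (label column first, then the document text) with multi-label classes joined by a "|" delimiter; that layout, the function name write_training_csv, and the sample classes are assumptions for illustration, so check the training data format documentation before relying on them.

```python
import csv
import io


def write_training_csv(examples, mode="MULTI_CLASS", delimiter="|"):
    """Return CSV text with one row per document: label column, then text.

    `examples` is an iterable of (classes, text) pairs, where `classes`
    is a list of class names for that document.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for classes, text in examples:
        if mode == "MULTI_CLASS":
            # Multi-class: exactly one class per document.
            if len(classes) != 1:
                raise ValueError("multi-class rows need exactly one class")
            label = classes[0]
        elif mode == "MULTI_LABEL":
            # Multi-label: join the document's classes with the delimiter.
            if not classes:
                raise ValueError("multi-label rows need at least one class")
            label = delimiter.join(classes)
        else:
            raise ValueError(f"unsupported mode: {mode}")
        writer.writerow([label, text])
    return buffer.getvalue()


multi_class = write_training_csv([(["SPAM"], "Win a prize now")])
multi_label = write_training_csv(
    [(["COMEDY", "DRAMA"], "A bittersweet family film")],
    mode="MULTI_LABEL",
)
```

The csv module handles quoting, so document text that contains commas or quotation marks still lands in a single column.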

You can train a custom classifier in any of the following languages that Amazon Comprehend supports: English, Spanish, German, Italian, French, or Portuguese. However, each classifier can be trained in only one language. Classifiers do not support multiple languages.
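A small lookup can guard against requesting an unsupported language before a job is submitted. The sketch below assumes that the training request identifies a language by its two-letter ISO 639-1 code; the mapping and the helper name language_code_for are illustrative rather than part of the service.

```python
# Languages listed above, mapped to ISO 639-1 codes (assumed request values).
SUPPORTED_LANGUAGES = {
    "English": "en",
    "Spanish": "es",
    "German": "de",
    "Italian": "it",
    "French": "fr",
    "Portuguese": "pt",
}


def language_code_for(language):
    """Return the code for one supported language, or raise ValueError."""
    key = language.strip().title()
    try:
        return SUPPORTED_LANGUAGES[key]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(
            f"{language!r} is not supported; choose one of: {supported}"
        ) from None
```

Because a classifier is trained in a single language, the helper accepts one language name and returns one code.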

After you ask Amazon Comprehend to create a custom classifier, you can monitor the progress of the request using the DescribeDocumentClassifier operation. Once the Status field is TRAINED, you can use the classifier to classify documents.
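Monitoring can be sketched as a simple polling loop. The example assumes the boto3 Comprehend client, whose describe_document_classifier response carries the status under DocumentClassifierProperties. The function name wait_for_classifier, the polling interval, and the choice to treat IN_ERROR and STOPPED as failures are assumptions for illustration.

```python
import time

# Statuses treated as failures here (an assumption for this sketch).
FAILURE_STATUSES = {"IN_ERROR", "STOPPED"}


def wait_for_classifier(client, classifier_arn, poll_seconds=60,
                        max_polls=120, sleep=time.sleep):
    """Poll DescribeDocumentClassifier until the Status field is TRAINED."""
    for _ in range(max_polls):
        response = client.describe_document_classifier(
            DocumentClassifierArn=classifier_arn
        )
        status = response["DocumentClassifierProperties"]["Status"]
        if status == "TRAINED":
            return status
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"training ended with status {status}")
        # Still in progress: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"classifier not TRAINED after {max_polls} polls")


# Example use (assumes boto3 is installed and credentials are configured):
#   import boto3
#   client = boto3.client("comprehend")
#   wait_for_classifier(client, classifier_arn)
```

The sleep function is a parameter so that the loop can be exercised quickly with a stub client instead of waiting on a real training job.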