Amazon Comprehend
Developer Guide

Using Custom Classification

To create and train a custom classifier, use the Amazon Comprehend CreateDocumentClassifier operation. To identify custom categories in a corpus of documents, use the StartDocumentClassificationJob operation.

Using Custom Classification Using the AWS Command Line Interface

The following examples demonstrate using the CreateDocumentClassifier operation, the StartDocumentClassificationJob operation, and other custom classifier APIs with the AWS CLI.

The examples are formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^).
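The Windows substitution described above can be sketched as a small text transformation. This is an illustrative helper only; the function name `to_windows_continuation` is hypothetical and not part of the AWS CLI or any SDK.

```python
def to_windows_continuation(command: str) -> str:
    """Replace a trailing backslash on each line with a caret.

    Trailing whitespace is stripped from every line so that the
    continuation character is always the last character.
    """
    converted = []
    for line in command.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Swap only the final continuation character.
            stripped = stripped[:-1] + "^"
        converted.append(stripped)
    return "\n".join(converted)


# A Unix-style command split over two lines.
unix_command = "aws comprehend list-document-classifiers \\\n    --region region"
print(to_windows_continuation(unix_command))
```

Only a backslash at the end of a line is changed, so backslashes that appear elsewhere in an argument are left alone.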

Creating a custom classifier using the CreateDocumentClassifier operation.

aws comprehend create-document-classifier \
     --region region \
     --document-classifier-name testDelete \
     --language-code en \
     --input-data-config S3Uri=s3://S3Bucket/docclass/file name \
     --data-access-role-arn arn:aws:iam::account number:role/testDeepInsightDataAccess

Getting information on a custom classifier with the document classifier ARN using the DescribeDocumentClassifier operation.

aws comprehend describe-document-classifier \
     --region region \
     --document-classifier-arn arn:aws:comprehend:region:account number:document-classifier/testDelete
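The ARN passed to --document-classifier-arn in these commands follows the pattern arn:aws:comprehend:region:account number:document-classifier/classifier name. A minimal sketch of assembling it from its parts is shown here. The helper name `classifier_arn` and the account number `111122223333` are hypothetical placeholders, and the sketch assumes the standard `aws` partition.

```python
def classifier_arn(region: str, account_id: str, classifier_name: str) -> str:
    """Build a document classifier ARN in the pattern used by the CLI examples.

    Assumes the standard "aws" partition; other partitions use a
    different prefix.
    """
    return (
        f"arn:aws:comprehend:{region}:{account_id}"
        f":document-classifier/{classifier_name}"
    )


# Placeholder region, account number, and classifier name.
print(classifier_arn("us-west-2", "111122223333", "testDelete"))
```

In practice the ARN returned by CreateDocumentClassifier can be used directly, so a helper like this is mainly useful when only the classifier name is known.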

Deleting a custom classifier using the DeleteDocumentClassifier operation.

aws comprehend delete-document-classifier \
     --region region \
     --document-classifier-arn arn:aws:comprehend:region:account number:document-classifier/testDelete

List all custom classifiers in the account using the ListDocumentClassifiers operation.

aws comprehend list-document-classifiers --region region

Run a custom classification job using the StartDocumentClassificationJob operation.

aws comprehend start-document-classification-job \
     --region region \
     --document-classifier-arn arn:aws:comprehend:region:account number:document-classifier/testDelete \
     --input-data-config S3Uri=s3://S3Bucket/docclass/file name,InputFormat=ONE_DOC_PER_LINE \
     --output-data-config S3Uri=s3://S3Bucket/output \
     --data-access-role-arn arn:aws:iam::account number:role/resource name

Getting information on a custom classification job with the job ID using the DescribeDocumentClassificationJob operation.

aws comprehend describe-document-classification-job \
     --region region \
     --job-id job id

Listing all custom classification jobs in your account using the ListDocumentClassificationJobs operation.

aws comprehend list-document-classification-jobs --region region
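When the list operations are called through an SDK rather than the CLI, results may arrive in pages. The following is a minimal sketch of collecting every page in Python. It assumes the response shape documented for boto3, where each page carries an optional NextToken field. The helper name `list_all` is hypothetical.

```python
def list_all(list_operation, result_key):
    """Collect items from every page returned by a list operation.

    list_operation: a callable such as client.list_document_classifiers.
    result_key: the response field that holds the items for that operation.
    """
    items = []
    kwargs = {}
    while True:
        response = list_operation(**kwargs)
        items.extend(response.get(result_key, []))
        token = response.get("NextToken")
        if not token:
            # No further pages remain.
            return items
        kwargs["NextToken"] = token


# Usage with a boto3 client (requires AWS credentials, so shown as comments):
#   import boto3
#   client = boto3.client('comprehend', region_name='region')
#   classifiers = list_all(client.list_document_classifiers,
#                          'DocumentClassifierPropertiesList')
#   jobs = list_all(client.list_document_classification_jobs,
#                   'DocumentClassificationJobPropertiesList')
```

Passing the operation as a callable lets the same loop serve both the classifier list and the classification job list.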

Using Custom Classification Using the AWS SDK for Java

This example creates and trains a custom classifier, describes it, and lists the classifiers in the account using Java.

import com.amazonaws.services.comprehend.AmazonComprehend;
import com.amazonaws.services.comprehend.AmazonComprehendClientBuilder;
import com.amazonaws.services.comprehend.model.CreateDocumentClassifierRequest;
import com.amazonaws.services.comprehend.model.CreateDocumentClassifierResult;
import com.amazonaws.services.comprehend.model.DescribeDocumentClassifierRequest;
import com.amazonaws.services.comprehend.model.DescribeDocumentClassifierResult;
import com.amazonaws.services.comprehend.model.DocumentClassifierInputDataConfig;
import com.amazonaws.services.comprehend.model.LanguageCode;
import com.amazonaws.services.comprehend.model.ListDocumentClassifiersRequest;
import com.amazonaws.services.comprehend.model.ListDocumentClassifiersResult;

public class DocumentClassifierDemo {

    public static void main(String[] args) {

        final AmazonComprehend comprehendClient =
            AmazonComprehendClientBuilder.standard()
                                         .withRegion("us-west-2")
                                         .build();

        final String dataAccessRoleArn = "arn:aws:iam::account number:role/resource name";

        // Create and start training a document classifier.
        final CreateDocumentClassifierRequest createDocumentClassifierRequest =
            new CreateDocumentClassifierRequest()
                .withDocumentClassifierName("SampleCodeClassifier")
                .withDataAccessRoleArn(dataAccessRoleArn)
                .withLanguageCode(LanguageCode.En)
                .withInputDataConfig(new DocumentClassifierInputDataConfig()
                    .withS3Uri("s3://S3Bucket/docclass/file name"));

        final CreateDocumentClassifierResult createDocumentClassifierResult =
            comprehendClient.createDocumentClassifier(createDocumentClassifierRequest);

        final String documentClassifierArn = createDocumentClassifierResult.getDocumentClassifierArn();
        System.out.println("Document Classifier ARN: " + documentClassifierArn);

        // Check the status of the classifier.
        final DescribeDocumentClassifierRequest describeDocumentClassifierRequest =
            new DescribeDocumentClassifierRequest()
                .withDocumentClassifierArn(documentClassifierArn);

        final DescribeDocumentClassifierResult describeDocumentClassifierResult =
            comprehendClient.describeDocumentClassifier(describeDocumentClassifierRequest);
        System.out.println("DescribeDocumentClassifierResult: " + describeDocumentClassifierResult);

        // List all classifiers in the account.
        final ListDocumentClassifiersRequest listDocumentClassifiersRequest =
            new ListDocumentClassifiersRequest();

        final ListDocumentClassifiersResult listDocumentClassifiersResult =
            comprehendClient.listDocumentClassifiers(listDocumentClassifiersRequest);
        System.out.println("ListDocumentClassifierResult: " + listDocumentClassifiersResult);
    }
}

Using Custom Classification Using the AWS SDK for Python (Boto)

This example creates and trains a custom classifier, describes it, and lists the classifiers in the account using Python.

import boto3

# Instantiate Boto3 SDK:
client = boto3.client('comprehend', region_name='region')

# Create a document classifier
create_response = client.create_document_classifier(
    InputDataConfig={
        'S3Uri': 's3://S3Bucket/docclass/file name'
    },
    DataAccessRoleArn='arn:aws:iam::account number:role/resource name',
    DocumentClassifierName='SampleCodeClassifier1',
    LanguageCode='en'
)
print(f"Create response: {create_response}\n")

# Check the status of the classifier
describe_response = client.describe_document_classifier(
    DocumentClassifierArn=create_response['DocumentClassifierArn'])
print(f"Describe response: {describe_response}\n")

# List all classifiers in account
list_response = client.list_document_classifiers()
print(f"List response: {list_response}\n")
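Training runs asynchronously, so a single describe call made right after creation will typically report a status that is still in progress. The following is a minimal sketch of polling until training reaches a final status. It assumes the boto3 response shape in which the status is found at `DocumentClassifierProperties.Status`. The helper name, polling interval, and poll limit are hypothetical choices, not values from this guide.

```python
import time

# Statuses after which the classifier stops changing (assumed from the boto3 docs).
FINAL_CLASSIFIER_STATUSES = {"TRAINED", "IN_ERROR", "STOPPED"}


def wait_for_classifier(client, classifier_arn, poll_seconds=60, max_polls=120):
    """Poll DescribeDocumentClassifier until training reaches a final status."""
    for _ in range(max_polls):
        response = client.describe_document_classifier(
            DocumentClassifierArn=classifier_arn)
        status = response["DocumentClassifierProperties"]["Status"]
        if status in FINAL_CLASSIFIER_STATUSES:
            return status
        # Still SUBMITTED or TRAINING; wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError("classifier did not reach a final status in time")


# Usage with a boto3 client (requires AWS credentials, so shown as comments):
#   import boto3
#   client = boto3.client('comprehend', region_name='region')
#   status = wait_for_classifier(client, create_response['DocumentClassifierArn'])
```

The client is passed in as a parameter, which keeps the helper independent of how the client was configured.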

This example runs a custom classification job using Python.

import boto3

# Instantiate Boto3 SDK:
client = boto3.client('comprehend', region_name='region')

# Start a document classification job
start_response = client.start_document_classification_job(
    InputDataConfig={
        'S3Uri': 's3://S3Bucket/docclass/file name',
        'InputFormat': 'ONE_DOC_PER_LINE'
    },
    OutputDataConfig={
        'S3Uri': 's3://S3Bucket/output'
    },
    DataAccessRoleArn='arn:aws:iam::account number:role/resource name',
    DocumentClassifierArn=
    'arn:aws:comprehend:region:account number:document-classifier/SampleCodeClassifier1'
)
print(f"Start response: {start_response}\n")

# Check the status of the job
describe_response = client.describe_document_classification_job(JobId=start_response['JobId'])
print(f"Describe response: {describe_response}\n")

# List all classification jobs in account
list_response = client.list_document_classification_jobs()
print(f"List response: {list_response}\n")
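A classification job also runs asynchronously, so its results are not available when the start call returns. The following is a minimal sketch of waiting for the job and reading the output location. It assumes the boto3 response shape in which `DocumentClassificationJobProperties` holds `JobStatus` and `OutputDataConfig`. The helper name and timing defaults are hypothetical.

```python
import time

# Statuses after which the job stops changing (assumed from the boto3 docs).
FINAL_JOB_STATUSES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_classification_job(client, job_id, poll_seconds=30, max_polls=120):
    """Poll the job until it finishes; return (status, output S3 URI or None)."""
    for _ in range(max_polls):
        response = client.describe_document_classification_job(JobId=job_id)
        properties = response["DocumentClassificationJobProperties"]
        status = properties["JobStatus"]
        if status in FINAL_JOB_STATUSES:
            # Report the output location only for a job that completed.
            output_uri = None
            if status == "COMPLETED":
                output_uri = properties.get("OutputDataConfig", {}).get("S3Uri")
            return status, output_uri
        time.sleep(poll_seconds)
    raise TimeoutError("classification job did not finish in time")


# Usage with a boto3 client (requires AWS credentials, so shown as comments):
#   import boto3
#   client = boto3.client('comprehend', region_name='region')
#   status, output_uri = wait_for_classification_job(client, start_response['JobId'])
```

Returning the status together with the output location lets the caller tell a failed job apart from a completed one without a second describe call.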