Use CreateDocumentClassifier with an AWS SDK or command line tool - Amazon Comprehend

Use CreateDocumentClassifier with an AWS SDK or command line tool

The following code examples show how to use CreateDocumentClassifier.

Action examples are code excerpts from larger programs and must be run in context. You can see this action in context in the following code example:

CLI
AWS CLI

To create a document classifier to categorize documents

The following create-document-classifier example begins the training process for a document classifier model. The training data file, training.csv, is located at the --input-data-config tag. training.csv is a two column document where the labels, or, classifications are provided in the first column and the documents are provided in the second column.

aws comprehend create-document-classifier \ --document-classifier-name example-classifier \ --data-access-arn arn:aws:comprehend:us-west-2:111122223333:pii-entities-detection-job/123456abcdeb0e11022f22a11EXAMPLE \ --input-data-config "S3Uri=s3://DOC-EXAMPLE-BUCKET/" \ --language-code en

Output:

{ "DocumentClassifierArn": "arn:aws:comprehend:us-west-2:111122223333:document-classifier/example-classifier" }

For more information, see Custom Classification in the Amazon Comprehend Developer Guide.

Java
SDK for Java 2.x
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

import software.amazon.awssdk.regions.Region; import software.amazon.awssdk.services.comprehend.ComprehendClient; import software.amazon.awssdk.services.comprehend.model.ComprehendException; import software.amazon.awssdk.services.comprehend.model.CreateDocumentClassifierRequest; import software.amazon.awssdk.services.comprehend.model.CreateDocumentClassifierResponse; import software.amazon.awssdk.services.comprehend.model.DocumentClassifierInputDataConfig; /** * Before running this code example, you can setup the necessary resources, such * as the CSV file and IAM Roles, by following this document: * https://aws.amazon.com/blogs/machine-learning/building-a-custom-classifier-using-amazon-comprehend/ * * Also, set up your development environment, including your credentials. * * For more information, see the following documentation topic: * * https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html */ public class DocumentClassifierDemo { public static void main(String[] args) { final String usage = """ Usage: <dataAccessRoleArn> <s3Uri> <documentClassifierName> Where: dataAccessRoleArn - The ARN value of the role used for this operation. s3Uri - The Amazon S3 bucket that contains the CSV file. documentClassifierName - The name of the document classifier. """; if (args.length != 3) { System.out.println(usage); System.exit(1); } String dataAccessRoleArn = args[0]; String s3Uri = args[1]; String documentClassifierName = args[2]; Region region = Region.US_EAST_1; ComprehendClient comClient = ComprehendClient.builder() .region(region) .build(); createDocumentClassifier(comClient, dataAccessRoleArn, s3Uri, documentClassifierName); comClient.close(); } public static void createDocumentClassifier(ComprehendClient comClient, String dataAccessRoleArn, String s3Uri, String documentClassifierName) { try { DocumentClassifierInputDataConfig config = DocumentClassifierInputDataConfig.builder() .s3Uri(s3Uri) .build(); CreateDocumentClassifierRequest createDocumentClassifierRequest = CreateDocumentClassifierRequest.builder() .documentClassifierName(documentClassifierName) .dataAccessRoleArn(dataAccessRoleArn) .languageCode("en") .inputDataConfig(config) .build(); CreateDocumentClassifierResponse createDocumentClassifierResult = comClient .createDocumentClassifier(createDocumentClassifierRequest); String documentClassifierArn = createDocumentClassifierResult.documentClassifierArn(); System.out.println("Document Classifier ARN: " + documentClassifierArn); } catch (ComprehendException e) { System.err.println(e.awsErrorDetails().errorMessage()); System.exit(1); } } }
Python
SDK for Python (Boto3)
Note

There's more on GitHub. Find the complete example and learn how to set up and run in the AWS Code Examples Repository.

class ComprehendClassifier: """Encapsulates an Amazon Comprehend custom classifier.""" def __init__(self, comprehend_client): """ :param comprehend_client: A Boto3 Comprehend client. """ self.comprehend_client = comprehend_client self.classifier_arn = None def create( self, name, language_code, training_bucket, training_key, data_access_role_arn, mode, ): """ Creates a custom classifier. After the classifier is created, it immediately starts training on the data found in the specified Amazon S3 bucket. Training can take 30 minutes or longer. The `describe_document_classifier` function can be used to get training status and returns a status of TRAINED when the classifier is ready to use. :param name: The name of the classifier. :param language_code: The language the classifier can operate on. :param training_bucket: The Amazon S3 bucket that contains the training data. :param training_key: The prefix used to find training data in the training bucket. If multiple objects have the same prefix, all of them are used. :param data_access_role_arn: The Amazon Resource Name (ARN) of a role that grants Comprehend permission to read from the training bucket. :return: The ARN of the newly created classifier. """ try: response = self.comprehend_client.create_document_classifier( DocumentClassifierName=name, LanguageCode=language_code, InputDataConfig={"S3Uri": f"s3://{training_bucket}/{training_key}"}, DataAccessRoleArn=data_access_role_arn, Mode=mode.value, ) self.classifier_arn = response["DocumentClassifierArn"] logger.info("Started classifier creation. Arn is: %s.", self.classifier_arn) except ClientError: logger.exception("Couldn't create classifier %s.", name) raise else: return self.classifier_arn

For a complete list of AWS SDK developer guides and code examples, see Using Amazon Comprehend with an AWS SDK. This topic also includes information about getting started and details about previous SDK versions.