Train custom classifiers (API)
To create and train a custom classifier, use the CreateDocumentClassifier operation.
You can monitor the progress of the request using the DescribeDocumentClassifier operation. After the Status field transitions to TRAINED, you can use the classifier to classify documents. If the status is TRAINED_WITH_WARNINGS, review the skipped files folder in the Classifier training output from the CreateDocumentClassifier operation.
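The monitoring step described above can be sketched in Python as a simple polling loop. This is a minimal sketch, not an official implementation: the helper name wait_for_classifier, the polling interval, and the poll limit are assumptions made for illustration, and the client argument is assumed to behave like boto3.client("comprehend").

```python
import time

# Statuses after which training makes no further progress.
TERMINAL_STATUSES = {"TRAINED", "TRAINED_WITH_WARNINGS", "IN_ERROR", "STOPPED"}


def wait_for_classifier(client, classifier_arn, poll_seconds=60,
                        max_polls=120, sleep=time.sleep):
    """Poll DescribeDocumentClassifier until a terminal status is reached.

    `client` is assumed to expose describe_document_classifier in the same
    way as boto3.client("comprehend"). The helper name and defaults are
    hypothetical.
    """
    for _ in range(max_polls):
        response = client.describe_document_classifier(
            DocumentClassifierArn=classifier_arn
        )
        status = response["DocumentClassifierProperties"]["Status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"classifier {classifier_arn} did not finish training")
```

A caller would typically proceed to classification when the returned status is TRAINED, inspect the skipped files folder when it is TRAINED_WITH_WARNINGS, and treat any other terminal status as a failed or stopped training run.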
Training custom classification using the AWS Command Line Interface
The following examples show how to use the CreateDocumentClassifier operation, the DescribeDocumentClassifier operation, and other custom classifier APIs with the AWS CLI.
The examples are formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^).
Create a plain-text custom classifier using the create-document-classifier operation.
aws comprehend create-document-classifier \
    --region region \
    --document-classifier-name testDelete \
    --language-code en \
    --input-data-config S3Uri=s3://S3Bucket/docclass/file name \
    --data-access-role-arn arn:aws:iam::account number:role/testFlywheelDataAccess
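For comparison, the same plain-text request can be sketched in Python. This is an illustrative sketch only: the helper names plain_text_classifier_request and create_classifier are hypothetical, and the client argument is assumed to behave like boto3.client("comprehend"). The request keys mirror the CLI options shown above.

```python
def plain_text_classifier_request(name, training_data_s3_uri,
                                  data_access_role_arn, language_code="en"):
    """Build the keyword arguments for a plain-text CreateDocumentClassifier call.

    Hypothetical helper; the keys correspond to the CLI options
    --document-classifier-name, --language-code, --input-data-config,
    and --data-access-role-arn.
    """
    return {
        "DocumentClassifierName": name,
        "LanguageCode": language_code,
        "InputDataConfig": {"S3Uri": training_data_s3_uri},
        "DataAccessRoleArn": data_access_role_arn,
    }


def create_classifier(client, request):
    """Submit the request and return the ARN of the new classifier.

    `client` is assumed to expose create_document_classifier in the same
    way as boto3.client("comprehend").
    """
    response = client.create_document_classifier(**request)
    return response["DocumentClassifierArn"]
```

Keeping the request construction separate from the call makes it possible to inspect or log the parameters before any training job is started.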
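To create a native custom classifier, provide the following additional parameters in the create-document-classifier request. DocumentType, Documents, and DocumentReaderConfig are fields of the input data configuration (InputDataConfig); OutputDataConfig is a separate request parameter.
- DocumentType: set the value to SEMI_STRUCTURED_DOCUMENT.
- Documents: the S3 location for the training documents (and, optionally, the test documents).
- OutputDataConfig: the S3 location for the output documents (and an optional KMS key).
- DocumentReaderConfig: optional field for text extraction settings.
In the following example, DocumentType and Documents are placeholders for these settings.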
aws comprehend create-document-classifier \
    --region region \
    --document-classifier-name testDelete \
    --language-code en \
    --input-data-config S3Uri=s3://S3Bucket/docclass/file name \
    DocumentType \
    Documents \
    --output-data-config S3Uri=s3://S3Bucket/docclass/file name \
    --data-access-role-arn arn:aws:iam::account number:role/testFlywheelDataAccess
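The way the native-classifier parameters nest may be easier to see as a request structure. The following Python sketch shows one plausible shape, assuming the field names documented for the CreateDocumentClassifier API (DocumentType and Documents inside InputDataConfig, with TestS3Uri and KmsKeyId as the optional test location and KMS key). The helper name native_classifier_request and its argument names are hypothetical, and DocumentReaderConfig is left out because it is optional. Verify the field names against the API reference before relying on them.

```python
def native_classifier_request(name, training_data_s3_uri, documents_s3_uri,
                              output_s3_uri, data_access_role_arn,
                              language_code="en", test_documents_s3_uri=None,
                              kms_key_id=None):
    """Build keyword arguments for a native (semi-structured) classifier.

    Hypothetical helper. Field names are assumed from the
    CreateDocumentClassifier API and should be checked against the
    API reference.
    """
    # Documents: S3 location of the training documents, plus optional test documents.
    documents = {"S3Uri": documents_s3_uri}
    if test_documents_s3_uri is not None:
        documents["TestS3Uri"] = test_documents_s3_uri

    # OutputDataConfig: S3 location for the output, plus an optional KMS key.
    output_config = {"S3Uri": output_s3_uri}
    if kms_key_id is not None:
        output_config["KmsKeyId"] = kms_key_id

    return {
        "DocumentClassifierName": name,
        "LanguageCode": language_code,
        "InputDataConfig": {
            "S3Uri": training_data_s3_uri,
            "DocumentType": "SEMI_STRUCTURED_DOCUMENT",
            "Documents": documents,
        },
        "OutputDataConfig": output_config,
        "DataAccessRoleArn": data_access_role_arn,
    }
```

The resulting dictionary could be passed to a client call such as client.create_document_classifier(**request), where client is assumed to be a boto3 Comprehend client.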
Get information on a custom classifier with the document classifier ARN using the DescribeDocumentClassifier operation.
aws comprehend describe-document-classifier \
    --region region \
    --document-classifier-arn arn:aws:comprehend:region:account number:document-classifier/file name
Delete a custom classifier using the DeleteDocumentClassifier operation.
aws comprehend delete-document-classifier \
    --region region \
    --document-classifier-arn arn:aws:comprehend:region:account number:document-classifier/testDelete
List all custom classifiers in the account using the ListDocumentClassifiers operation.
aws comprehend list-document-classifiers --region region
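When an account holds many classifiers, the list operation returns results in pages. The following Python sketch collects all pages by following NextToken. The helper name list_all_classifiers and the page size are assumptions made for illustration, and the client argument is assumed to behave like boto3.client("comprehend").

```python
def list_all_classifiers(client, page_size=100):
    """Return the properties of every custom classifier, following NextToken.

    Hypothetical helper; `client` is assumed to expose
    list_document_classifiers in the same way as boto3.client("comprehend").
    """
    classifiers = []
    kwargs = {"MaxResults": page_size}
    while True:
        page = client.list_document_classifiers(**kwargs)
        classifiers.extend(page.get("DocumentClassifierPropertiesList", []))
        token = page.get("NextToken")
        if not token:
            return classifiers
        # Request the next page on the following iteration.
        kwargs["NextToken"] = token
```

Each returned entry is a dictionary of classifier properties, so a caller could, for example, filter on the Status field to find classifiers that finished training.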
Using the AWS SDK for Java or SDK for Python
For SDK examples of how to create and train a custom classifier, see Use CreateDocumentClassifier with an AWS SDK or CLI.