Real-time analysis for custom classification (API)
You can use the Amazon Comprehend API to run real-time classification with a custom model. First, you create an endpoint to run the real-time analysis. After you create the endpoint, you run the real-time classification.
The examples in this section use command formats for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^).
For information about provisioning endpoint throughput and the associated costs, see Using Amazon Comprehend endpoints.
Creating an endpoint for custom classification
The following example shows the CreateEndpoint API operation using the AWS CLI.
aws comprehend create-endpoint \
    --desired-inference-units number of inference units \
    --endpoint-name endpoint name \
    --model-arn arn:aws:comprehend:region:account-id:model/example \
    --tags Key=My1stTag,Value=Value1
Amazon Comprehend responds with the following:
{ "EndpointArn": "Arn" }
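The same request can also be made from an SDK rather than the CLI. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) is installed and credentials are configured. The helper names build_create_endpoint_request and create_endpoint_and_wait, the polling interval, and the timeout are illustrative choices and do not come from this guide. Polling DescribeEndpoint until the status is IN_SERVICE is one possible way to wait for the endpoint to become usable.

```python
import time


def build_create_endpoint_request(endpoint_name, model_arn, inference_units, tags=None):
    """Build keyword arguments for the CreateEndpoint operation."""
    request = {
        "EndpointName": endpoint_name,
        "ModelArn": model_arn,
        "DesiredInferenceUnits": inference_units,
    }
    if tags:
        # Tags are sent as a list of {"Key": ..., "Value": ...} pairs.
        request["Tags"] = [{"Key": k, "Value": v} for k, v in tags.items()]
    return request


def create_endpoint_and_wait(request, poll_seconds=30, timeout_seconds=3600):
    """Create the endpoint, then poll DescribeEndpoint until it is IN_SERVICE.

    poll_seconds and timeout_seconds are hypothetical defaults; tune them.
    """
    import boto3  # imported here so the request builder works without the SDK

    client = boto3.client("comprehend")
    endpoint_arn = client.create_endpoint(**request)["EndpointArn"]
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        properties = client.describe_endpoint(EndpointArn=endpoint_arn)["EndpointProperties"]
        if properties["Status"] == "IN_SERVICE":
            return endpoint_arn
        if properties["Status"] == "FAILED":
            raise RuntimeError(f"Endpoint creation failed: {endpoint_arn}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Endpoint not in service after {timeout_seconds} seconds: {endpoint_arn}")
```

Keeping the request builder separate from the network call makes it possible to inspect or log the request before any billable endpoint is created.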
Running real-time custom classification
After you create an endpoint for your custom classification model, you use the endpoint to run the ClassifyDocument API operation. You can provide text input using the text or bytes parameter. Enter the other input types using the bytes parameter.
For image files and PDF files, you can use the DocumentReaderConfig parameter to override the default text extraction actions. For details, see Setting text extraction options.
For best results, match the type of input to the classifier model type. The API response includes a warning if you submit a native document to a plain-text model, or a plain-text file to a native document model. For more information, see Training classification models.
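The choice between the text and bytes parameters can be sketched in code. The example below is a hedged illustration in Python: build_classify_request and collect_warnings are hypothetical helper names, and treating only .txt files as plain text is an assumption made for the example. The Warnings field and its WarnCode and WarnMessage keys reflect the ClassifyDocument response as described in the API reference; verify them against the current reference before relying on them.

```python
from pathlib import Path

# Assumption: only these suffixes are treated as plain text. Everything else
# (for example PDF, Word, or image files) is sent as raw bytes.
PLAIN_TEXT_SUFFIXES = {".txt"}


def build_classify_request(endpoint_arn, path):
    """Choose the Text or Bytes parameter based on the file suffix."""
    path = Path(path)
    request = {"EndpointArn": endpoint_arn}
    if path.suffix.lower() in PLAIN_TEXT_SUFFIXES:
        request["Text"] = path.read_text(encoding="utf-8")
    else:
        request["Bytes"] = path.read_bytes()
    return request


def collect_warnings(response):
    """Return 'code: message' strings for any warnings in the response."""
    return [
        f"{w.get('WarnCode', '')}: {w.get('WarnMessage', '')}"
        for w in response.get("Warnings", [])
    ]
```

The resulting dictionary can be passed as keyword arguments to the SDK's classify_document call, and collect_warnings can be applied to the response to surface a mismatch between the input type and the model type.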
Using the AWS Command Line Interface
The following examples demonstrate how to use the classify-document CLI command.
Classify text using the AWS CLI
The following example runs real-time classification on a block of text.
aws comprehend classify-document \
    --endpoint-arn arn:aws:comprehend:region:account-id:endpoint/endpoint name \
    --text 'From the Tuesday, April 16th, 1912 edition of The Guardian newspaper: The maiden voyage of the White Star liner Titanic, the largest ship ever launched ended in disaster. The Titanic started her trip from Southampton for New York on Wednesday. Late on Sunday night she struck an iceberg off the Grand Banks of Newfoundland. By wireless telegraphy she sent out signals of distress, and several liners were near enough to catch and respond to the call.'
Amazon Comprehend responds with the following:
{ "Classes": [ { "Name": "string", "Score": 0.9793661236763 } ] }
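A response of this shape is usually reduced to the single most likely class. The following Python sketch shows one way to do that, assuming boto3 is installed and credentials are configured. The function names classify_text and top_class are illustrative and are not part of the Amazon Comprehend API.

```python
def top_class(response):
    """Return (name, score) of the highest-scoring class, or None if empty."""
    classes = response.get("Classes", [])
    if not classes:
        return None
    best = max(classes, key=lambda c: c["Score"])
    return best["Name"], best["Score"]


def classify_text(endpoint_arn, text):
    """Call ClassifyDocument with plain text against an existing endpoint."""
    import boto3  # imported here so top_class can be used without the SDK

    client = boto3.client("comprehend")
    return client.classify_document(EndpointArn=endpoint_arn, Text=text)
```

For example, top_class(classify_text(endpoint_arn, "some text")) would return a (name, score) tuple for the best match, where endpoint_arn is the ARN returned when the endpoint was created.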
Classify a semi-structured document using the AWS CLI
To run custom classification on a PDF, Word, or image file, run the classify-document command with the input file in the bytes parameter.
The following example uses an image as the input file. It uses the fileb option to base-64 encode the image file bytes. For more information, see Binary large objects in the AWS Command Line Interface User Guide.
This example also passes in a JSON file named config.json to set the text extraction options.
aws comprehend classify-document \
    --endpoint-arn arn \
    --bytes fileb://image1.jpg \
    --document-reader-config file://config.json
The config.json file contains the following content.
{ "DocumentReadMode": "FORCE_DOCUMENT_READ_ACTION", "DocumentReadAction": "TEXTRACT_DETECT_DOCUMENT_TEXT" }
Amazon Comprehend responds with the following:
{ "Classes": [ { "Name": "string", "Score": 0.9793661236763 } ] }
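An equivalent SDK call passes the file contents and the reader configuration directly. The sketch below assumes boto3 is installed and that image1.jpg and config.json exist as in the CLI example. The helper names build_document_request and classify_document_file are hypothetical.

```python
import json
from pathlib import Path


def build_document_request(endpoint_arn, document_path, config_path):
    """Build ClassifyDocument arguments for a file plus a reader config."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return {
        "EndpointArn": endpoint_arn,
        # Raw file bytes; the SDK is expected to handle the wire encoding.
        "Bytes": Path(document_path).read_bytes(),
        "DocumentReaderConfig": config,
    }


def classify_document_file(endpoint_arn, document_path, config_path):
    """Classify an image, PDF, or Word file using text extraction options."""
    import boto3  # imported here so the request builder works without the SDK

    client = boto3.client("comprehend")
    request = build_document_request(endpoint_arn, document_path, config_path)
    return client.classify_document(**request)
```

Reading config.json at call time keeps the text extraction options in one place, so the CLI and SDK invocations can share the same file.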
For more information, see ClassifyDocument in the Amazon Comprehend API Reference.