Automated deployment - Text Analysis with Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) and Amazon Comprehend

Automated deployment

Before you launch the automated deployment, please review the considerations discussed in this guide. Follow the step-by-step instructions in this section to configure and deploy the Text Analysis with Amazon OpenSearch Service and Amazon Comprehend solution into your account.

Time to deploy: Approximately 20 minutes

What we’ll cover

The procedure for deploying this architecture on AWS consists of the following steps. For detailed instructions, follow the links for each step.

Step 1. Launch the stack

  • Launch the AWS CloudFormation template into your AWS account.

  • Enter values for the required parameters: Stack Name and Domain Name.

  • Review the other template parameters, and adjust if necessary.

Step 2. Index documents using the Amazon OpenSearch Service proxy API

  • Configure and index sample documents

Step 3. Open the pre-configured Kibana dashboard

  • View the Kibana dashboard

Step 1. Launch the stack

This automated AWS CloudFormation template deploys the Text Analysis with Amazon OpenSearch Service and Amazon Comprehend solution in the AWS Cloud.

Note

You are responsible for the cost of the AWS services used while running this solution. Refer to the Cost section for more details. For full details, refer to the pricing webpage for each AWS service you will be using in this solution.

  1. Sign in to the AWS Management Console and click the button to the right to launch the text-analysis-with-amazon-opensearch-service-and-amazon-comprehend AWS CloudFormation template.

    
                text-analysis-with-amazon-opensearch-service-and-amazon-comprehend.template launch button

    You can also download the template as a starting point for your own implementation.

  2. The template is launched in the US East (N. Virginia) Region by default. To launch the solution in a different AWS Region, use the region selector in the console navigation bar.

    Note

    This solution uses the Amazon Comprehend service, which is currently available in specific AWS Regions only. Therefore, you must launch this solution in an AWS Region where Amazon Comprehend is available. For the most current availability by Region, refer to AWS service offerings by Region.

  3. On the Create stack page, verify that the correct template URL shows in the Amazon S3 URL text box and choose Next.

  4. On the Specify stack details page, assign a name to your solution stack.

  5. Under Parameters, review the parameters for the template and modify them as necessary. This solution uses the following default values.

    | Parameter | Default | Description |
    |---|---|---|
    | Domain Name | <Requires Input> | The name of the Amazon OpenSearch Service domain that this template will create. |
    | Instance Type | m4.large.search | The instance type for Amazon OpenSearch Service. |
    | Number of Instances | 2 | The number of Amazon OpenSearch Service cluster instances the template will create. |
    | Enable VPC | false | Choose whether to enable Amazon VPC for Lambda and Amazon OpenSearch Service. |
    | IP to Access Kibana Dashboard | 10.0.0.0/16 | Enter an IP address or CIDR block to access the Kibana dashboard. Note: If you choose not to enter your IP address, you will be denied access to the Kibana dashboard after you launch the solution. |
    | API Gateway Authorization Type | NONE | Choose the authorization type of the API Gateway for access control. Choose NONE or AWS_IAM. Note: If you select NONE, the API will be accessible without any authorization. If you select AWS_IAM, requests must be signed by IAM users or roles that have permission to access the proxy APIs. |
    | VPC CIDR Block | 10.0.0.0/16 | Enter the CIDR block for the VPC. This value will not be used if you selected false for Enable VPC. |
    | Public Subnet01 Block | 10.0.0.0/24 | Enter the CIDR block for public subnet1 located in AZ1. This value will not be used if you selected false for Enable VPC. |
    | Public Subnet02 Block | 10.0.1.0/24 | Enter the CIDR block for public subnet2 located in AZ2. This value will not be used if you selected false for Enable VPC. |
    | Private Subnet01 Block | 10.0.2.0/24 | Enter the CIDR block for private subnet1 located in AZ1. This value will not be used if you selected false for Enable VPC. |
    | Private Subnet02 Block | 10.0.3.0/24 | Enter the CIDR block for private subnet2 located in AZ2. This value will not be used if you selected false for Enable VPC. |
    | Enable Encryption at Rest | true | Choose whether to enable Amazon OpenSearch Service domain encryption at rest. |
    | Enable Node to Node Encryption | true | Choose whether to enable Amazon OpenSearch Service node-to-node encryption. |
    | Open Search Service Enable EBS | true | Choose whether to enable EBS storage for Amazon OpenSearch Service. |
    | EBS Volume Type | standard | The EBS volume type for the Amazon OpenSearch Service cluster. |
    | EBS Volume Size | 10 | The Amazon OpenSearch Service EBS storage size in GB per node. |
    | Enable Dedicated Master | true | Choose whether to enable dedicated master nodes for the Amazon OpenSearch Service cluster. |
    | Dedicated Master Count | 3 | The number of dedicated master nodes for the Amazon OpenSearch Service cluster. |
    | Dedicated Master Type | m4.large.search | The instance type for the Amazon OpenSearch Service cluster master nodes. |
    | Enable Zone Awareness | true | Choose whether to enable zone awareness for the Amazon OpenSearch Service cluster. |
    | Stage | prod | Enter the stage name of the API Gateway. |
    | Open Search Service Role Exists | false | Choose whether the service-linked role for Amazon OpenSearch Service VPC access already exists. Note: Set this parameter to true if you have already created the service-linked role for Amazon OpenSearch Service VPC access. For more information, refer to Service-Linked Roles. |

  6. Choose Next.

  7. On the Configure stack options page, choose Next.

  8. On the Review page, review and confirm the settings. Be sure to check the box acknowledging that the template will create AWS Identity and Access Management (IAM) resources.

  9. Choose Create stack to deploy the stack.

You can view the status of the stack in the AWS CloudFormation Console in the Status column. You should see a status of CREATE_COMPLETE in approximately 20 minutes.
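
If you set Enable VPC to true and change the CIDR parameters in step 5, the four subnet blocks need to sit inside the VPC CIDR block and must not overlap one another. The following Python sketch checks this before you launch the stack, using only the standard library. The function name check_subnets is hypothetical and not part of the solution; the values shown are the template defaults listed above.

```python
import ipaddress


def check_subnets(vpc_cidr, subnet_cidrs):
    """Return a list of problems; an empty list means the layout is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = []
    # Every subnet must be contained in the VPC block.
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside the VPC block {vpc}")
    # No two subnets may share addresses.
    names = list(subnets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


# Default values from the parameter table.
defaults = {
    "Public Subnet01 Block": "10.0.0.0/24",
    "Public Subnet02 Block": "10.0.1.0/24",
    "Private Subnet01 Block": "10.0.2.0/24",
    "Private Subnet02 Block": "10.0.3.0/24",
}
print(check_subnets("10.0.0.0/16", defaults))  # prints []
```

An empty list means the values are consistent with each other; it does not confirm that the stack will deploy.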

Step 2. Index documents using the Amazon OpenSearch Service proxy API

After the solution deploys successfully, you can create the preprocessing configuration, then index and search documents through the proxy API. Use the following procedures to begin indexing documents.

Find the proxy endpoint from AWS CloudFormation output

  1. In the AWS CloudFormation console, navigate to the stack Outputs tab.

  2. Find and copy the proxy endpoint value of the ProxyEndpoint key.
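
If you would rather script this lookup than copy the value from the console, one option is to save the JSON printed by the AWS CLI command aws cloudformation describe-stacks and read the output from it. The following Python sketch shows the parsing step only. The helper name get_stack_output and the sample URLs are made up for illustration; your stack will return its own values.

```python
import json


def get_stack_output(describe_stacks_json, key):
    """Return the OutputValue for `key` from describe-stacks JSON, or None if absent."""
    response = json.loads(describe_stacks_json)
    for stack in response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == key:
                return output.get("OutputValue")
    return None


# Illustrative response only; both URLs are placeholders, not real endpoints.
sample = json.dumps({"Stacks": [{"Outputs": [
    {"OutputKey": "ProxyEndpoint", "OutputValue": "https://example.invalid/prod"},
    {"OutputKey": "KibanaDashboardURL", "OutputValue": "https://example.invalid/_plugin/kibana"},
]}]})
print(get_stack_output(sample, "ProxyEndpoint"))
```

The same helper can read the KibanaDashboardURL output used in Step 3.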

Set up preprocessing configurations

In your terminal window, use the following example code to set up preprocessing configurations.

curl -XPUT <proxy_endpoint>/preprocessing_configurations -d '
{
  "comprehendConfigurations": [
    {
      "indexName": "news",
      "fieldName": "content",
      "comprehendOperations": [
        "DetectSentiment",
        "DetectEntities",
        "DetectKeyPhrases",
        "DetectDominantLanguage",
        "DetectSyntax"
      ],
      "languageCode": "en"
    }
  ]
}
'
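
The same request can be assembled in Python if you want to generate configurations for several indexes or fields. This is a minimal sketch using only the standard library. The function name build_config_request and the placeholder endpoint are assumptions; the payload keys are the ones shown in the curl example above.

```python
import json
import urllib.request


def build_config_request(proxy_endpoint, index_name, field_name, operations, language_code="en"):
    """Build (but do not send) a PUT request for one preprocessing configuration."""
    body = {"comprehendConfigurations": [{
        "indexName": index_name,
        "fieldName": field_name,
        "comprehendOperations": list(operations),
        "languageCode": language_code,
    }]}
    # No Content-Type header is set here, mirroring the curl example.
    return urllib.request.Request(
        url=proxy_endpoint.rstrip("/") + "/preprocessing_configurations",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )


# Placeholder endpoint; replace it with your ProxyEndpoint value.
request = build_config_request(
    "https://example.invalid/prod",
    "news",
    "content",
    ["DetectSentiment", "DetectEntities", "DetectKeyPhrases"],
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen sends the request. If you chose AWS_IAM as the authorization type, the request must also be signed, which this sketch does not do.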

Upload a single document

In your terminal window, use the following example code to upload a single document.

curl -XPUT <proxy_endpoint>/news/_doc/1 -H 'Content-Type: application/json' -d '
{
  "content": "Amazon.com, Inc., is an American multinational technology company based in Seattle, Washington that focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. It is considered one of the Big Four technology companies along with Google, Apple, and Facebook"
}
'

Upload documents in bulk

Save the following example bulk request body as a file named bulk_news.json. Each action line and each document must be on its own line, and the file must end with a newline.

{ "index" : { "_index": "news", "_type" : "_doc", "_id" : "2" } }
{"content": "Alice does not like the rainy day"}
{ "index" : { "_index": "news", "_type" : "_doc", "_id" : "3" } }
{"content": " I love living in New York City"}
{ "index" : { "_index": "news", "_type" : "_doc", "_id" : "4" } }
{"content": " Bob hates that movie"}

Then, run the following command to index the bulk_news.json file:

curl -XPOST <proxy_endpoint>/_bulk --data-binary @bulk_news.json -H 'Content-Type: application/json'
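
The bulk API expects newline-delimited JSON: an action line, then the document on the next line, with a newline after the last line. For more than a few documents it can be easier to generate the file than to write it by hand. The following Python sketch does this; the function name build_bulk_body is hypothetical, and the action fields match the example file above.

```python
import json


def build_bulk_body(index_name, documents):
    """Build a bulk request body from a mapping of document ID to document source."""
    lines = []
    for doc_id, source in documents.items():
        action = {"index": {"_index": index_name, "_type": "_doc", "_id": str(doc_id)}}
        lines.append(json.dumps(action))  # action line
        lines.append(json.dumps(source))  # document line
    # The body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("news", {
    2: {"content": "Alice does not like the rainy day"},
    3: {"content": "I love living in New York City"},
    4: {"content": "Bob hates that movie"},
})
with open("bulk_news.json", "w", encoding="utf-8") as handle:
    handle.write(body)
```

The resulting bulk_news.json can be sent with the curl command shown above.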

Search documents

Once documents are indexed into the Amazon OpenSearch Service domain, you can search the documents using the extended Amazon Comprehend fields.

Run the following command to search for documents with positive sentiment:

curl -XGET "<proxy_endpoint>/news/_search?pretty" -H 'Content-Type: application/json' -d '
{
  "query" : {
    "bool" : {
      "must" : [
        { "match" : {"content_DetectSentiment.sentiment" : "POSITIVE"} }
      ]
    }
  }
}
'

Run the following command to search for documents with negative sentiment:

curl -XGET "<proxy_endpoint>/news/_search?pretty" -H 'Content-Type: application/json' -d '
{
  "query" : {
    "bool" : {
      "must" : [
        { "match" : {"content_DetectSentiment.sentiment" : "NEGATIVE"} }
      ]
    }
  }
}
'

Run the following command to search for documents containing the entity type LOCATION where the score is more than 0.9:

curl -XGET <proxy_endpoint>/news/_search -H 'Content-Type: application/json' -d '
{
  "query": {
    "nested" : {
      "path" : "content_DetectEntities.entities",
      "query" : {
        "bool" : {
          "must" : [
            { "match" : {"content_DetectEntities.entities.type" : "LOCATION"} },
            { "range" : {"content_DetectEntities.entities.score" : {"gt" : 0.9}} }
          ]
        }
      },
      "score_mode" : "avg"
    }
  }
}
'
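
The three searches above differ only in the field name, the value to match, and the score threshold, so the query bodies can be generated. The following Python sketch builds the same JSON. It assumes, based only on the examples in this section, that the extended fields are named by appending _DetectSentiment or _DetectEntities to the configured field name. The function names are hypothetical.

```python
import json


def sentiment_query(field_name, sentiment):
    """Query body matching documents whose detected sentiment equals `sentiment`."""
    return {"query": {"bool": {"must": [
        {"match": {f"{field_name}_DetectSentiment.sentiment": sentiment}},
    ]}}}


def entity_query(field_name, entity_type, min_score):
    """Nested query body matching entities of `entity_type` scoring above `min_score`."""
    path = f"{field_name}_DetectEntities.entities"
    return {"query": {"nested": {
        "path": path,
        "query": {"bool": {"must": [
            {"match": {f"{path}.type": entity_type}},
            {"range": {f"{path}.score": {"gt": min_score}}},
        ]}},
        "score_mode": "avg",
    }}}


# Print a body that can be passed to curl with -d.
print(json.dumps(entity_query("content", "LOCATION", 0.9), indent=2))
```

The printed JSON can be used as the request body of the curl commands shown above.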

Step 3. Open the pre-configured Kibana dashboard

The solution comes with a pre-configured Kibana dashboard powered by the data provided by Amazon Comprehend. The dashboard is pre-loaded as part of the PUT /preprocessing_configurations API call. After data is indexed, you can view a dashboard for each index and field name combination. Use the following procedure to view the results for the indexed documents.

  1. In the AWS CloudFormation console, navigate to the stack Outputs tab.

  2. Select the KibanaDashboardURL, and switch to dashboard view to review indexed results.

Figure 6: Example entity Kibana dashboard view

Note

AWS recommends indexing documents before accessing the pre-configured Kibana dashboard.

Use the following procedure to verify that the Kibana dashboard is configured correctly and indexed document fields are detected.

  1. Navigate to the Kibana dashboard, and select the Management tab.

  2. Select Index Patterns, and select the corresponding index pattern.

  3. Select the refresh field list button at the top right.

Figure 7: Kibana dashboard index pattern