Tutorial: Building a metadata-enriched, intelligent search solution with Amazon Kendra

This tutorial shows you how to build a metadata-enriched, natural-language-based, intelligent search solution for your enterprise data using Amazon Kendra, Amazon Comprehend, Amazon Simple Storage Service (Amazon S3), and AWS CloudShell.

Amazon Kendra is an intelligent search service that can build a search index for your unstructured, natural language data repositories. To make it easier for your customers to find and filter relevant answers, you can use Amazon Comprehend to extract metadata from your data and ingest it into your Amazon Kendra search index.

Amazon Comprehend is a natural language processing (NLP) service that can identify entities. Entities are references to people, locations, organizations, and objects in your data.

This tutorial extracts entities from a sample dataset of news articles, converts them to metadata, and ingests them into your Amazon Kendra index so that you can run searches against them. The added metadata lets you filter your search results using any subset of these entities, and improves search accuracy. By following this tutorial, you will learn how to create a search solution for your enterprise data without any specialized machine learning knowledge.
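To make the filtering idea concrete, the following minimal sketch builds the kind of attribute filter that an Amazon Kendra query accepts. It assumes that the entity types were ingested as string-list index fields named after the entity type (for example, ORGANIZATION or LOCATION); those field names, the helper name build_entity_filter, and the sample values are illustrative assumptions, not part of the tutorial.

```python
import json
from typing import Dict, List


def build_entity_filter(selected: Dict[str, List[str]]) -> dict:
    """Build a Kendra-style AttributeFilter from selected entity values.

    `selected` maps an index field name (hypothetical, e.g. "PERSON") to the
    values chosen for that field. A document must match every field, and for
    each field it must contain at least one of the chosen values.
    """
    clauses = [
        {"ContainsAny": {"Key": field, "Value": {"StringListValue": values}}}
        for field, values in sorted(selected.items())
        if values  # skip fields where nothing was selected
    ]
    return {"AndAllFilters": clauses}


if __name__ == "__main__":
    # Illustrative values only; real values come from your own index.
    attribute_filter = build_entity_filter(
        {"ORGANIZATION": ["Example Corp"], "LOCATION": ["Seattle", "London"]}
    )
    print(json.dumps(attribute_filter, indent=2))
    # With boto3 installed and an index available, the result could be passed
    # to the Query API as the AttributeFilter argument, alongside IndexId and
    # QueryText.
```

If no values are selected, the helper returns an empty AndAllFilters list; in that case it is probably simpler to omit the filter from the query entirely.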

This tutorial shows you how to build your search solution using the following steps:

  1. Storing a sample dataset of news articles in Amazon S3.

  2. Using Amazon Comprehend to extract entities from your data.

  3. Running a Python 3 script to convert the entities into Amazon Kendra index metadata format and storing this metadata in S3.

  4. Creating an Amazon Kendra search index and ingesting the data and the metadata.

  5. Querying the search index.

The following diagram shows the workflow:

[Figure: Workflow diagram of the procedures in the tutorial.]
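As a rough illustration of step 3, the sketch below converts one record of Amazon Comprehend entity output into a metadata dictionary in the shape that Amazon Kendra reads from Amazon S3 metadata files. It is not the script used by the tutorial. The confidence cutoff, the cap of 10 values per entity type, the use of the file name as DocumentId, and the function names are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict

MIN_SCORE = 0.9   # assumed confidence cutoff for keeping an entity
MAX_VALUES = 10   # assumed cap on values stored per entity type


def entities_to_metadata(record: dict,
                         min_score: float = MIN_SCORE,
                         max_values: int = MAX_VALUES) -> dict:
    """Convert one Comprehend entities record into a Kendra metadata dict.

    `record` is expected to have a "File" name and an "Entities" list whose
    items carry "Text", "Type", and "Score" keys.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for entity in record.get("Entities", []):
        if entity["Score"] >= min_score:
            counts[entity["Type"]][entity["Text"]] += 1
    # Keep the most frequently mentioned values for each entity type.
    attributes = {
        entity_type: [text for text, _ in counter.most_common(max_values)]
        for entity_type, counter in counts.items()
    }
    return {"DocumentId": record["File"], "Attributes": attributes}


def metadata_filename(document_name: str) -> str:
    """Name of the metadata file that accompanies a document in S3."""
    return f"{document_name}.metadata.json"


if __name__ == "__main__":
    import json

    # Hand-written sample record, for illustration only.
    sample = {
        "File": "article-1.txt",
        "Entities": [
            {"Text": "Seattle", "Type": "LOCATION", "Score": 0.99},
            {"Text": "Seattle", "Type": "LOCATION", "Score": 0.97},
            {"Text": "Example Corp", "Type": "ORGANIZATION", "Score": 0.95},
            {"Text": "someone", "Type": "PERSON", "Score": 0.41},
        ],
    }
    print(metadata_filename(sample["File"]))
    print(json.dumps(entities_to_metadata(sample), indent=2))
```

Ranking values by how often they are mentioned is one simple way to decide which entities to keep when a document contains more than the cap allows; other rankings, such as by confidence score, would work as well.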

Estimated time to complete this tutorial: 1 hour

Estimated cost: Some of the actions in this tutorial incur charges on your AWS account. For more information about the cost of each service, see the pricing pages for Amazon S3, Amazon Comprehend, AWS CloudShell, and Amazon Kendra.

Prerequisites

To complete this tutorial, you need the following resources:

  • An AWS account. If you do not have an AWS account, follow the steps in Setting up Amazon Kendra to set up your AWS account.

  • A development computer running Windows, macOS, or Linux, to access the AWS Management Console. For more information, see Configuring the AWS Management Console.

  • An AWS Identity and Access Management (IAM) user. To learn how to set up an IAM user and group for your account, see the Getting Started section in the IAM User Guide.

    If you are using the AWS Command Line Interface, you also need to attach the following policy to your IAM user to grant it the basic permissions required to complete this tutorial.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "iam:GetUserPolicy",
                    "iam:DeletePolicy",
                    "iam:CreateRole",
                    "iam:AttachRolePolicy",
                    "iam:DetachRolePolicy",
                    "iam:AttachUserPolicy",
                    "iam:DeleteRole",
                    "iam:CreatePolicy",
                    "iam:GetRolePolicy",
                    "s3:CreateBucket",
                    "s3:ListBucket",
                    "s3:DeleteObject",
                    "s3:DeleteBucket",
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:ListAllMyBuckets",
                    "comprehend:StartEntitiesDetectionJob",
                    "comprehend:BatchDetectEntities",
                    "comprehend:ListEntitiesDetectionJobs",
                    "comprehend:DescribeEntitiesDetectionJob",
                    "comprehend:StopEntitiesDetectionJob",
                    "comprehend:DetectEntities",
                    "kendra:Query",
                    "kendra:StopDataSourceSyncJob",
                    "kendra:CreateDataSource",
                    "kendra:BatchPutDocument",
                    "kendra:DeleteIndex",
                    "kendra:StartDataSourceSyncJob",
                    "kendra:CreateIndex",
                    "kendra:ListDataSources",
                    "kendra:UpdateIndex",
                    "kendra:DescribeIndex",
                    "kendra:DeleteDataSource",
                    "kendra:ListIndices",
                    "kendra:ListDataSourceSyncJobs",
                    "kendra:DescribeDataSource",
                    "kendra:BatchDeleteDocument"
                ],
                "Resource": "*"
            },
            {
                "Sid": "iamPassRole",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "iam:PassedToService": [
                            "s3.amazonaws.com",
                            "comprehend.amazonaws.com",
                            "kendra.amazonaws.com"
                        ]
                    }
                }
            }
        ]
    }

    For more information, see Creating IAM policies and Adding and removing IAM identity permissions.

  • The AWS Regional Services List. To reduce latency, choose the AWS Region closest to your geographic location that supports both Amazon Comprehend and Amazon Kendra.

  • (Optional) An AWS Key Management Service (AWS KMS) key. While this tutorial does not use encryption, you might want to use encryption best practices for your specific use case.

  • (Optional) An Amazon Virtual Private Cloud. While this tutorial does not use a VPC, you might want to use VPC best practices to ensure data security for your specific use case.
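Before attaching the IAM policy listed in the prerequisites, it can help to review which actions it allows for each service. The following sketch groups the allowed actions of a policy document by service prefix. The helper name actions_by_service is an assumption made for this example, and the embedded policy is an abbreviated excerpt of the policy above rather than the full document.

```python
import json
from collections import defaultdict
from typing import Dict, List


def actions_by_service(policy: dict) -> Dict[str, List[str]]:
    """Group the actions of all Allow statements by service prefix."""
    grouped = defaultdict(set)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may appear as a string
            actions = [actions]
        for action in actions:
            service, _, name = action.partition(":")
            grouped[service].add(name)
    return {service: sorted(names) for service, names in sorted(grouped.items())}


if __name__ == "__main__":
    # Abbreviated excerpt of the tutorial policy, for illustration only.
    excerpt = json.loads("""
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:CreateBucket", "s3:PutObject", "kendra:Query"],
                "Resource": "*"
            },
            {
                "Sid": "iamPassRole",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": "*"
            }
        ]
    }
    """)
    for service, names in actions_by_service(excerpt).items():
        print(f"{service}: {', '.join(names)}")
```

A summary like this makes it easier to spot permissions you do not need, for example if you plan to use only the console and can leave out the delete actions until cleanup.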