Step 2: Running an entities analysis job on Amazon Comprehend
After storing the sample dataset in your S3 bucket, you run an Amazon Comprehend entities analysis job to extract entities from your documents. These entities will form Amazon Kendra custom attributes and help you filter search results on your index. For more information, see Detect Entities.
Running an Amazon Comprehend entities analysis job
To extract entities from your dataset, you run an Amazon Comprehend entities analysis job.
If you are using the AWS CLI in this step, you first create and attach an AWS IAM role and policy for Amazon Comprehend and then run an entities analysis job. To run an entities analysis job on your sample data, Amazon Comprehend needs:
-
an AWS Identity and Access Management (IAM) role that recognizes it as a trusted entity
-
an AWS IAM policy attached to the IAM role that gives it permissions to access your S3 bucket
For more information, see How Amazon Comprehend works with IAM and Identity-Based Policies for Amazon Comprehend.
Open the Amazon Comprehend console at https://console.aws.amazon.com/comprehend/.
Important
Ensure that you are in the same Region in which you created your Amazon S3 bucket. If you are in another Region, choose the AWS Region where you created your S3 bucket from the Region selector in the top navigation bar.
-
Choose Launch Amazon Comprehend.
-
In the left navigation pane, choose Analysis jobs.
-
Choose Create job.
-
In the Job settings section, do the following:
-
For Name, enter data-entities-analysis.
-
For Analysis type, choose Entities.
-
For Language, choose English.
-
Keep Job encryption turned off.
-
-
In the Input data section, do the following:
-
For Data source, choose My documents.
-
For S3 location, choose Browse S3.
-
For Choose resources, choose the name of your bucket from the list of buckets.
-
For Objects, select the option button for data and choose Choose.
-
For Input format, choose One document per file.
-
-
In the Output data section, do the following:
-
For S3 location, choose Browse S3 and then select the option box for your bucket from the list of buckets and choose Choose.
-
Keep Encryption turned off.
-
-
In the Access permissions section, do the following:
-
For IAM role, choose Create an IAM role.
-
For Permissions to access, choose Input and Output S3 buckets.
-
For Name suffix, enter comprehend-role. This role provides access to your Amazon S3 bucket.
-
-
Keep the default VPC settings.
-
Choose Create job.
-
To create and attach an IAM role for Amazon Comprehend that recognizes it as a trusted entity, do the following:
-
Save the following trust policy as a JSON file called comprehend-trust-policy.json in a text editor on your local device.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "comprehend.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
-
To create an IAM role called comprehend-role and attach your saved comprehend-trust-policy.json file to it, use the create-role command.
-
Copy the Amazon Resource Name (ARN) to your text editor and save it locally as comprehend-role-arn.
Note
The ARN has a format similar to arn:aws:iam::123456789012:role/comprehend-role. You need the ARN you saved as comprehend-role-arn to run the Amazon Comprehend analysis job.
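The create-role step above can be sketched with the AWS CLI as follows. This is a minimal sketch, assuming the AWS CLI is installed and configured and that comprehend-trust-policy.json is in the current directory. The function name create_comprehend_role is a hypothetical wrapper for illustration, not part of the AWS CLI.

```shell
# Hypothetical wrapper around `aws iam create-role` (the function name is an assumption).
# $1 (optional): path to the trust policy file; defaults to the current directory.
create_comprehend_role() {
  trust_policy_file="${1:-comprehend-trust-policy.json}"
  # --query and --output limit the output to the role ARN,
  # which is the value to save as comprehend-role-arn.
  aws iam create-role \
    --role-name comprehend-role \
    --assume-role-policy-document "file://${trust_policy_file}" \
    --query 'Role.Arn' \
    --output text
}
```

Calling create_comprehend_role with no arguments should print only the new role's ARN, so the output can be copied directly.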
-
-
To create and attach an IAM policy to your IAM role that grants it permissions to access your S3 bucket, do the following:
-
Save the following IAM policy as a JSON file called comprehend-S3-access-policy.json in a text editor on your local device.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket/*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket/*"
            ],
            "Effect": "Allow"
        }
    ]
}
-
To create an IAM policy called comprehend-S3-access-policy to access your S3 bucket, use the create-policy command.
-
Copy the Amazon Resource Name (ARN) to your text editor and save it locally as comprehend-S3-access-arn.
Note
The ARN has a format similar to arn:aws:iam::123456789012:policy/comprehend-S3-access-policy. You need the ARN you saved as comprehend-S3-access-arn to attach the comprehend-S3-access-policy to your IAM role.
-
To attach the comprehend-S3-access-policy to your IAM role, use the attach-role-policy command.
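The create-policy and attach-role-policy steps above can be sketched with the AWS CLI as follows. This is a minimal sketch, assuming the AWS CLI is installed and configured and that comprehend-S3-access-policy.json is in the current directory. Both function names are hypothetical wrappers for illustration, not part of the AWS CLI.

```shell
# Hypothetical wrappers; neither function name is part of the AWS CLI.

# Creates the managed policy and prints its ARN
# (the value to save as comprehend-S3-access-arn).
# $1 (optional): path to the policy file; defaults to the current directory.
create_comprehend_s3_policy() {
  policy_file="${1:-comprehend-S3-access-policy.json}"
  aws iam create-policy \
    --policy-name comprehend-S3-access-policy \
    --policy-document "file://${policy_file}" \
    --query 'Policy.Arn' \
    --output text
}

# Attaches the policy to the comprehend-role IAM role.
# $1: the policy ARN you saved as comprehend-S3-access-arn.
attach_comprehend_s3_policy() {
  aws iam attach-role-policy \
    --role-name comprehend-role \
    --policy-arn "$1"
}
```

Running create_comprehend_s3_policy and then passing the printed ARN to attach_comprehend_s3_policy covers both steps. The attach command produces no output when it succeeds.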
-
-
To run an Amazon Comprehend entities analysis job, use the start-entities-detection-job command.
-
Copy the entities analysis JobId and save it in a text editor as comprehend-job-id. The JobId helps you track the status of your entities analysis job.
-
To track the progress of your entities analysis job, use the describe-entities-detection-job command.
It can take several minutes for the JobStatus to change to COMPLETED.
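The start-entities-detection-job and describe-entities-detection-job steps above can be sketched with the AWS CLI as follows. This is a minimal sketch, assuming the AWS CLI is installed and configured, that the sample documents are under the data/ prefix of your bucket (as in the console procedure), and that results are written to the bucket root. The function names and the positional arguments are hypothetical choices for illustration, not part of the AWS CLI.

```shell
# Hypothetical wrappers; the function names are not part of the AWS CLI.

# Starts the entities analysis job and prints its JobId
# (the value to save as comprehend-job-id).
# $1: S3 bucket name
# $2: role ARN you saved as comprehend-role-arn
# $3: AWS Region code of the bucket
start_entities_job() {
  aws comprehend start-entities-detection-job \
    --job-name data-entities-analysis \
    --language-code en \
    --input-data-config "S3Uri=s3://$1/data/,InputFormat=ONE_DOC_PER_FILE" \
    --output-data-config "S3Uri=s3://$1/" \
    --data-access-role-arn "$2" \
    --region "$3" \
    --query 'JobId' \
    --output text
}

# Prints the current JobStatus of the job.
# $1: JobId you saved as comprehend-job-id
# $2: AWS Region code used when the job was started
entities_job_status() {
  aws comprehend describe-entities-detection-job \
    --job-id "$1" \
    --region "$2" \
    --query 'EntitiesDetectionJobProperties.JobStatus' \
    --output text
}
```

The ONE_DOC_PER_FILE input format and the en language code mirror the One document per file and English choices in the console procedure.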
At the end of this step, Amazon Comprehend stores the entity analysis results as a zipped output.tar.gz file inside an output folder within an auto-generated folder in your S3 bucket. Make sure that your analysis job status is complete before you move on to the next step.
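One way to confirm that the job has finished before moving on is to poll its status from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured. The function name wait_for_entities_job, its arguments, and the 30-second default interval are hypothetical choices for illustration, not part of the AWS CLI.

```shell
# Hypothetical helper: polls the job until it reaches a terminal state.
# $1: JobId you saved as comprehend-job-id
# $2: AWS Region code used when the job was started
# $3 (optional): seconds to wait between polls; defaults to 30
wait_for_entities_job() {
  while :; do
    # Stop immediately if the CLI call itself fails (for example, bad credentials).
    status="$(aws comprehend describe-entities-detection-job \
      --job-id "$1" \
      --region "$2" \
      --query 'EntitiesDetectionJobProperties.JobStatus' \
      --output text)" || return 1
    echo "JobStatus: ${status}"
    case "$status" in
      COMPLETED) return 0 ;;       # results are in the S3 bucket
      FAILED|STOPPED) return 1 ;;  # terminal states without usable results
    esac
    sleep "${3:-30}"
  done
}
```

The function returns 0 only when JobStatus is COMPLETED, so it can gate the next step in a script.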