
Step 1: Adding documents to Amazon S3

Before you run an Amazon Comprehend entities analysis job on your dataset, you create an Amazon S3 bucket to store the dataset, the Amazon Kendra metadata, and the Amazon Comprehend entities analysis output.

Downloading the sample dataset

Before Amazon Comprehend can run an entities analysis job on your data, you must download and extract the dataset and upload it to an S3 bucket.

  1. Download the tutorial-dataset.zip file to your device.

  2. Extract tutorial-dataset.zip to access the tutorial-dataset folder, which contains the data folder.

  1. To download tutorial-dataset.zip, run the following command in a terminal window:

    Linux
    curl -o path/tutorial-dataset.zip https://docs.aws.amazon.com/kendra/latest/dg/samples/tutorial-dataset.zip

    Where:

    • path/ is the local filepath to the location where you want to save the zip file.

    macOS
    curl -o path/tutorial-dataset.zip https://docs.aws.amazon.com/kendra/latest/dg/samples/tutorial-dataset.zip

    Where:

    • path/ is the local filepath to the location where you want to save the zip file.

    Windows
    curl -o path/tutorial-dataset.zip https://docs.aws.amazon.com/kendra/latest/dg/samples/tutorial-dataset.zip

    Where:

    • path/ is the local filepath to the location where you want to save the zip file.

  2. To extract the dataset from the zip file, run the following command in the terminal window:

    Linux
    unzip path/tutorial-dataset.zip -d path/

    Where:

    • path/ is the local filepath to your saved zip file.

    macOS
    unzip path/tutorial-dataset.zip -d path/

    Where:

    • path/ is the local filepath to your saved zip file.

    Windows
    tar -xf path/tutorial-dataset.zip -C path/

    Where:

    • path/ is the local filepath to your saved zip file.

At the end of this step, you should have the extracted files in a decompressed folder called tutorial-dataset. This folder contains a README file with an Apache 2.0 open source attribution and a folder called data containing the dataset for this tutorial. The dataset consists of 100 files with .story extensions.
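
(Optional) To confirm that the extraction succeeded, you can count the extracted files from a terminal. This check is not part of the tutorial; on Linux or macOS, the following command should print 100:

    ls path/tutorial-dataset/data/*.story | wc -l

    Where:

    • path/ is the local filepath to your extracted tutorial-dataset folder.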

Creating an Amazon S3 bucket

After downloading and extracting the sample data folder, you store it in an Amazon S3 bucket.

Important

The name of an Amazon S3 bucket must be unique across all of AWS.

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  2. In Buckets, choose Create bucket.

  3. For Bucket name, enter a unique name.

  4. For Region, choose the AWS region where you want to create the bucket.

    Note

    You must choose a region that supports both Amazon Comprehend and Amazon Kendra. You cannot change the region of a bucket after you have created it.

  5. Keep the default settings for Block Public Access settings for this bucket, Bucket Versioning, and Tags.

  6. For Default encryption, choose Disable.

  7. Keep the default settings for the Advanced settings.

  8. Review your bucket configuration and then choose Create bucket.

  1. To create an S3 bucket, use the create-bucket command in the AWS CLI:

    Linux
    aws s3api create-bucket \
        --bucket DOC-EXAMPLE-BUCKET \
        --region aws-region \
        --create-bucket-configuration LocationConstraint=aws-region

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name,

    • aws-region is the region you want to create your bucket in.

    macOS
    aws s3api create-bucket \
        --bucket DOC-EXAMPLE-BUCKET \
        --region aws-region \
        --create-bucket-configuration LocationConstraint=aws-region

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name,

    • aws-region is the region you want to create your bucket in.

    Windows
    aws s3api create-bucket ^
        --bucket DOC-EXAMPLE-BUCKET ^
        --region aws-region ^
        --create-bucket-configuration LocationConstraint=aws-region

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name,

    • aws-region is the region you want to create your bucket in.

    Note

    You must choose a region that supports both Amazon Comprehend and Amazon Kendra. You cannot change the region of a bucket after you have created it.

  2. To ensure that your bucket was created successfully, use the list command:

    Linux
    aws s3 ls

    macOS
    aws s3 ls

    Windows
    aws s3 ls
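
(Optional) You can also confirm the Region of the new bucket with the get-bucket-location command. This optional check works the same in all three shells; for buckets outside us-east-1, the Region is returned as the LocationConstraint value (buckets in us-east-1 return null):

    aws s3api get-bucket-location --bucket DOC-EXAMPLE-BUCKET

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name.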

Creating data and metadata folders in your S3 bucket

After creating your S3 bucket, you create data and metadata folders inside it.

  1. Open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  2. In Buckets, choose your bucket from the list of buckets.

  3. From the Objects tab, choose Create folder.

  4. For the new folder name, enter data.

  5. For the encryption settings, choose Disable.

  6. Choose Create folder.

  7. Repeat steps 3 through 6 to create a second folder for the Amazon Kendra metadata. In step 4, enter metadata as the folder name.

  1. To create the data folder in your S3 bucket, use the put-object command in the AWS CLI:

    Linux
    aws s3api put-object \
        --bucket DOC-EXAMPLE-BUCKET \
        --key data/

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name.

    macOS
    aws s3api put-object \
        --bucket DOC-EXAMPLE-BUCKET \
        --key data/

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name.

    Windows
    aws s3api put-object ^
        --bucket DOC-EXAMPLE-BUCKET ^
        --key data/

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name.

  2. To create the metadata folder in your S3 bucket, use the put-object command in the AWS CLI:

    Linux
    aws s3api put-object \
        --bucket DOC-EXAMPLE-BUCKET \
        --key metadata/

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name.

    macOS
    aws s3api put-object \
        --bucket DOC-EXAMPLE-BUCKET \
        --key metadata/

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name.

    Windows
    aws s3api put-object ^
        --bucket DOC-EXAMPLE-BUCKET ^
        --key metadata/

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name.

  3. To ensure that your folders were created successfully, check the contents of your bucket using the list command:

    Linux
    aws s3 ls s3://DOC-EXAMPLE-BUCKET/

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name.

    macOS
    aws s3 ls s3://DOC-EXAMPLE-BUCKET/

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name.

    Windows
    aws s3 ls s3://DOC-EXAMPLE-BUCKET/

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name.
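
(Optional) If you prefer to list the object keys directly, you can use the list-objects-v2 command as an alternative to aws s3 ls. This optional check is the same in all three shells, and the output should include the data/ and metadata/ keys:

    aws s3api list-objects-v2 --bucket DOC-EXAMPLE-BUCKET --query "Contents[].Key"

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name.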

Uploading the input data

After creating your data and metadata folders, you upload the sample dataset into the data folder.

  1. Open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  2. In Buckets, choose your bucket from the list of buckets, and then choose the data folder.

  3. Choose Upload and then choose Add files.

  4. In the dialog box, navigate to the data folder inside the tutorial-dataset folder on your local device, select all of the files, and then choose Open.

  5. Keep the default settings for Destination, Permissions, and Properties.

  6. Choose Upload.

  1. To upload the sample data into the data folder, use the copy command in the AWS CLI:

    Linux
    aws s3 cp path/tutorial-dataset/data s3://DOC-EXAMPLE-BUCKET/data/ --recursive

    Where:

    • path/ is the filepath to the tutorial-dataset folder on your device,

    • DOC-EXAMPLE-BUCKET is your bucket name.

    macOS
    aws s3 cp path/tutorial-dataset/data s3://DOC-EXAMPLE-BUCKET/data/ --recursive

    Where:

    • path/ is the filepath to the tutorial-dataset folder on your device,

    • DOC-EXAMPLE-BUCKET is your bucket name.

    Windows
    aws s3 cp path/tutorial-dataset/data s3://DOC-EXAMPLE-BUCKET/data/ --recursive

    Where:

    • path/ is the filepath to the tutorial-dataset folder on your device,

    • DOC-EXAMPLE-BUCKET is your bucket name.

  2. To ensure that your dataset files were uploaded successfully to your data folder, use the list command in the AWS CLI:

    Linux
    aws s3 ls s3://DOC-EXAMPLE-BUCKET/data/

    Where:

    • DOC-EXAMPLE-BUCKET is the name of your S3 bucket.

    macOS
    aws s3 ls s3://DOC-EXAMPLE-BUCKET/data/

    Where:

    • DOC-EXAMPLE-BUCKET is the name of your S3 bucket.

    Windows
    aws s3 ls s3://DOC-EXAMPLE-BUCKET/data/

    Where:

    • DOC-EXAMPLE-BUCKET is the name of your S3 bucket.

At the end of this step, you have an S3 bucket with your dataset stored inside the data folder, and an empty metadata folder, which will store your Amazon Kendra metadata.
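
(Optional) As a final check, you can list and count the uploaded objects with the --summarize flag. This optional command is the same in all three shells; the Total Objects count should cover the 100 dataset files, plus the zero-byte data/ folder object you created earlier:

    aws s3 ls s3://DOC-EXAMPLE-BUCKET/data/ --recursive --summarize

    Where:

    • DOC-EXAMPLE-BUCKET is your bucket name.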