
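Step 1: Add documents to Amazon S3

Before you run an Amazon Comprehend entities analysis job on your dataset, you need to create an Amazon S3 bucket to host the data, the metadata, and the Amazon Comprehend entities analysis output.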

Topics

Download the sample dataset
Create an Amazon S3 bucket
Create data and metadata folders in your S3 bucket
Upload the input data

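Download the sample dataset

Before Amazon Comprehend can run entities analysis jobs on your data, you must download and extract the dataset and then upload it to an S3 bucket.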

1. Download the tutorial-dataset.zip folder to your device.
2. Unzip the tutorial-dataset folder to access the data folder.

To download tutorial-dataset, run the following command in a terminal window:

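To extract the data from the zip folder, run the following command in a terminal window:
Linux
```
unzip path/tutorial-dataset.zip -d path/
```
Where:
path/ is the local file path to your saved zip folder.
macOS
```
unzip path/tutorial-dataset.zip -d path/
```
Where:
path/ is the local file path to your saved zip folder.
Windows
```
tar -xf path/tutorial-dataset.zip -C path/
```
Where:
path/ is the local file path to your saved zip folder.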

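At the end of this step, you should have the extracted files in an unzipped folder named tutorial-dataset. This folder contains a README file with an Apache 2.0 open source attribution and a folder named data that holds the dataset for this tutorial. The dataset consists of 100 files with the .story extension.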
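Before uploading, you may want to confirm that the extraction produced the expected 100 .story files. The following is a minimal Bash sketch, assuming the layout described above (path/tutorial-dataset/data); the function name count_story_files is hypothetical and not part of any AWS tool.

```shell
#!/usr/bin/env bash
# count_story_files (hypothetical helper): prints how many regular files
# directly inside DIR have the .story extension.
count_story_files() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # -maxdepth 1 looks only at the top level of DIR; wc -l counts one line
  # per matching file, and tr strips the padding that macOS wc adds.
  find "$dir" -maxdepth 1 -type f -name '*.story' | wc -l | tr -d ' '
}
```

For the complete dataset, `count_story_files path/tutorial-dataset/data` should print 100.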

Create an Amazon S3 bucket

After you download and extract the sample data folder, you store it in an Amazon S3 bucket.

Important

Amazon S3 bucket names must be unique across all of AWS.
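Uniqueness can only be confirmed by Amazon S3 itself, but a name can also fail for purely syntactic reasons. The following Bash sketch checks a subset of the S3 general purpose bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starts and ends with a letter or digit; no two adjacent periods) before you try to create the bucket. The function name is_valid_bucket_name is hypothetical, and the check is not exhaustive.

```shell
#!/usr/bin/env bash
# is_valid_bucket_name (hypothetical helper): returns 0 if NAME passes a
# SUBSET of the S3 general purpose bucket naming rules, 1 otherwise.
# It cannot check global uniqueness; only Amazon S3 can tell you that.
is_valid_bucket_name() {
  local name="$1"
  # 3-63 characters; lowercase letters, digits, dots, and hyphens only;
  # first and last characters must be a lowercase letter or digit.
  # LC_ALL=C makes [a-z] mean ASCII a-z regardless of the user's locale.
  printf '%s\n' "$name" |
    LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$' || return 1
  # Two adjacent periods are not allowed.
  case "$name" in
    *..*) return 1 ;;
  esac
  return 0
}
```

For example, `is_valid_bucket_name amzn-s3-demo-bucket` succeeds, while `is_valid_bucket_name MyBucket` fails because of the uppercase letters.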

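1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
2. In Buckets, choose Create bucket.
3. For Bucket name, enter a unique name.
4. For Region, choose the AWS Region where you want to create the bucket.

Note
You must choose a Region that supports both Amazon Comprehend and Amazon Kendra. You cannot change the Region of a bucket after you create it.
5. Keep the default settings for Block Public Access settings for this bucket, Bucket Versioning, and Tags.
6. For Default encryption, choose Disable.
7. Keep the default settings for Advanced settings.
8. Review your bucket configuration, and then choose Create bucket.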

To create an S3 bucket, use the create-bucket command in the AWS CLI:
Linux
```
aws s3api create-bucket \
        --bucket amzn-s3-demo-bucket \
        --region aws-region \
        --create-bucket-configuration LocationConstraint=aws-region
```
Where:
amzn-s3-demo-bucket is your bucket name, and

aws-region is the Region where you want to create the bucket.
macOS
```
aws s3api create-bucket \
        --bucket amzn-s3-demo-bucket \
        --region aws-region \
        --create-bucket-configuration LocationConstraint=aws-region
```
Where:
amzn-s3-demo-bucket is your bucket name, and

aws-region is the Region where you want to create the bucket.
Windows
```
aws s3api create-bucket ^
        --bucket amzn-s3-demo-bucket ^
        --region aws-region ^
        --create-bucket-configuration LocationConstraint=aws-region
```
Where:
amzn-s3-demo-bucket is your bucket name, and

aws-region is the Region where you want to create the bucket.
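Note
You must choose a Region that supports both Amazon Comprehend and Amazon Kendra. You cannot change the Region of a bucket after you create it. If aws-region is us-east-1, omit the --create-bucket-configuration parameter; Amazon S3 returns an InvalidLocationConstraint error when LocationConstraint is set to us-east-1.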
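The create-bucket command above needs a LocationConstraint in every Region except us-east-1, where Amazon S3 rejects an explicit LocationConstraint=us-east-1. A small Bash wrapper can pick the right form for you. This is a sketch; the function name create_tutorial_bucket is hypothetical, and it only wraps the same aws s3api create-bucket call shown above.

```shell
#!/usr/bin/env bash
# create_tutorial_bucket (hypothetical helper): creates BUCKET in REGION.
# us-east-1 rejects an explicit LocationConstraint, so the
# --create-bucket-configuration parameter is added only for other Regions.
create_tutorial_bucket() {
  local bucket="$1" region="$2"
  if [ "$region" = "us-east-1" ]; then
    aws s3api create-bucket --bucket "$bucket" --region "$region"
  else
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region"
  fi
}
```

For example: `create_tutorial_bucket amzn-s3-demo-bucket us-west-2`. The function returns the exit status of the aws command, so a failure (for example, a name that is already taken) can be detected with `|| echo "create-bucket failed"`.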
To verify that your bucket was created, use the list command:
Linux
```
aws s3 ls
```
macOS
```
aws s3 ls
```
Windows
```
aws s3 ls
```
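If you script this check, matching the exact bucket name avoids false positives from similarly named buckets (for example, amzn-s3-demo-bucket-2). The following Bash sketch reads `aws s3 ls` output, where each line has the form `date time bucket-name`, and compares the third column exactly. The function name bucket_in_listing is hypothetical.

```shell
#!/usr/bin/env bash
# bucket_in_listing (hypothetical helper): reads `aws s3 ls` output on stdin
# and succeeds only if BUCKET appears exactly as a bucket name (3rd column).
bucket_in_listing() {
  awk -v b="$1" '$3 == b { found = 1 } END { exit !found }'
}
```

Usage: `aws s3 ls | bucket_in_listing amzn-s3-demo-bucket && echo "bucket exists"`.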

Create data and metadata folders in your S3 bucket

After you create your S3 bucket, you create data and metadata folders inside it.

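1. Open the Amazon S3 console at https://console.aws.amazon.com/s3/.
2. In Buckets, choose the name of your bucket from the bucket list.
3. On the Objects tab, choose Create folder.
4. For the new folder name, enter data.
5. For the encryption settings, choose Disable.
6. Choose Create folder.
7. Repeat steps 3 through 6 to create a second folder for storing the Amazon Kendra metadata, entering metadata as the folder name in step 4.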

To create the data folder in your S3 bucket, use the put-object command in the AWS CLI:
Linux
```
aws s3api put-object \
        --bucket amzn-s3-demo-bucket \
        --key data/
```
Where:
amzn-s3-demo-bucket is your bucket name.
macOS
```
aws s3api put-object \
        --bucket amzn-s3-demo-bucket \
        --key data/
```
Where:
amzn-s3-demo-bucket is your bucket name.
Windows
```
aws s3api put-object ^
        --bucket amzn-s3-demo-bucket ^
        --key data/
```
Where:
amzn-s3-demo-bucket is your bucket name.
To create the metadata folder in your S3 bucket, use the put-object command in the AWS CLI:
Linux
```
aws s3api put-object \
        --bucket amzn-s3-demo-bucket \
        --key metadata/
```
Where:
amzn-s3-demo-bucket is your bucket name.
macOS
```
aws s3api put-object \
        --bucket amzn-s3-demo-bucket \
        --key metadata/
```
Where:
amzn-s3-demo-bucket is your bucket name.
Windows
```
aws s3api put-object ^
        --bucket amzn-s3-demo-bucket ^
        --key metadata/
```
Where:
amzn-s3-demo-bucket is your bucket name.
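The two put-object calls differ only in the key, so on Linux or macOS (or a Bash shell on Windows, such as WSL) they can be combined in a loop. This Bash sketch issues the same two commands shown above and stops at the first failure; the function name create_tutorial_prefixes is hypothetical.

```shell
#!/usr/bin/env bash
# create_tutorial_prefixes (hypothetical helper): creates the data/ and
# metadata/ "folders" in BUCKET. Each folder is a zero-byte object whose
# key ends in "/", created with the same put-object call shown above.
create_tutorial_prefixes() {
  local bucket="$1"
  local prefix
  for prefix in data/ metadata/; do
    # Stop at the first failure so a partial setup is easy to spot.
    aws s3api put-object --bucket "$bucket" --key "$prefix" || return 1
  done
}
```

Usage: `create_tutorial_prefixes amzn-s3-demo-bucket`.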
To verify that your folders were created, use the list command to check the contents of your bucket:
Linux
```
aws s3 ls s3://amzn-s3-demo-bucket/
```
Where:
amzn-s3-demo-bucket is your bucket name.
macOS
```
aws s3 ls s3://amzn-s3-demo-bucket/
```
Where:
amzn-s3-demo-bucket is your bucket name.
Windows
```
aws s3 ls s3://amzn-s3-demo-bucket/
```
Where:
amzn-s3-demo-bucket is your bucket name.
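In the output of `aws s3 ls s3://amzn-s3-demo-bucket/`, folders appear as lines of the form `PRE data/`. The following Bash sketch checks for a given folder in that output; the function name has_folder is hypothetical.

```shell
#!/usr/bin/env bash
# has_folder (hypothetical helper): reads `aws s3 ls s3://BUCKET/` output on
# stdin and succeeds if FOLDER (for example, data/) is listed as a prefix,
# that is, on a line whose first column is PRE.
has_folder() {
  awk -v p="$1" '$1 == "PRE" && $2 == p { found = 1 } END { exit !found }'
}
```

For example, listing once and checking both folders:

`listing="$(aws s3 ls s3://amzn-s3-demo-bucket/)"; for f in data/ metadata/; do printf '%s\n' "$listing" | has_folder "$f" || echo "missing: $f"; done`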

Upload the input data

After you create the data and metadata folders, upload the sample dataset into the data folder.

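1. Open the Amazon S3 console at https://console.aws.amazon.com/s3/.
2. In Buckets, choose the name of your bucket from the bucket list, and then choose data.
3. Choose Upload, and then choose Add files.
4. In the dialog box, navigate to the data folder inside the tutorial-dataset folder on your local device, select all of the files, and then choose Open.
5. Keep the default settings for Destination, Permissions, and Properties.
6. Choose Upload.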

To upload the sample data into the data folder, use the cp (copy) command in the AWS CLI:
Linux
```
aws s3 cp path/tutorial-dataset/data s3://amzn-s3-demo-bucket/data/ --recursive
```
Where:
path/ is the file path to the tutorial-dataset folder on your device, and

amzn-s3-demo-bucket is your bucket name.
macOS
```
aws s3 cp path/tutorial-dataset/data s3://amzn-s3-demo-bucket/data/ --recursive
```
Where:
path/ is the file path to the tutorial-dataset folder on your device, and

amzn-s3-demo-bucket is your bucket name.
Windows
```
aws s3 cp path/tutorial-dataset/data s3://amzn-s3-demo-bucket/data/ --recursive
```
Where:
path/ is the file path to the tutorial-dataset folder on your device, and

amzn-s3-demo-bucket is your bucket name.
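The data folder should contain only .story files, but local tools can leave extra files behind (for example, a .DS_Store file created by the macOS Finder). As a defensive variant, the following Bash sketch adds the cp filters `--exclude "*" --include "*.story"`. The AWS CLI applies filters in order, with later filters taking precedence, so this combination uploads only .story files. The function name upload_story_files is hypothetical.

```shell
#!/usr/bin/env bash
# upload_story_files (hypothetical helper): copies only the *.story files
# from LOCAL_DIR to s3://BUCKET/data/. Filters are evaluated in order, so
# --exclude "*" followed by --include "*.story" keeps only .story files.
upload_story_files() {
  local local_dir="$1" bucket="$2"
  aws s3 cp "$local_dir" "s3://$bucket/data/" --recursive \
    --exclude "*" --include "*.story"
}
```

Usage: `upload_story_files path/tutorial-dataset/data amzn-s3-demo-bucket`. The quotes around the filter patterns keep the shell from expanding them before the AWS CLI sees them.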
To verify that your dataset files were uploaded to the data folder, use the list command in the AWS CLI:
Linux
```
aws s3 ls s3://amzn-s3-demo-bucket/data/
```
Where:
amzn-s3-demo-bucket is the name of your S3 bucket.
macOS
```
aws s3 ls s3://amzn-s3-demo-bucket/data/
```
Where:
amzn-s3-demo-bucket is the name of your S3 bucket.
Windows
```
aws s3 ls s3://amzn-s3-demo-bucket/data/
```
Where:
amzn-s3-demo-bucket is the name of your S3 bucket.
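Scrolling through 100 lines by eye is error-prone, so you may prefer to count the uploaded files. In the `aws s3 ls s3://amzn-s3-demo-bucket/data/` output, each object line ends with the object name; this Bash sketch counts the lines whose name ends in .story and ignores any other lines. The function name count_story_objects is hypothetical.

```shell
#!/usr/bin/env bash
# count_story_objects (hypothetical helper): reads `aws s3 ls s3://BUCKET/data/`
# output on stdin and prints how many listed objects end in .story.
# Lines that do not end in .story (such as a zero-byte folder marker,
# if listed) are ignored. "n + 0" prints 0 when nothing matches.
count_story_objects() {
  awk '$NF ~ /\.story$/ { n++ } END { print n + 0 }'
}
```

Usage: `aws s3 ls s3://amzn-s3-demo-bucket/data/ | count_story_objects` should print 100 when the whole dataset has been uploaded.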

At the end of this step, you have an S3 bucket with your dataset stored in the data folder, plus an empty metadata folder that will store your Amazon Kendra metadata.
