Using an S3 data source
Amazon Kendra doesn't rely on a bucket policy that grants an Amazon Kendra principal permission to interact with an S3 bucket. Instead, it uses IAM roles. Make sure that Amazon Kendra isn't included as a trusted member in your bucket policy, so that you don't accidentally grant permissions to arbitrary principals. However, you can add a bucket policy to use an Amazon S3 bucket across different accounts. For more information, see Policies to use Amazon S3 across accounts. For information about IAM roles for S3 data sources, see IAM roles.
You can use your S3 bucket repository of documents as a data source for Amazon Kendra. For a walk-through of how to use Amazon S3 in the console, see Getting started with an Amazon S3 data source (console).
When you connect to Amazon S3 to index your documents, you specify the name of the S3 bucket that contains your documents. You can specify glob patterns to include or exclude specific documents in your bucket.
You must create an index before you create the Amazon S3 data source. You provide the ID of the index when you create the data source. For more information, see CreateDataSource.
To connect to Amazon S3, you specify the connection and other information in the console or by using the S3DataSourceConfiguration object. You provide the name of the Amazon S3 bucket you want to index.
Before you can index documents from your Amazon S3 bucket, the bucket must be in the same Region as the index, and Amazon Kendra must have permission to access the bucket. You can also configure an access control list (ACL) for your Amazon S3 bucket. The ACL contains information on user and group access to documents.
You also must provide the Amazon Resource Name (ARN) of an IAM role that gives permission to access your Amazon S3 bucket. You provide the ARN of an IAM role using CreateDataSource. For more information on permissions, see IAM roles for Amazon S3 data sources.
You also can add the following optional information:
- Inclusion or exclusion pattern: If you specify an inclusion pattern, any document with a file name or file type that doesn't match the pattern is not indexed. If you specify both an inclusion and an exclusion pattern, documents that match the exclusion pattern are not indexed even if they match the inclusion pattern.
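The precedence between inclusion and exclusion patterns can be illustrated with a small local sketch. It uses Python's `fnmatchcase` as a stand-in for glob matching, which is an assumption: the glob dialect that Amazon Kendra applies may differ in details. The function name `is_indexed` is hypothetical, and the behavior when only exclusion patterns are given (everything else is indexed) is also assumed rather than stated above.

```python
from fnmatch import fnmatchcase


def is_indexed(key, inclusion_patterns=(), exclusion_patterns=()):
    """Return True if a document key passes the inclusion/exclusion rule.

    Illustrative only: fnmatchcase approximates glob matching and is not
    the matcher that Amazon Kendra itself uses.
    """
    # A document that matches any exclusion pattern is never indexed,
    # even if it also matches an inclusion pattern.
    if any(fnmatchcase(key, pattern) for pattern in exclusion_patterns):
        return False
    # When inclusion patterns are given, the document must match one of them.
    if inclusion_patterns:
        return any(fnmatchcase(key, pattern) for pattern in inclusion_patterns)
    # Assumption: with no inclusion patterns, nothing else filters the document.
    return True
```

For example, with an inclusion pattern of `*.pdf` and an exclusion pattern of `*draft*`, a key such as `reports/draft-summary.pdf` is skipped because the exclusion pattern takes precedence.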
The following examples demonstrate creating an Amazon S3 data source. The examples assume that you have already created an index and an IAM role with permission to read the data from the S3 bucket. For more information about the IAM role, see IAM roles for Amazon S3 data sources. For more information about creating an index, see Creating an index.
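A minimal sketch of the create step with the AWS SDK for Python (boto3) might look like the following. The helper names `build_s3_data_source_request` and `create_s3_data_source` are hypothetical, and every ID, ARN, and bucket name in the usage comment is a placeholder. The `kendra` argument is assumed to be a client obtained from `boto3.client("kendra")`.

```python
def build_s3_data_source_request(index_id, role_arn, bucket_name, name,
                                 inclusion_patterns=None,
                                 exclusion_patterns=None):
    """Build the keyword arguments for the CreateDataSource call."""
    # The S3-specific settings; only the bucket name is required here.
    s3_config = {"BucketName": bucket_name}
    if inclusion_patterns:
        s3_config["InclusionPatterns"] = list(inclusion_patterns)
    if exclusion_patterns:
        s3_config["ExclusionPatterns"] = list(exclusion_patterns)
    return {
        "Name": name,
        "IndexId": index_id,    # ID of the index created beforehand
        "Type": "S3",
        "RoleArn": role_arn,    # IAM role that can read the bucket
        "Configuration": {"S3Configuration": s3_config},
    }


def create_s3_data_source(kendra, **settings):
    """Create the data source and return its ID."""
    request = build_s3_data_source_request(**settings)
    response = kendra.create_data_source(**request)
    return response["Id"]


# Example usage (requires boto3 and AWS credentials; values are placeholders):
#   import boto3
#   kendra = boto3.client("kendra")
#   data_source_id = create_s3_data_source(
#       kendra,
#       index_id="index-id",
#       role_arn="arn:aws:iam::account-id:role/role-name",
#       bucket_name="bucket-name",
#       name="example-s3-data-source",
#   )
```

No `Schedule` is passed in this sketch, so a data source created this way would not sync on its own.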
It can take some time to create your data source. You can monitor the progress by using the DescribeDataSource API. When the data source status is ACTIVE, the data source is ready to use.
The following examples demonstrate getting the status of a data source.
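One way to check the status is to poll DescribeDataSource until it reports ACTIVE. The sketch below assumes a boto3 Kendra client is passed in as `kendra`. The helper name `wait_until_active`, the polling interval, and the attempt limit are illustrative choices, not values recommended by the service.

```python
import time


def wait_until_active(kendra, index_id, data_source_id,
                      poll_seconds=60, max_attempts=30, sleep=time.sleep):
    """Poll DescribeDataSource until the data source status is ACTIVE."""
    for _ in range(max_attempts):
        response = kendra.describe_data_source(
            Id=data_source_id, IndexId=index_id
        )
        status = response["Status"]
        if status == "ACTIVE":
            return status
        if status == "FAILED":
            # Surface the service's error message when one is returned.
            raise RuntimeError(
                response.get("ErrorMessage", "Data source creation failed")
            )
        # Any other status: wait and check again.
        sleep(poll_seconds)
    raise TimeoutError("Data source did not become ACTIVE in time")


# Example usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   kendra = boto3.client("kendra")
#   wait_until_active(kendra, "index-id", "data-source-id")
```

The `sleep` parameter exists only so the wait can be replaced in tests. In normal use the default `time.sleep` applies.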
This data source doesn't have a schedule, so it doesn't run automatically. To index the data source, you call StartDataSourceSyncJob to synchronize the index with the data source.
The following examples demonstrate synchronizing a data source.
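A sync can be started with StartDataSourceSyncJob, and its progress looked up in the sync job history. The sketch below again assumes a boto3 Kendra client passed in as `kendra`. The helper names `start_sync` and `sync_status` are hypothetical, and the history lookup only inspects the first page of results, which is a simplification.

```python
def start_sync(kendra, index_id, data_source_id):
    """Start a sync job and return its execution ID."""
    response = kendra.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id
    )
    return response["ExecutionId"]


def sync_status(kendra, index_id, data_source_id, execution_id):
    """Return the status of one sync job, or None if it is not listed."""
    response = kendra.list_data_source_sync_jobs(
        Id=data_source_id, IndexId=index_id
    )
    # Simplification: only the first page of the job history is checked.
    for job in response.get("History", []):
        if job.get("ExecutionId") == execution_id:
            return job.get("Status")
    return None


# Example usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   kendra = boto3.client("kendra")
#   execution_id = start_sync(kendra, "index-id", "data-source-id")
#   print(sync_status(kendra, "index-id", "data-source-id", execution_id))
```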