Using an Amazon S3 data source - Amazon Kendra

Using an Amazon S3 data source

Amazon S3 is an object storage service that stores data as objects within buckets. If you are an Amazon S3 user, you can use Amazon Kendra to index the documents stored in your Amazon S3 bucket.

Warning

Amazon Kendra doesn't use a bucket policy that grants permissions to an Amazon Kendra principal to interact with an S3 bucket. Instead, it uses IAM roles. Make sure that Amazon Kendra isn't included as a trusted member in your bucket policy, so that you don't accidentally grant permissions to arbitrary principals and create a data security issue. However, you can add a bucket policy to use an Amazon S3 bucket across different accounts. For more information, see Policies to use Amazon S3 across accounts. For information about IAM roles for S3 data sources, see IAM roles.

You can connect Amazon Kendra to your Amazon S3 data source using either the Amazon Kendra console or the S3DataSourceConfiguration API.

For troubleshooting your Amazon Kendra S3 data source connector, see Troubleshooting data sources.

Supported features

  • Field mappings

  • User context filtering

  • Inclusion/exclusion filters

Prerequisites

Before you can use Amazon Kendra to index your Amazon S3 data source, you must meet the following requirements:

  • You have created an Amazon Kendra index. You must create an index before you create the data source. You need the index ID to connect your data source. For more information on how to create an Amazon Kendra index, see Creating an index.

  • You have an IAM role for your data source. Amazon Kendra uses this role to access the AWS resources required to create the Amazon Kendra resource. You provide the Amazon Resource Name (ARN) of the IAM role with the policy attached when you connect your data source to Amazon Kendra. If you are using the API, you must create an IAM role before you connect your data source. If you use the AWS console, you can choose to use an existing IAM role or create a new one when you configure your Amazon Kendra connector. For more information on using an IAM role for your S3 data source, see IAM roles for data sources.

  • Your bucket must be in the same AWS Region as your Amazon Kendra index, and your index must have permission to access the bucket that contains your documents.

  • You have copied the name of your Amazon S3 bucket. You need this information to connect Amazon Kendra to Amazon S3.
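
If you create the IAM role yourself for use with the API, the role needs a trust policy that lets Amazon Kendra assume it and a permissions policy for the bucket and index. The following is a minimal sketch of those two documents, not a definitive policy. The listed actions are an assumption; confirm them against IAM roles for data sources. The function names, bucket name, and index ARN are hypothetical placeholders.

```python
import json


def build_trust_policy() -> dict:
    """Trust policy that lets the Amazon Kendra service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "kendra.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_permissions_policy(bucket_name: str, index_arn: str) -> dict:
    """Permissions policy for reading one bucket and writing to one index.

    Assumption: these actions are sufficient for an S3 data source role;
    check the IAM roles documentation before relying on them.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the documents stored in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
            {
                # List the objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Add documents to, and remove documents from, the index.
                "Effect": "Allow",
                "Action": [
                    "kendra:BatchPutDocument",
                    "kendra:BatchDeleteDocument",
                ],
                "Resource": [index_arn],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name and index ARN, for illustration only.
    policy = build_permissions_policy(
        "example-bucket",
        "arn:aws:kendra:us-east-1:111122223333:index/example-index-id",
    )
    print(json.dumps(build_trust_policy(), indent=2))
    print(json.dumps(policy, indent=2))
```

Scoping each statement to a single bucket and a single index keeps the role narrow, which fits the role-based access model described in the warning at the top of this page.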

Connecting Amazon Kendra to your Amazon S3 data source

To connect Amazon Kendra to your Amazon S3 data source, you must provide details of your Amazon S3 credentials so that Amazon Kendra can access your data. If you have not yet configured Amazon S3 for Amazon Kendra, see Prerequisites.

Console

To connect Amazon Kendra to Amazon S3

  1. Sign in to the AWS Management Console and open the Amazon Kendra console.

  2. From the left navigation pane, choose Indexes and then choose the index you want to connect from the list of indexes.

  3. On the Getting started page, choose Add data sources.

    Note

    You can choose to configure or edit your User access control settings under Index settings.

  4. On the Add data source page, choose S3, and then choose Add connector.

  5. On the Specify data source details page, enter the following information:

    1. Data source name—Enter a name for your data source. You can include hyphens but not spaces.

    2. (Optional) Description—Enter an optional description for your data source.

    3. Default language—A language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. A language specified in the document metadata overrides the selected language.

    4. Add new tag—Tags to search and filter your resources or track your AWS costs.

    5. Choose Next.

  6. On the Configure sync settings page, enter the following information:

    1. Enter the data source location—The path to the Amazon S3 bucket where your data is stored. Select Browse S3 to choose your bucket.

    2. (Optional) Metadata files prefix folder location—The path to the folder in which your metadata is stored. Select Browse S3 to locate your metadata folder.

    3. (Optional) Access control list configuration file location—The path to the location of a file containing a JSON structure that specifies access settings for the files stored in your S3 data source. Select Browse S3 to locate your ACL file.

    4. (Optional) Decryption key—Select to use a decryption key. You can choose to use an existing one or create a new one.

    5. (Optional) Additional configurations—Add patterns to include or exclude documents from your index. All paths are relative to the data source location S3 bucket. You can have a combined total of 100 patterns.

    6. Frequency—How often Amazon Kendra will sync with your data source.

    7. IAM role—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.

      Note

      IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose Create a new role to avoid errors.

    8. Choose Next.

  7. On the Review and create page, check that the information you have entered is correct and then select Add data source. Your data source will appear on the Data sources page once it is added successfully.
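
Two constraints from the steps above are easy to check before you open the console: the data source name can't contain spaces, and the inclusion and exclusion patterns are limited to a combined total of 100. The following sketch checks only those two rules. The function name and return shape are hypothetical, and the console may enforce additional rules that aren't covered here.

```python
MAX_PATTERNS = 100  # combined inclusion + exclusion limit stated in the steps above


def validate_console_inputs(name, include_patterns, exclude_patterns):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []

    # The name may include hyphens but not spaces.
    if not name:
        problems.append("data source name is empty")
    elif any(ch.isspace() for ch in name):
        problems.append("data source name must not contain spaces")

    # Inclusion and exclusion patterns share one combined limit.
    total = len(include_patterns) + len(exclude_patterns)
    if total > MAX_PATTERNS:
        problems.append(
            f"{total} patterns given; the combined limit is {MAX_PATTERNS}"
        )

    return problems


if __name__ == "__main__":
    print(validate_console_inputs("my-s3-docs", ["*.pdf"], ["*draft*"]))
    print(validate_console_inputs("my s3 docs", [], []))
```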

S3DataSourceConfiguration API

To connect Amazon Kendra to Amazon S3

You must specify the following using the S3DataSourceConfiguration API:

  • BucketName—The name of the bucket that contains the documents.

  • IAM role—You must provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the S3 connector and Amazon Kendra. For more information, see IAM roles for S3 data sources.
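
As a minimal sketch of how these two required values fit together, the following example builds a request for the boto3 `create_data_source` call, which is not named in the text above and is assumed here as the way to submit the configuration. The index ID, data source name, bucket name, role ARN, and Region are hypothetical placeholders, and the helper function names are illustrative.

```python
def build_create_data_source_request(index_id, name, bucket_name, role_arn):
    """Build keyword arguments for the Kendra CreateDataSource operation."""
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "S3",
        "RoleArn": role_arn,
        "Configuration": {
            # S3DataSourceConfiguration: only BucketName is required.
            "S3Configuration": {"BucketName": bucket_name},
        },
    }


def create_s3_data_source(request, region_name):
    """Send the request with boto3 and return the new data source ID."""
    # Imported lazily so the request builder can be used without boto3.
    import boto3

    client = boto3.client("kendra", region_name=region_name)
    response = client.create_data_source(**request)
    return response["Id"]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    request = build_create_data_source_request(
        index_id="example-index-id",
        name="my-s3-docs",
        bucket_name="example-bucket",
        role_arn="arn:aws:iam::111122223333:role/example-kendra-s3-role",
    )
    print(request)
    # Uncomment to call AWS; requires credentials and an existing index.
    # print(create_s3_data_source(request, region_name="us-east-1"))
```

Keeping the request builder separate from the network call lets you inspect or log the configuration before anything is sent to AWS.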

You can also add the following optional features:

  • Inclusion and exclusion filters—You can specify glob patterns to include or exclude certain files.

    Note

    If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.

  • Context filtering—You can choose to filter a user’s results based on their user or group access to documents. For more information, see User context filtering for S3 data sources.
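
The filter rules in the note above can be sketched as a small predicate: an exclusion match always wins, and when any inclusion pattern is set, a document must match at least one of them. This is an illustration of the precedence only. Python's `fnmatch` is used as a stand-in, and Amazon Kendra's own glob matching may differ in details. The helper names are hypothetical; the `InclusionPatterns`, `ExclusionPatterns`, and `AccessControlListConfiguration` keys are the corresponding fields of S3DataSourceConfiguration.

```python
from fnmatch import fnmatchcase


def is_indexed(key, inclusion_patterns, exclusion_patterns):
    """Apply the inclusion/exclusion precedence to one object key."""
    # Exclusion wins, even if the key also matches an inclusion pattern.
    if any(fnmatchcase(key, p) for p in exclusion_patterns):
        return False
    # With an inclusion filter, only matching keys are indexed.
    if inclusion_patterns:
        return any(fnmatchcase(key, p) for p in inclusion_patterns)
    # With no filters at all, everything is indexed.
    return True


def build_s3_configuration(bucket_name, inclusion_patterns=(),
                           exclusion_patterns=(), acl_key_path=None):
    """Build an S3DataSourceConfiguration dict with the optional features."""
    config = {"BucketName": bucket_name}
    if inclusion_patterns:
        config["InclusionPatterns"] = list(inclusion_patterns)
    if exclusion_patterns:
        config["ExclusionPatterns"] = list(exclusion_patterns)
    if acl_key_path:
        # Path to the JSON access control file used for context filtering.
        config["AccessControlListConfiguration"] = {"KeyPath": acl_key_path}
    return config


if __name__ == "__main__":
    include, exclude = ["*.pdf"], ["*internal*"]
    for key in ["docs/guide.pdf", "docs/notes.txt", "docs/internal/plan.pdf"]:
        print(key, is_indexed(key, include, exclude))
```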