Using a SharePoint data source - Amazon Kendra

Using a SharePoint data source

SharePoint is a collaborative website building service that you can use to customize web content and create pages, sites, document libraries, and lists. If you are a SharePoint user, you can use Amazon Kendra to index your SharePoint data source.

Amazon Kendra currently supports SharePoint Online and SharePoint Server (versions 2013, 2016, and 2019).

You can connect Amazon Kendra to your SharePoint data source using either the Amazon Kendra console or the SharePointConfiguration API.

For troubleshooting your Amazon Kendra SharePoint data source connector, see Troubleshooting data sources.

Supported features

  • Change log

  • Field mappings

  • User context filtering

  • Inclusion/exclusion filters

  • Virtual private cloud (VPC)

Prerequisites

Before you can use Amazon Kendra to index your SharePoint data source, you must meet the following requirements:

  • You have created an Amazon Kendra index. You must create an index before you create the data source, because you need the index ID to connect your data source. For more information on how to create an Amazon Kendra index, see Creating an index.

  • You have an IAM role for your data source. Amazon Kendra uses this role to access the AWS resources required to create the Amazon Kendra resource. You provide the Amazon Resource Name (ARN) of the IAM role with the policy attached when you connect your data source to Amazon Kendra. If you are using the API, you must create an IAM role before you connect your data source. If you use the AWS console, you can choose to use an existing IAM role or create a new one when you configure your Amazon Kendra connector. For more information on using an IAM role for your SharePoint data source, see IAM roles for data sources.

  • You are a SharePoint user with administrative permissions.

  • You have copied the URL of the SharePoint sites you want to index. You need this information to connect SharePoint to Amazon Kendra.

  • In your SharePoint account, you have:

    • Generated a client ID and client secret in your SharePoint account.

    • Created SharePoint basic authentication credentials that connect to Amazon Kendra using a user name and password.

      You can use basic authentication to connect SharePoint Online to Amazon Kendra. If you are using SharePoint Server, you must provide your server domain name as well as your user name and password. The server domain name is the NetBIOS name in your Active Directory provider.

      Note

      If you use SharePoint Server and need to convert your Access Control List (ACL) to email format for filtering on user context, provide either the LDAP server URL and LDAP search base, or the directory domain override. The LDAP server URL is the full domain name and the port number (for example, ldap://example.com:389). The LDAP search base consists of the domain components 'example' and 'com' (for example, dc=example,dc=com). With the directory domain override, you can use the email domain instead of the LDAP server URL and LDAP search base. For example, the email domain for username@example.com is 'example.com'. You can use this override if you aren't concerned about validating your domain and simply want to use your email domain.

    • Created OAuth 2.0 authentication credentials that identify Amazon Kendra, consisting of a user name, password, client ID, and client secret.

    • Added the following permissions to your SharePoint account:

      For SharePoint lists:

      • Open Items—View the source of documents with server-side file handlers.

      • View Application Pages—View forms, views, and application pages. Enumerate lists.

      • View Items—View items in lists and documents in document libraries.

      • View Versions—View past versions of a list item or document.

      For SharePoint websites:

      • Browse Directories—Enumerate files and folders in a website using SharePoint Designer and WebDAV interfaces.

      • Browse User Information—View information about users of the website.

      • Enumerate Permissions—Enumerate permissions on the website, list, folder, document, or list item.

      • Open—Open a website, list, or folder to access items inside the container.

      • Use Client Integration Features—Use features that launch client applications.

      • Use Remote Interfaces—Use SOAP, WebDAV, the client object model, or SharePoint Designer interfaces to access the website.

      • View Pages—View pages on a website.

      You can find more information on how to configure your SharePoint account on the SharePoint Developer Documentation page.

  • You have an AWS Secrets Manager secret containing the authentication credentials you are using to connect your SharePoint data source with your Amazon Kendra index. If you are using the console to create your data source, you can create the secret there, or you can use an existing Secrets Manager secret. If you are using the API, you must provide the Amazon Resource Name (ARN) of an existing secret. It is recommended that you regularly refresh or rotate your credentials and secret, and only provide the necessary level of access for your own security.

  • (Optional) If you want to map attributes or custom index fields from your SharePoint data source to your Amazon Kendra index, you must make sure that these attributes and custom fields already exist in your data source file system custom metadata.
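The LDAP values described in the note above follow a regular pattern in the example given (example.com, port 389). As a minimal sketch, assuming your directory's search base really is derived from its DNS domain name (which is common but not guaranteed), the values could be computed as shown below. The function names are hypothetical helpers written for this illustration and are not part of any AWS or SharePoint API.

```python
def email_domain(address: str) -> str:
    """Return the part after the last '@', for use as a directory domain override."""
    _, sep, domain = address.rpartition("@")
    if not sep or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return domain.lower()


def ldap_search_base(domain: str) -> str:
    """Turn a DNS domain such as 'example.com' into 'dc=example,dc=com'."""
    labels = [label for label in domain.split(".") if label]
    if not labels:
        raise ValueError("domain must not be empty")
    return ",".join(f"dc={label}" for label in labels)


def ldap_server_url(domain: str, port: int = 389) -> str:
    """Combine the full domain name and port number into an LDAP server URL."""
    return f"ldap://{domain}:{port}"


domain = email_domain("username@example.com")
print(domain)                    # directory domain override
print(ldap_search_base(domain))  # LDAP search base
print(ldap_server_url(domain))   # LDAP server URL
```

If your Active Directory uses a search base that does not mirror the DNS name, take the value from your directory administrator instead of deriving it.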

Connecting Amazon Kendra to your SharePoint data source

To connect Amazon Kendra to your SharePoint data source, you must provide details of your SharePoint credentials so that Amazon Kendra can access your data. If you have not yet configured SharePoint for Amazon Kendra, see Prerequisites.

Console

To connect Amazon Kendra to SharePoint

  1. Sign in to the AWS Management Console and open the Amazon Kendra console.

  2. From the left navigation pane, choose Indexes and then choose the index you want to connect from the list of indexes.

  3. On the Getting started page, choose Add data sources.

    Note

    You can choose to configure or edit your User access control settings under Index settings.

  4. On the Add data source page, choose SharePoint, and then choose Add connector.

  5. On the Specify data source details page, enter the following information:

    1. Data source name—Enter a name for your data source. You can include hyphens but not spaces.

    2. (Optional) Description—Enter an optional description for your data source.

    3. Default language—A language to filter your documents for the index. Unless you specify otherwise, the language defaults to English. A language specified in document metadata overrides the selected language.

    4. Add new tag—Tags to search and filter your resources or track your AWS costs.

    5. Choose Next.

  6. On the Define access and security page, enter the following information:

    1. For Hosting method—Choose between SharePoint Online and SharePoint Server.

    2. If you choose SharePoint Server, you must also choose the SharePoint version, the Site URLs specific to your SharePoint repository, and the SSL certificate location.

    3. For Web proxy—Enter the Host name and Port number of your internal SharePoint instance.

    4. For Authentication—Choose between None, LDAP, and Manual based on your use case.

    5. AWS Secrets Manager secret—Choose an existing secret or create a new Secrets Manager secret to store your SharePoint authentication credentials. If you choose to create a new secret, an AWS Secrets Manager secret window opens.

      1. Enter the following information in the Create an AWS Secrets Manager secret window:

        1. Secret name—A name for your secret. The prefix ‘AmazonKendra-SharePoint-’ is automatically added to your secret name.

        2. For User name, Password, and Email Domain Override—Enter the authentication credential values you generated and downloaded from your SharePoint account.

        3. Choose Save.

    6. Virtual Private Cloud (VPC)—Choose a VPC to use. You must also add Subnets and VPC security groups.

      Note

      You must use a VPC if you use SharePoint Server. Amazon VPC is optional for other SharePoint versions.

    7. IAM role—Choose an existing IAM role or create a new IAM role to access your repository credentials and index content.

      Note

      IAM roles used for indexes cannot be used for data sources. If you are unsure if an existing role is used for an index or FAQ, choose Create a new role to avoid errors.

    8. Choose Next.

  7. On the Configure sync settings page, enter the following information:

    1. Use Change log—Select to update your index instead of syncing all your files.

    2. Crawl attachments—Select to crawl attachments.

    3. Use local group mappings—Select to make sure that documents are properly filtered.

    4. Additional configuration—Add regular expression patterns to include or exclude certain files. You can add up to 100 patterns.

    5. Frequency—How often Amazon Kendra will sync with your data source.

    6. Choose Next.

  8. On the Set field mappings page, enter the following information:

    1. Amazon Kendra default field mappings—Select from the Amazon Kendra generated default data source fields you want to map to your index.

    2. For Custom field mappings—Add custom data source fields to create an index field name to map to and the field data type.

    3. Choose Next.

  9. On the Review and create page, check that the information you have entered is correct and then select Add data source. Your data source will appear on the Data sources page once it is added successfully.

SharePointConfiguration API

To connect Amazon Kendra to SharePoint

You must specify the following using SharePointConfiguration API:

  • SharePoint Version—You must specify the SharePoint version you use when configuring SharePoint. This applies whether you use SharePoint Server 2013, SharePoint Server 2016, SharePoint Server 2019, or SharePoint Online.

  • Secret Amazon Resource Name (ARN)—You must provide the Amazon Resource Name (ARN) of a Secrets Manager secret that contains the authentication credentials you created in your SharePoint account. You provide the ARN using the CreateDataSource API. The secret is stored in a JSON structure.

    If you use SharePoint Online, the following is the minimum JSON structure that must be in your secret:

    { "username": "user name", "password": "password" }

    If you use SharePoint Server, the following is the minimum JSON structure that must be in your secret:

    { "username": "user name", "password": "password", "domain": "server domain name" }

    If you use SharePoint Server and need to convert your Access Control List (ACL) to email format for filtering on user context, you can include the LDAP server URL and LDAP search base, or the directory domain override, in your secret for SharePoint Server using the following JSON structure:

    { "username": "user name", "password": "password", "domain": "server domain name", "ldapServerUrl": "ldap://example.com:389", "ldapSearchBase": "dc=example,dc=com", "directoryDomainOverride": "example.com" }

    If you use OAuth authentication to connect to SharePoint, you provide the client ID and client secret that identify Amazon Kendra to SharePoint Online. You also provide the user name and password that are used to access your SharePoint instance. You use the following JSON structure:

    { "username": "user name", "password": "password", "clientId": "client id", "clientSecret": "client secret" }

    Note

    It is recommended that you regularly refresh or rotate your credentials and secret, and only provide the necessary level of access for your own security. For more information on permissions, see IAM roles for SharePoint data sources.

  • IAM role—You must provide an IAM role with permissions to access your Secrets Manager secret and to call the required public APIs for the SharePoint connector and Amazon Kendra. For more information, see IAM roles for SharePoint data sources.

  • Amazon VPC—If you use SharePoint Server, you must use a VPC and specify VpcConfiguration when you call CreateDataSource. See Configuring Amazon Kendra to use a VPC.
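The required settings above can be assembled programmatically. The following is a minimal sketch, not a definitive implementation: it builds the secret string with the key names from the JSON structures above, and a request dictionary for CreateDataSource. The helper names, the data source name, the site URL, and all ARNs and IDs are placeholders invented for this illustration. Placing VpcConfiguration inside SharePointConfiguration is an assumption based on the API's SharePointConfiguration structure; check the API reference for your SDK version.

```python
import json

SERVER_VERSIONS = {"SHAREPOINT_2013", "SHAREPOINT_2016", "SHAREPOINT_2019"}


def sharepoint_secret(username, password, **extra):
    """Serialize credentials with the key names used in the JSON structures above.

    Optional keys (domain, ldapServerUrl, ldapSearchBase,
    directoryDomainOverride, clientId, clientSecret) are passed as keyword
    arguments; keys whose value is None are left out.
    """
    payload = {"username": username, "password": password}
    payload.update({key: value for key, value in extra.items() if value is not None})
    return json.dumps(payload)


def data_source_request(name, index_id, role_arn, secret_arn, version, urls,
                        subnet_ids=None, security_group_ids=None):
    """Build keyword arguments for the Kendra CreateDataSource operation."""
    sharepoint = {
        "SharePointVersion": version,
        "Urls": list(urls),
        "SecretArn": secret_arn,
    }
    if subnet_ids and security_group_ids:
        sharepoint["VpcConfiguration"] = {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        }
    elif version in SERVER_VERSIONS:
        # The text above states that SharePoint Server requires a VPC.
        raise ValueError("SharePoint Server requires subnet and security group IDs")
    return {
        "Name": name,
        "IndexId": index_id,
        "Type": "SHAREPOINT",
        "RoleArn": role_arn,
        "Configuration": {"SharePointConfiguration": sharepoint},
    }


# Placeholder values only; replace them with your own resources.
secret_string = sharepoint_secret("user name", "password", domain="server domain name")
request = data_source_request(
    name="sharepoint-server-docs",
    index_id="index-id-placeholder",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-sharepoint-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    version="SHAREPOINT_2019",
    urls=["https://sharepoint.example.com/sites/team"],
    subnet_ids=["subnet-placeholder"],
    security_group_ids=["sg-placeholder"],
)
print(json.dumps(request, indent=2))
```

With the AWS SDK for Python (Boto3), the secret string would typically be stored with the Secrets Manager client's create_secret(Name=..., SecretString=secret_string), and the request sent with the Kendra client's create_data_source(**request). Those calls are left out of the sketch so that it runs without AWS credentials.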

You can also add the following optional features:

  • Web proxy—Whether to connect to your SharePoint site URLs via a web proxy. You can use this option for SharePoint Server.

  • Indexing lists—Whether Amazon Kendra should index the contents of attachments to SharePoint list items.

  • Change log—Whether Amazon Kendra should use the SharePoint data source change log mechanism to determine if a document must be added, updated, or deleted in the index.

    Note

    Use the change log if you don’t want Amazon Kendra to scan all of the documents. If your change log is large, it might take Amazon Kendra less time to scan the documents in the SharePoint data source than to process the change log. If you are syncing your SharePoint data source with your index for the first time, all documents are scanned.

  • Inclusion and exclusion filters—You can specify whether to include or exclude content. You can also specify regular expression patterns to include or exclude.

    Note

    If you specify an inclusion filter, only content that matches the inclusion filter is indexed. Any document that doesn’t match the inclusion filter isn’t indexed. If you specify an inclusion and exclusion filter, documents that match the exclusion filter are not indexed, even if they match the inclusion filter.

  • Context filtering—You can choose to filter a user’s results based on their user or group access to documents. For more information, see User context filtering for SharePoint data sources.

  • Field mappings—You can choose to map your SharePoint data source fields to your Amazon Kendra index fields. For more information, see Mapping data source fields.
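The optional features above correspond to fields of the SharePointConfiguration structure. The sketch below shows one hypothetical set of values, plus a small function that mirrors the filter rule from the note above (exclusion takes precedence over inclusion). The patterns, host name, port, and field names are illustrative assumptions. How Amazon Kendra applies a pattern to a document (for example, which part of the path it tests) is not specified here, so is_indexed is only a local approximation that uses Python's re.search on a path string.

```python
import re

# Illustrative optional settings to merge into SharePointConfiguration.
optional_settings = {
    "UseChangeLog": True,       # use the SharePoint change log after the first sync
    "CrawlAttachments": True,   # index attachments to SharePoint list items
    "InclusionPatterns": [r".*\.docx$", r".*\.pdf$"],
    "ExclusionPatterns": [r".*/drafts/.*"],
    "FieldMappings": [
        # "Department" and "department" are hypothetical; the index field
        # must already exist in your Amazon Kendra index.
        {"DataSourceFieldName": "Department", "IndexFieldName": "department"},
    ],
    # Web proxy for SharePoint Server; host and port are placeholders.
    "ProxyConfiguration": {"Host": "proxy.example.com", "Port": 8080},
}


def is_indexed(path, inclusion_patterns=(), exclusion_patterns=()):
    """Approximate the filter rule: exclusion wins, then inclusion must match."""
    if any(re.search(pattern, path) for pattern in exclusion_patterns):
        return False
    if inclusion_patterns:
        return any(re.search(pattern, path) for pattern in inclusion_patterns)
    return True  # no inclusion filter: everything not excluded is indexed


for candidate in ("/sites/hr/policy.pdf", "/sites/hr/drafts/policy.pdf", "/sites/hr/notes.txt"):
    verdict = is_indexed(
        candidate,
        optional_settings["InclusionPatterns"],
        optional_settings["ExclusionPatterns"],
    )
    print(candidate, "indexed" if verdict else "skipped")
```

Checking patterns locally like this can catch an inclusion filter that is narrower than intended before you run a sync.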

Learn more

To learn more about integrating Amazon Kendra with your SharePoint data source, see: