Using an Atlassian Confluence data source

You can use Atlassian Confluence as a data source for Amazon Kendra. To add Confluence in the console, go to the Amazon Kendra console, select your index, and then select Data sources from the navigation menu.

For troubleshooting your Amazon Kendra Confluence data source connector, see Troubleshooting data sources.

Amazon Kendra supports Atlassian Confluence Cloud and Atlassian Confluence Server.

You must create an index before you create the Confluence data source. For more information, see Creating an index. You provide the ID of the index when you create the data source.

Before you can index content from your Confluence instance, you must create an account with administrative permissions. The account must grant Amazon Kendra permission to view all of the content within your Confluence instance. You can grant the account these permissions by making it a member of the confluence-administrators group. If you use Single Sign-On (SSO) with Confluence, you must enable Show on login page for the user name and password when you configure Confluence Authentication methods in Confluence Data Center.

When you connect to Confluence to index your documents, you specify the URL of your Confluence instance. You can specify regular expression patterns to include or exclude specific blog posts, pages, spaces, or attachments in your Confluence instance. Amazon Kendra indexes blogs, pages, and regular spaces by default. If you choose to index attachments, only attachments to the indexed pages and blogs are indexed.

To connect to Confluence, you specify the connection and other information in the console or by using the ConfluenceConfiguration object. You provide the URL of the Confluence instance that you want to index.

When you configure the data source, you must specify which version of Confluence you use: Confluence Cloud or Confluence Server.

You also must provide the Amazon Resource Name (ARN) of an IAM role that gives permission to access your AWS Secrets Manager secret, which stores your Confluence authentication credentials, and the AWS Key Management Service key used to decrypt it. You provide the ARN of an IAM role using the CreateDataSource API. For more information on permissions, see IAM roles for Atlassian Confluence data sources.
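As a rough illustration of how these required values fit together, the following sketch assembles the arguments for a CreateDataSource call. The helper name `build_confluence_request`, the data source name, and all IDs, ARNs, and URLs are placeholders invented for this example. The sketch only builds the request and doesn't call AWS.

```python
import json


def build_confluence_request(index_id, role_arn, server_url, secret_arn, version="SERVER"):
    """Assemble keyword arguments for the Amazon Kendra CreateDataSource API."""
    if version not in ("CLOUD", "SERVER"):
        raise ValueError("version must be 'CLOUD' or 'SERVER'")
    return {
        "Name": "confluence-data-source",  # hypothetical data source name
        "IndexId": index_id,               # ID of the index you created first
        "Type": "CONFLUENCE",
        "RoleArn": role_arn,               # IAM role that can read the secret
        "Configuration": {
            "ConfluenceConfiguration": {
                "ServerUrl": server_url,   # URL of the Confluence instance
                "SecretArn": secret_arn,   # Secrets Manager secret with credentials
                "Version": version,        # Confluence Cloud or Confluence Server
            }
        },
    }


# All values below are placeholders, not real resources.
request = build_confluence_request(
    index_id="11111111-2222-3333-4444-555555555555",
    role_arn="arn:aws:iam::123456789012:role/KendraConfluenceRole",
    server_url="https://confluence.example.com",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:confluence-creds",
)
print(json.dumps(request, indent=2))
```

If you use the AWS SDK for Python (Boto3), you could pass the result to `boto3.client("kendra").create_data_source(**request)`.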

Amazon Kendra requires authentication credentials to access your Confluence instance. For more information, see Authentication.

Amazon Kendra also crawls user information from the Confluence instance. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents. For more information, see User context filtering for Confluence data sources.

You also can add the following optional information:

  • Whether to connect to your Confluence instance through a web proxy. You can use this option for Confluence Server.

  • Inclusion or exclusion patterns: If you specify an inclusion pattern, only content that matches the inclusion pattern is indexed. Any document with a file name or file type that doesn't match the pattern isn't indexed. If you specify an inclusion and exclusion pattern, documents that match the exclusion pattern are not indexed even if they match the inclusion pattern.

  • Page field mappings that map your Confluence fields to Amazon Kendra index fields. For more information, see Mapping data source fields.
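The way inclusion and exclusion patterns combine can be sketched in a few lines. This is only an illustration of the precedence rule described above, not Amazon Kendra's implementation. The function name `is_indexed`, the sample patterns, and the choice of `re.search` for matching are assumptions made for this example.

```python
import re


def is_indexed(name, inclusion_patterns=(), exclusion_patterns=()):
    """Return True if a document name would pass the pattern filters."""
    # An exclusion match wins, even if an inclusion pattern also matches.
    if any(re.search(p, name) for p in exclusion_patterns):
        return False
    # When inclusion patterns are given, the name must match at least one.
    if inclusion_patterns:
        return any(re.search(p, name) for p in inclusion_patterns)
    # No inclusion patterns: everything that isn't excluded passes.
    return True


include = [r"\.pdf$", r"\.docx$"]  # hypothetical inclusion patterns
exclude = [r"^draft"]              # hypothetical exclusion pattern

for name in ["report.pdf", "draft-report.pdf", "notes.txt"]:
    print(name, is_indexed(name, include, exclude))
```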

Indexing spaces

Amazon Kendra includes information from a space in the index. A space may be included in the results of a query based on this information. The Confluence account used for the data source must have permission to access the space in order to index it.

By default, Amazon Kendra doesn't index Confluence archive and personal spaces. You can choose to index them when you create the data source. If you don't want Amazon Kendra to index a space, mark it private in Confluence.
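When you use the API, the choice to index personal and archived spaces is expressed in the SpaceConfiguration part of ConfluenceConfiguration. The following is a minimal sketch. The helper name `build_space_configuration` is made up for this example, and both flags default to off to mirror the default behavior described above.

```python
def build_space_configuration(crawl_personal=False, crawl_archived=False):
    """Build the SpaceConfiguration section of a ConfluenceConfiguration."""
    return {
        "CrawlPersonalSpaces": crawl_personal,  # personal spaces are skipped by default
        "CrawlArchivedSpaces": crawl_archived,  # archived spaces are skipped by default
    }


default_spaces = build_space_configuration()
all_spaces = build_space_configuration(crawl_personal=True, crawl_archived=True)
print(default_spaces)
print(all_spaces)
```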

You can restrict access to the contents of a space by specifying view permissions. If a query includes user information, Amazon Kendra reads these permissions and uses them for user context filtering. For more information, see Filtering on user context.

If you use the Amazon Kendra console to create a Confluence data source, Amazon Kendra creates index fields for you when you specify a field mapping. If you use the API, you must first create the index field using the UpdateIndex API. To map the Confluence fields to Amazon Kendra index fields, see the following table.

Confluence field    Suggested Amazon Kendra field
DISPLAY_URL         _source_uri
ITEM_TYPE           _category
SPACE_KEY           cf_space_key
URL                 cf_url
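For the API path, the space mappings above can be turned into the two structures you need: the field mappings for the data source, and the custom index fields to create first with UpdateIndex. This is a sketch under assumptions: the helper names are invented, and `STRING_VALUE` is an assumed type for the custom fields. Names that start with an underscore are reserved Amazon Kendra fields that already exist in the index, so the sketch skips them.

```python
# Space field mappings, copied from the table above.
SPACE_FIELDS = {
    "DISPLAY_URL": "_source_uri",
    "ITEM_TYPE": "_category",
    "SPACE_KEY": "cf_space_key",
    "URL": "cf_url",
}


def to_field_mappings(table):
    """Convert a {Confluence field: index field} table to API field mappings."""
    return [
        {"DataSourceFieldName": source, "IndexFieldName": target}
        for source, target in table.items()
    ]


def custom_index_fields(table, field_type="STRING_VALUE"):
    """List the index fields that must be created before mapping (assumed type)."""
    return [
        {"Name": target, "Type": field_type}
        for target in table.values()
        if not target.startswith("_")  # reserved fields already exist
    ]


space_mappings = to_field_mappings(SPACE_FIELDS)
new_fields = custom_index_fields(SPACE_FIELDS)
print(new_fields)
```

With Boto3, `new_fields` could be passed as `DocumentMetadataConfigurationUpdates` to `update_index`, and `space_mappings` as `SpaceFieldMappings` inside the SpaceConfiguration of the data source.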

Indexing pages

Amazon Kendra indexes all pages, including nested pages, in a space unless they are filtered out by an inclusion or exclusion pattern.

To index pages, you must use a Confluence account that has access to the pages. Access to pages in Confluence can be through nested group permissions. To access a page, you must belong to the group or sub group that has permission to access the page. If a query includes user information, Amazon Kendra reads these permissions and uses them for user context filtering. For more information, see Filtering on user context.

If you use the console to create a Confluence data source, Amazon Kendra creates the index fields for you when you specify a field mapping. If you use the API, you must first create the index field using the UpdateIndex API. To map the Confluence fields to Amazon Kendra index fields, see the following table.

Confluence field    Suggested Amazon Kendra field
AUTHOR              cf_author
CONTENT_STATUS      cf_page_content_status
CREATED_DATE        _created_at
DISPLAY_URL         _source_uri
ITEM_TYPE           _category
LABELS              cf_labels
MODIFIED_DATE       _last_updated_at
PARENT_ID           cf_parent_id
SPACE_KEY           cf_space_key
SPACE_NAME          cf_space_name
URL                 cf_url
VERSION             cf_version

Blogs

Amazon Kendra indexes all blogs in a space unless they are filtered from indexing by an inclusion or an exclusion pattern.

To index blogs, you must use a Confluence account that has access to the blogs and the spaces that contain the blogs. Access to blogs in Confluence can be through nested group permissions. To access a blog, you must belong to the group or sub group that has permission to access the blog and its space. If a query includes user information, Amazon Kendra reads these permissions and uses them for user context filtering. For more information, see Filtering on user context.

If you use the console to create a Confluence data source, Amazon Kendra creates the index fields for you when you specify a field mapping. If you use the API, you must first create the index field using the UpdateIndex API. To map the Confluence fields to Amazon Kendra index fields, see the following table.

Confluence field    Suggested Amazon Kendra field
AUTHOR              cf_author
DISPLAY_URL         _source_uri
ITEM_TYPE           _category
LABELS              cf_labels
PUBLISH_DATE        _created_at
SPACE_KEY           cf_space_key
SPACE_NAME          cf_space_name
URL                 cf_url
VERSION             cf_version

Attachments

Confluence enables you to create attachments to pages and blog posts. By default, attachments aren't indexed. You can configure Amazon Kendra to include attachments in the index. Amazon Kendra includes only attachments to indexed pages and blogs in the index.
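Because attachments aren't indexed by default, an API-based configuration has to turn them on explicitly. The sketch below adds an AttachmentConfiguration section to an existing ConfluenceConfiguration dictionary. The helper name `with_attachments` and the sample values are placeholders for illustration.

```python
import copy


def with_attachments(confluence_config, crawl=True):
    """Return a copy of the configuration with attachment crawling set."""
    updated = copy.deepcopy(confluence_config)  # leave the caller's dict untouched
    updated["AttachmentConfiguration"] = {"CrawlAttachments": crawl}
    return updated


# Placeholder configuration, not a real instance or secret.
base = {
    "ServerUrl": "https://confluence.example.com",
    "SecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:example",
    "Version": "SERVER",
}
config = with_attachments(base)
print(config["AttachmentConfiguration"])
```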

Amazon Kendra indexes only the following supported document types:

  • Microsoft Word

  • Microsoft PowerPoint

  • HTML

  • PDF

  • Plain text

To index attachments, you must use a Confluence account that has access to the blogs or pages of the attachments and their spaces. Access to blogs and pages in Confluence can be through nested group permissions. You must belong to the group or sub group that has permission to access the blogs or pages of the attachments and their spaces. If a query includes user information, Amazon Kendra reads these permissions and uses them for user context filtering. For more information, see Filtering on user context.

If you use the console, Amazon Kendra creates index fields for you when you specify a field mapping. If you use the API, you must first create the index field using the UpdateIndex API. To map the Confluence fields to Amazon Kendra fields, see the following table.

Confluence field    Suggested Amazon Kendra field
AUTHOR              cf_author
CONTENT_TYPE        cf_attachment_content_type
CREATED_DATE        _created_at
DISPLAY_URL         _source_uri
FILE_SIZE           cf_attachment_file_size
ITEM_TYPE           _category
LABELS              cf_labels
PARENT_ID           cf_parent_id
SPACE_KEY           cf_space_key
SPACE_NAME          cf_space_name
URL                 cf_url
VERSION             cf_version

Authentication

There are two types of authentication that you can use with Atlassian Confluence. The first, basic authentication, permits Amazon Kendra to connect to the Confluence instance using a user name and password.

The second, personal access token, can be used in place of a user name and password. You can use a personal access token for Confluence Server.

You must be a user with administrative permissions to the Confluence instance, whether you use basic authentication or a personal access token.

It is recommended that you regularly refresh or rotate your credentials and secret, and only provide the necessary level of access for your own security.

Basic authentication

When you use basic authentication, you provide the user name and password of an administrative user of your Confluence instance. Amazon Kendra uses these credentials to connect to Confluence.

You store your user name and password in an AWS Secrets Manager secret. If you are using the Amazon Kendra console to create your data source, you can create the secret while creating the data source. Or you can use an existing Secrets Manager secret. If you are using the API to create your data source, you must provide the Amazon Resource Name (ARN) of an existing secret.

The basic credentials are stored as a JSON string in the Secrets Manager secret.

{ "username": "user name", "password": "password" }
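One way to produce this JSON string without quoting mistakes is to serialize it from a dictionary. The following is a small sketch. The helper name `basic_auth_secret_string` and the sample credentials are placeholders.

```python
import json


def basic_auth_secret_string(username, password):
    """Serialize basic credentials in the shape the secret expects."""
    return json.dumps({"username": username, "password": password})


# Placeholder credentials for illustration only.
secret_string = basic_auth_secret_string("kendra-admin", "example-password")
print(secret_string)
```

With Boto3, the result could be supplied as the `SecretString` argument of the Secrets Manager `create_secret` call.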

Personal access token authentication

When you use personal access token authentication to connect to Confluence Server, you provide the token that replaces a user name and password.

You store your personal access token in an AWS Secrets Manager secret. You create the token in Confluence. If you are using the Amazon Kendra console to create your data source, you can create the secret while creating the data source. Or you can use an existing Secrets Manager secret. If you are using the API to create your data source, you must provide the Amazon Resource Name (ARN) of an existing secret.

The personal access token credentials are stored as a JSON string in the Secrets Manager secret.

{ "patToken": "personal access token" }
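For the API path, the token secret is paired with a ConfluenceConfiguration that selects personal access token authentication. The sketch below builds both pieces. The helper names, URL, ARN, and token value are placeholders, and the version is fixed to `SERVER` because the text above describes tokens for Confluence Server.

```python
import json


def pat_secret_string(token):
    """Serialize a personal access token in the shape the secret expects."""
    return json.dumps({"patToken": token})


def pat_confluence_configuration(server_url, secret_arn):
    """Build a ConfluenceConfiguration that authenticates with a token."""
    return {
        "ServerUrl": server_url,
        "SecretArn": secret_arn,
        "Version": "SERVER",          # tokens are described for Confluence Server
        "AuthenticationType": "PAT",  # the alternative is "HTTP_BASIC"
    }


# Placeholder values for illustration only.
pat_secret = pat_secret_string("example-token-value")
pat_config = pat_confluence_configuration(
    "https://confluence.example.com",
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:confluence-pat",
)
print(pat_secret)
print(pat_config["AuthenticationType"])
```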

Creating the personal access token

To create a personal access token in Confluence

  1. Log in to your Confluence Server instance. You must be a user with administrative permissions.

  2. Select Confluence.

  3. Select your profile picture dropdown at the top of the page, then select Personal Access Tokens.

  4. Select Create token.

  5. Enter a name for your token. For example, kendra_confluence_token.

  6. Set the expiration date of your token. It is recommended that you re-create your personal access token on a regular basis for your own security. If the token expires, you need to create a new token and update your secret before you can sync your data source again.

  7. Select Create.

  8. Copy the token. You'll need this when you create the Secrets Manager secret for the Confluence data source.
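After you re-create a token, the stored secret has to be updated with the new value. The following sketch shows one possible way to do that with the Secrets Manager PutSecretValue operation. The function name `update_pat_secret` is invented for this example, and `RecordingClient` is a stand-in used here so the sketch runs without AWS credentials. In practice you would pass `boto3.client("secretsmanager")` instead.

```python
import json


def update_pat_secret(secrets_client, secret_id, new_token):
    """Store a new personal access token in an existing secret."""
    return secrets_client.put_secret_value(
        SecretId=secret_id,
        SecretString=json.dumps({"patToken": new_token}),
    )


class RecordingClient:
    """Stand-in for a Secrets Manager client that records calls (not part of Boto3)."""

    def __init__(self):
        self.calls = []

    def put_secret_value(self, **kwargs):
        self.calls.append(kwargs)
        return {"Name": kwargs["SecretId"]}


client = RecordingClient()
# "confluence-pat" and the token value are placeholders.
update_pat_secret(client, "confluence-pat", "new-example-token")
print(client.calls[0]["SecretString"])
```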