CreateDataSource - Amazon Kendra API Reference

CreateDataSource

Creates a data source connector that you want to use with an Amazon Kendra index.

You specify a name, a data source connector type, and a description for your data source. You also specify configuration information for the data source connector.

CreateDataSource is a synchronous operation. The operation returns 200 if the data source was successfully created. Otherwise, an exception is raised.

For an example of creating an index and data source using the Python SDK, see Getting started with Python SDK. For an example of creating an index and data source using the Java SDK, see Getting started with Java SDK.

Request Syntax

{ "ClientToken": "string", "Configuration": { "AlfrescoConfiguration": { "BlogFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "CrawlComments": boolean, "CrawlSystemFolders": boolean, "DocumentLibraryFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "EntityFilter": [ "string" ], "ExclusionPatterns": [ "string" ], "InclusionPatterns": [ "string" ], "SecretArn": "string", "SiteId": "string", "SiteUrl": "string", "SslCertificateS3Path": { "Bucket": "string", "Key": "string" }, "VpcConfiguration": { "SecurityGroupIds": [ "string" ], "SubnetIds": [ "string" ] }, "WikiFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ] }, "BoxConfiguration": { "CommentFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "CrawlComments": boolean, "CrawlTasks": boolean, "CrawlWebLinks": boolean, "EnterpriseId": "string", "ExclusionPatterns": [ "string" ], "FileFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "InclusionPatterns": [ "string" ], "SecretArn": "string", "TaskFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "UseChangeLog": boolean, "VpcConfiguration": { "SecurityGroupIds": [ "string" ], "SubnetIds": [ "string" ] }, "WebLinkFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ] }, "ConfluenceConfiguration": { "AttachmentConfiguration": { "AttachmentFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "CrawlAttachments": boolean }, "AuthenticationType": "string", "BlogConfiguration": { "BlogFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": 
"string" } ] }, "ExclusionPatterns": [ "string" ], "InclusionPatterns": [ "string" ], "PageConfiguration": { "PageFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ] }, "ProxyConfiguration": { "Credentials": "string", "Host": "string", "Port": number }, "SecretArn": "string", "ServerUrl": "string", "SpaceConfiguration": { "CrawlArchivedSpaces": boolean, "CrawlPersonalSpaces": boolean, "ExcludeSpaces": [ "string" ], "IncludeSpaces": [ "string" ], "SpaceFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ] }, "Version": "string", "VpcConfiguration": { "SecurityGroupIds": [ "string" ], "SubnetIds": [ "string" ] } }, "DatabaseConfiguration": { "AclConfiguration": { "AllowedGroupsColumnName": "string" }, "ColumnConfiguration": { "ChangeDetectingColumns": [ "string" ], "DocumentDataColumnName": "string", "DocumentIdColumnName": "string", "DocumentTitleColumnName": "string", "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ] }, "ConnectionConfiguration": { "DatabaseHost": "string", "DatabaseName": "string", "DatabasePort": number, "SecretArn": "string", "TableName": "string" }, "DatabaseEngineType": "string", "SqlConfiguration": { "QueryIdentifiersEnclosingOption": "string" }, "VpcConfiguration": { "SecurityGroupIds": [ "string" ], "SubnetIds": [ "string" ] } }, "FsxConfiguration": { "ExclusionPatterns": [ "string" ], "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "FileSystemId": "string", "FileSystemType": "string", "InclusionPatterns": [ "string" ], "SecretArn": "string", "VpcConfiguration": { "SecurityGroupIds": [ "string" ], "SubnetIds": [ "string" ] } }, "GitHubConfiguration": { "ExclusionFileNamePatterns": [ "string" ], "ExclusionFileTypePatterns": [ "string" ], "ExclusionFolderNamePatterns": [ "string" ], 
"GitHubCommitConfigurationFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "GitHubDocumentCrawlProperties": { "CrawlIssue": boolean, "CrawlIssueComment": boolean, "CrawlIssueCommentAttachment": boolean, "CrawlPullRequest": boolean, "CrawlPullRequestComment": boolean, "CrawlPullRequestCommentAttachment": boolean, "CrawlRepositoryDocuments": boolean }, "GitHubIssueAttachmentConfigurationFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "GitHubIssueCommentConfigurationFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "GitHubIssueDocumentConfigurationFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "GitHubPullRequestCommentConfigurationFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "GitHubPullRequestDocumentAttachmentConfigurationFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "GitHubPullRequestDocumentConfigurationFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "GitHubRepositoryConfigurationFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "InclusionFileNamePatterns": [ "string" ], "InclusionFileTypePatterns": [ "string" ], "InclusionFolderNamePatterns": [ "string" ], "OnPremiseConfiguration": { "HostUrl": "string", "OrganizationName": "string", "SslCertificateS3Path": { "Bucket": "string", "Key": "string" } }, "RepositoryFilter": [ "string" ], "SaaSConfiguration": { "HostUrl": "string", "OrganizationName": "string" }, "SecretArn": "string", "Type": "string", "UseChangeLog": boolean, "VpcConfiguration": { "SecurityGroupIds": [ "string" ], "SubnetIds": [ 
"string" ] } }, "GoogleDriveConfiguration": { "ExcludeMimeTypes": [ "string" ], "ExcludeSharedDrives": [ "string" ], "ExcludeUserAccounts": [ "string" ], "ExclusionPatterns": [ "string" ], "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "InclusionPatterns": [ "string" ], "SecretArn": "string" }, "JiraConfiguration": { "AttachmentFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "CommentFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "ExclusionPatterns": [ "string" ], "InclusionPatterns": [ "string" ], "IssueFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "IssueSubEntityFilter": [ "string" ], "IssueType": [ "string" ], "JiraAccountUrl": "string", "Project": [ "string" ], "ProjectFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "SecretArn": "string", "Status": [ "string" ], "UseChangeLog": boolean, "VpcConfiguration": { "SecurityGroupIds": [ "string" ], "SubnetIds": [ "string" ] }, "WorkLogFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ] }, "OneDriveConfiguration": { "DisableLocalGroups": boolean, "ExclusionPatterns": [ "string" ], "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "InclusionPatterns": [ "string" ], "OneDriveUsers": { "OneDriveUserList": [ "string" ], "OneDriveUserS3Path": { "Bucket": "string", "Key": "string" } }, "SecretArn": "string", "TenantDomain": "string" }, "QuipConfiguration": { "AttachmentFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "CrawlAttachments": boolean, "CrawlChatRooms": boolean, "CrawlFileComments": boolean, 
"Domain": "string", "ExclusionPatterns": [ "string" ], "FolderIds": [ "string" ], "InclusionPatterns": [ "string" ], "MessageFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "SecretArn": "string", "ThreadFieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "VpcConfiguration": { "SecurityGroupIds": [ "string" ], "SubnetIds": [ "string" ] } }, "S3Configuration": { "AccessControlListConfiguration": { "KeyPath": "string" }, "BucketName": "string", "DocumentsMetadataConfiguration": { "S3Prefix": "string" }, "ExclusionPatterns": [ "string" ], "InclusionPatterns": [ "string" ], "InclusionPrefixes": [ "string" ] }, "SalesforceConfiguration": { "ChatterFeedConfiguration": { "DocumentDataFieldName": "string", "DocumentTitleFieldName": "string", "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "IncludeFilterTypes": [ "string" ] }, "CrawlAttachments": boolean, "ExcludeAttachmentFilePatterns": [ "string" ], "IncludeAttachmentFilePatterns": [ "string" ], "KnowledgeArticleConfiguration": { "CustomKnowledgeArticleTypeConfigurations": [ { "DocumentDataFieldName": "string", "DocumentTitleFieldName": "string", "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "Name": "string" } ], "IncludedStates": [ "string" ], "StandardKnowledgeArticleTypeConfiguration": { "DocumentDataFieldName": "string", "DocumentTitleFieldName": "string", "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ] } }, "SecretArn": "string", "ServerUrl": "string", "StandardObjectAttachmentConfiguration": { "DocumentTitleFieldName": "string", "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ] }, "StandardObjectConfigurations": [ { 
"DocumentDataFieldName": "string", "DocumentTitleFieldName": "string", "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "Name": "string" } ] }, "ServiceNowConfiguration": { "AuthenticationType": "string", "HostUrl": "string", "KnowledgeArticleConfiguration": { "CrawlAttachments": boolean, "DocumentDataFieldName": "string", "DocumentTitleFieldName": "string", "ExcludeAttachmentFilePatterns": [ "string" ], "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "FilterQuery": "string", "IncludeAttachmentFilePatterns": [ "string" ] }, "SecretArn": "string", "ServiceCatalogConfiguration": { "CrawlAttachments": boolean, "DocumentDataFieldName": "string", "DocumentTitleFieldName": "string", "ExcludeAttachmentFilePatterns": [ "string" ], "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "IncludeAttachmentFilePatterns": [ "string" ] }, "ServiceNowBuildVersion": "string" }, "SharePointConfiguration": { "AuthenticationType": "string", "CrawlAttachments": boolean, "DisableLocalGroups": boolean, "DocumentTitleFieldName": "string", "ExclusionPatterns": [ "string" ], "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "InclusionPatterns": [ "string" ], "ProxyConfiguration": { "Credentials": "string", "Host": "string", "Port": number }, "SecretArn": "string", "SharePointVersion": "string", "SslCertificateS3Path": { "Bucket": "string", "Key": "string" }, "Urls": [ "string" ], "UseChangeLog": boolean, "VpcConfiguration": { "SecurityGroupIds": [ "string" ], "SubnetIds": [ "string" ] } }, "SlackConfiguration": { "CrawlBotMessage": boolean, "ExcludeArchived": boolean, "ExclusionPatterns": [ "string" ], "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "InclusionPatterns": 
[ "string" ], "LookBackPeriod": number, "PrivateChannelFilter": [ "string" ], "PublicChannelFilter": [ "string" ], "SecretArn": "string", "SinceCrawlDate": "string", "SlackEntityList": [ "string" ], "TeamId": "string", "UseChangeLog": boolean, "VpcConfiguration": { "SecurityGroupIds": [ "string" ], "SubnetIds": [ "string" ] } }, "TemplateConfiguration": { "Template": JSON value }, "WebCrawlerConfiguration": { "AuthenticationConfiguration": { "BasicAuthentication": [ { "Credentials": "string", "Host": "string", "Port": number } ] }, "CrawlDepth": number, "MaxContentSizePerPageInMegaBytes": number, "MaxLinksPerPage": number, "MaxUrlsPerMinuteCrawlRate": number, "ProxyConfiguration": { "Credentials": "string", "Host": "string", "Port": number }, "UrlExclusionPatterns": [ "string" ], "UrlInclusionPatterns": [ "string" ], "Urls": { "SeedUrlConfiguration": { "SeedUrls": [ "string" ], "WebCrawlerMode": "string" }, "SiteMapsConfiguration": { "SiteMaps": [ "string" ] } } }, "WorkDocsConfiguration": { "CrawlComments": boolean, "ExclusionPatterns": [ "string" ], "FieldMappings": [ { "DataSourceFieldName": "string", "DateFieldFormat": "string", "IndexFieldName": "string" } ], "InclusionPatterns": [ "string" ], "OrganizationId": "string", "UseChangeLog": boolean } }, "CustomDocumentEnrichmentConfiguration": { "InlineConfigurations": [ { "Condition": { "ConditionDocumentAttributeKey": "string", "ConditionOnValue": { "DateValue": number, "LongValue": number, "StringListValue": [ "string" ], "StringValue": "string" }, "Operator": "string" }, "DocumentContentDeletion": boolean, "Target": { "TargetDocumentAttributeKey": "string", "TargetDocumentAttributeValue": { "DateValue": number, "LongValue": number, "StringListValue": [ "string" ], "StringValue": "string" }, "TargetDocumentAttributeValueDeletion": boolean } } ], "PostExtractionHookConfiguration": { "InvocationCondition": { "ConditionDocumentAttributeKey": "string", "ConditionOnValue": { "DateValue": number, "LongValue": number, 
"StringListValue": [ "string" ], "StringValue": "string" }, "Operator": "string" }, "LambdaArn": "string", "S3Bucket": "string" }, "PreExtractionHookConfiguration": { "InvocationCondition": { "ConditionDocumentAttributeKey": "string", "ConditionOnValue": { "DateValue": number, "LongValue": number, "StringListValue": [ "string" ], "StringValue": "string" }, "Operator": "string" }, "LambdaArn": "string", "S3Bucket": "string" }, "RoleArn": "string" }, "Description": "string", "IndexId": "string", "LanguageCode": "string", "Name": "string", "RoleArn": "string", "Schedule": "string", "Tags": [ { "Key": "string", "Value": "string" } ], "Type": "string", "VpcConfiguration": { "SecurityGroupIds": [ "string" ], "SubnetIds": [ "string" ] } }

Request Parameters

For information about the parameters that are common to all actions, see Common Parameters.

The request accepts the following data in JSON format.

ClientToken

A token that you provide to identify the request to create a data source connector. Multiple calls to the CreateDataSource API with the same client token will create only one data source connector.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 100.

Required: No
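The idempotency behavior described above can be used by generating one token per logical request and reusing it on every retry. The following is a minimal sketch, not an official pattern: the helper name `new_client_token` and the use of a UUID are assumptions, chosen only because a UUID string fits within the documented 1 to 100 character limit.

```python
import uuid


def new_client_token() -> str:
    """Return a random token that fits the documented 1-100 character limit.

    Hypothetical helper: any unique string within the length limit would do.
    """
    token = str(uuid.uuid4())
    if not 1 <= len(token) <= 100:
        raise ValueError("token length outside the documented limit")
    return token


# Generate the token once, then reuse it if the same request has to be retried.
token = new_client_token()
first_attempt = {"ClientToken": token, "Name": "example-data-source"}
retry_attempt = {"ClientToken": token, "Name": "example-data-source"}
```

Because both attempts carry the same token, repeated calls should result in only one data source connector, as the parameter description states.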

Configuration

Configuration information to connect to your data source repository.

You can't specify the Configuration parameter when the Type parameter is set to CUSTOM. If you do, you receive a ValidationException exception.

The Configuration parameter is required for all other data sources.

Type: DataSourceConfiguration object

Required: No

CustomDocumentEnrichmentConfiguration

Configuration information for altering document metadata and content during the document ingestion process.

For more information on how to create, modify and delete document metadata, or make other content alterations when you ingest documents into Amazon Kendra, see Customizing document metadata during the ingestion process.

Type: CustomDocumentEnrichmentConfiguration object

Required: No

Description

A description for the data source connector.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 1000.

Pattern: ^\P{C}*$

Required: No

IndexId

The identifier of the index you want to use with the data source connector.

Type: String

Length Constraints: Fixed length of 36.

Pattern: [a-zA-Z0-9][a-zA-Z0-9-]*

Required: Yes

LanguageCode

The code for a language. This allows you to support a language for all documents when creating the data source connector. English is supported by default. For more information on supported languages, including their codes, see Adding documents in languages other than English.

Type: String

Length Constraints: Minimum length of 2. Maximum length of 10.

Pattern: [a-zA-Z-]*

Required: No

Name

A name for the data source connector.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 1000.

Pattern: [a-zA-Z0-9][a-zA-Z0-9_-]*

Required: Yes
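The length and pattern constraints listed for IndexId, LanguageCode, and Name can be checked on the client before a request is sent. This is an illustrative sketch only: the function `check_field` is hypothetical, and it assumes each pattern must match the whole value, which the reference does not state explicitly.

```python
import re
from typing import List

# (minimum length, maximum length, pattern), copied from the descriptions above.
CONSTRAINTS = {
    "IndexId": (36, 36, r"[a-zA-Z0-9][a-zA-Z0-9-]*"),
    "LanguageCode": (2, 10, r"[a-zA-Z-]*"),
    "Name": (1, 1000, r"[a-zA-Z0-9][a-zA-Z0-9_-]*"),
}


def check_field(field: str, value: str) -> List[str]:
    """Return a list of constraint violations for one string parameter."""
    min_len, max_len, pattern = CONSTRAINTS[field]
    problems = []
    if not min_len <= len(value) <= max_len:
        problems.append(f"{field}: length {len(value)} is outside {min_len}-{max_len}")
    # Assumption: the pattern applies to the entire value, so fullmatch is used.
    if re.fullmatch(pattern, value) is None:
        problems.append(f"{field}: value does not match {pattern}")
    return problems


print(check_field("Name", "my-data-source"))
print(check_field("Name", "-starts-with-hyphen"))
```

A check like this only catches malformed input early. The service remains the authority and returns a ValidationException for input that fails its constraints.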

RoleArn

The Amazon Resource Name (ARN) of an IAM role with permission to access the data source and required resources. For more information, see IAM access roles for Amazon Kendra.

You can't specify the RoleArn parameter when the Type parameter is set to CUSTOM. If you do, you receive a ValidationException exception.

The RoleArn parameter is required for all other data sources.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 1284.

Pattern: arn:[a-z0-9-\.]{1,63}:[a-z0-9-\.]{0,63}:[a-z0-9-\.]{0,63}:[a-z0-9-\.]{0,63}:[^/].{0,1023}

Required: No

Schedule

Sets the frequency for Amazon Kendra to check the documents in your data source repository and update the index. If you don't set a schedule, Amazon Kendra will not periodically update the index. You can call the StartDataSourceSyncJob API to update the index.

Specify a cron-format schedule string or an empty string to indicate that the index is updated on demand.

You can't specify the Schedule parameter when the Type parameter is set to CUSTOM. If you do, you receive a ValidationException exception.

Type: String

Required: No
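The rules that tie Configuration, RoleArn, and Schedule to the Type parameter can be expressed as a small pre-flight check. The sketch below is an assumption-laden illustration: `check_type_rules` is a hypothetical helper, and it treats the mere presence of a key as "specified", which may be stricter than the service's actual behavior.

```python
from typing import Dict, List

# Parameters that the descriptions above say cannot be combined with Type CUSTOM.
FORBIDDEN_FOR_CUSTOM = ("Configuration", "RoleArn", "Schedule")
# Parameters that the descriptions above say are required for all other types.
REQUIRED_FOR_OTHERS = ("Configuration", "RoleArn")


def check_type_rules(request: Dict) -> List[str]:
    """Return the rule violations that would likely cause a ValidationException."""
    problems = []
    if request.get("Type") == "CUSTOM":
        for name in FORBIDDEN_FOR_CUSTOM:
            if name in request:
                problems.append(f"{name} cannot be specified when Type is CUSTOM")
    else:
        for name in REQUIRED_FOR_OTHERS:
            if name not in request:
                problems.append(f"{name} is required when Type is {request.get('Type')}")
    return problems


print(check_type_rules({"Type": "CUSTOM", "Schedule": "cron(0 12 * * ? *)"}))
print(check_type_rules({"Type": "S3"}))
```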

Tags

A list of key-value pairs that identify or categorize the data source connector. You can also use tags to help control access to the data source connector. Tag keys and values can consist of Unicode letters, digits, white space, and any of the following symbols: _ . : / = + - @.

Type: Array of Tag objects

Array Members: Minimum number of 0 items. Maximum number of 200 items.

Required: No
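As a rough illustration of the tag rules above, the following sketch converts a plain dictionary into the list of Key/Value objects that the Tags parameter expects. The helper names are hypothetical, and the character check only approximates "Unicode letters, digits, white space" using Python's built-in string methods.

```python
from typing import Dict, List

# Symbols explicitly allowed by the description above.
ALLOWED_SYMBOLS = set("_.:/=+-@")
MAX_TAGS = 200


def is_valid_tag_text(text: str) -> bool:
    """Approximate check: letters, digits, white space, or an allowed symbol."""
    return all(
        ch.isalpha() or ch.isdigit() or ch.isspace() or ch in ALLOWED_SYMBOLS
        for ch in text
    )


def to_tag_list(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Build the Tags array, rejecting input that breaks the documented limits."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"at most {MAX_TAGS} tags are allowed")
    for key, value in tags.items():
        if not (is_valid_tag_text(key) and is_valid_tag_text(value)):
            raise ValueError(f"unsupported character in tag {key!r}")
    return [{"Key": key, "Value": value} for key, value in tags.items()]


print(to_tag_list({"team": "search", "cost-center": "1234"}))
```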

Type

The type of data source repository. For example, SHAREPOINT.

Type: String

Valid Values: S3 | SHAREPOINT | DATABASE | SALESFORCE | ONEDRIVE | SERVICENOW | CUSTOM | CONFLUENCE | GOOGLEDRIVE | WEBCRAWLER | WORKDOCS | FSX | SLACK | BOX | QUIP | JIRA | GITHUB | ALFRESCO | TEMPLATE

Required: Yes

VpcConfiguration

Configuration information for an Amazon Virtual Private Cloud to connect to your data source. For more information, see Configuring a VPC.

Type: DataSourceVpcConfiguration object

Required: No
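Putting the request parameters together, a minimal request body for an S3 data source might look like the sketch below. All values are hypothetical placeholders, and only the required parameters plus a small S3Configuration are shown. With the AWS SDK for Python, the same keys would typically be passed as keyword arguments to the Kendra client's `create_data_source` method; that call is not made here so the example stays self-contained.

```python
import json

# Hypothetical placeholder values; replace them with identifiers from your own account.
request = {
    "IndexId": "12345678-1234-1234-1234-123456789012",  # fixed length of 36
    "Name": "example-s3-data-source",
    "Type": "S3",
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleKendraRole",
    "Configuration": {
        "S3Configuration": {
            "BucketName": "example-bucket",
            "InclusionPrefixes": ["docs/"],
        }
    },
}

# Serialize the request the way it would appear in the JSON request body.
print(json.dumps(request, indent=2))
```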

Response Syntax

{ "Id": "string" }

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

Id

The identifier of the data source connector.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 100.

Pattern: [a-zA-Z0-9][a-zA-Z0-9_-]*

Errors

For information about the errors that are common to all actions, see Common Errors.

AccessDeniedException

You don't have sufficient access to perform this action. Please ensure you have the required permission policies and user accounts and try again.

HTTP Status Code: 400

ConflictException

A conflict occurred with the request. Please fix any inconsistencies with your resources and try again.

HTTP Status Code: 400

InternalServerException

An issue occurred with the internal server used for your Amazon Kendra service. Please wait a few minutes and try again, or contact Support for help.

HTTP Status Code: 500

ResourceAlreadyExistException

The resource you want to use already exists. Please check you have provided the correct resource and try again.

HTTP Status Code: 400

ResourceNotFoundException

The resource you want to use doesn’t exist. Please check you have provided the correct resource and try again.

HTTP Status Code: 400

ServiceQuotaExceededException

You have exceeded the set limits for your Amazon Kendra service. Please see Quotas for more information, or contact Support to inquire about an increase of limits.

HTTP Status Code: 400

ThrottlingException

The request was denied due to request throttling. Please reduce the number of requests and try again.

HTTP Status Code: 400

ValidationException

The input fails to satisfy the constraints set by the Amazon Kendra service. Please provide the correct input and try again.

HTTP Status Code: 400
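Several of the errors above suggest trying again after a pause. The sketch below shows one way a caller might retry with exponential backoff. It is not SDK code: `ApiError` is a hypothetical stand-in for whatever exception type your client raises, and treating only ThrottlingException and InternalServerException as retryable is an assumption based on their descriptions.

```python
import time

# Assumption: only these two errors are worth retrying automatically.
RETRYABLE = {"ThrottlingException", "InternalServerException"}


class ApiError(Exception):
    """Hypothetical stand-in for an SDK exception that carries an error code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def call_with_retry(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying retryable errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ApiError as err:
            # Give up on non-retryable errors or when attempts are exhausted.
            if err.code not in RETRYABLE or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Simulated operation: throttled twice, then succeeds with a response body.
calls = {"count": 0}


def flaky_create():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ApiError("ThrottlingException")
    return {"Id": "example-data-source-id"}


# The sleep function is replaced with a no-op so the demonstration runs instantly.
result = call_with_retry(flaky_create, sleep=lambda seconds: None)
print(result["Id"])
```

Errors such as ValidationException or AccessDeniedException are re-raised immediately in this sketch, since repeating the same request would not change the outcome.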

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following: