Data source template schemas - Amazon Kendra

Data source template schemas

The following are template schemas for data sources where templates are supported.

Adobe Experience Manager template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the Adobe Experience Manager host URL, the authentication type, and whether you use Adobe Experience Manager (AEM) as a Cloud Service or AEM On-Premise as part of the connection configuration or repository endpoint details. Also, specify the type of data source as AEM, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. For more information, see Adobe Experience Manager JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
aemUrl The Adobe Experience Manager host URL. For example, if you use AEM On-Premise, you include the hostname and port: https://hostname:port. Or, if you use AEM as a Cloud Service, you can use the author URL: https://author-xxxxxx-xxxxxxx.adobeaemcloud.com.
authType The type of authentication you use, whether Basic or OAuth2.
deploymentType The type of Adobe Experience Manager that you use, either CLOUD or ON_PREMISE.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • page

  • asset

A list of objects that map the attributes or field names of your Adobe Experience Manager pages and assets to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source.
timeZoneId

If you use AEM On-Premise and the time zone of your server is different than the time zone of the Amazon Kendra AEM connector or index, you can specify the server time zone to align with the AEM connector or index.

The default time zone for AEM On-Premise is the time zone of the Amazon Kendra AEM connector or index. The default time zone for AEM as a Cloud Service is Greenwich Mean Time.

  • pageRootPaths

  • assetRootPaths

A list of root paths for pages and assets. For example, the root path for a page could be /content/sub and the root path for an asset could be /content/sub/asset1.
crawlAssets true to crawl assets.
crawlPages true to crawl pages.
  • pagePathInclusionPatterns

  • pageNameInclusionPatterns

  • assetPathInclusionPatterns

  • assetTypeInclusionPatterns

  • assetNameInclusionPatterns

A list of regular expression patterns to include certain pages and assets in your Adobe Experience Manager data source. Pages and assets that match the patterns are included in the index. Pages and assets that don't match the patterns are excluded from the index. If a page or asset matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
  • pagePathExclusionPatterns

  • pageNameExclusionPatterns

  • assetPathExclusionPatterns

  • assetTypeInclusionPatterns

  • assetNameInclusionPatterns

A list of regular expression patterns to exclude certain pages and assets in your Adobe Experience Manager data source. Pages and assets that match the patterns are excluded from the index. Pages and assets that don't match the patterns are included in the index. If a page or asset matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
pageComponents A list of names for the specific page components that you want to index.
contentFragmentVariations A list of names for the specific saved variations of Adobe Experience Manager Content Fragments that you want to index.
type The type of data source. Specify AEM as your data source type.
enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If you choose to turn the identity crawler off, you must upload the identity/principal information using the PutPrincipalMapping API.
syncMode

Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose between the following options:

  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index.

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index.

  • CHANGE_LOG to incrementally crawl only new and modified content each time your data source syncs with your index.

secretArn

The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Adobe Experience Manager. The secret must contain a JSON structure with the following keys:

If using basic authentication for either AEM On-Premise or Cloud:

{ "aemUrl": "Adobe Experience Manager On-Premise host URL", "username": "user name with admin permissions", "password": "password with admin permissions" }

If using OAuth 2.0 authentication for AEM On-Premise:

{ "aemUrl": "Adobe Experience Manager host URL", "clientId": "client ID", "clientSecret": "client secret", "privateKey": "private key" }

If using OAuth 2.0 authentication for AEM as a Cloud Service:

{ "clientId": "client ID", "clientSecret": "client secret", "privateKey": "private key", "orgId": "organization ID", "technicalAccountId": "technical account ID", "imsHost": "Adobe Identity Management System (IMS) host" }
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "aemUrl": { "type": "string", "pattern": "https:.*" }, "authType": { "type": "string", "enum": ["Basic", "OAuth2"] }, "deploymentType": { "type": "string", "enum": ["CLOUD","ON_PREMISE"] } }, "required": [ "aemUrl", "authType", "deploymentType" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "page": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "asset": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "timeZoneId": { "type": "string", "enum": [ "Africa/Abidjan", "Africa/Accra", "Africa/Addis_Ababa", "Africa/Algiers", "Africa/Asmara", "Africa/Asmera", "Africa/Bamako", "Africa/Bangui", "Africa/Banjul", "Africa/Bissau", "Africa/Blantyre", "Africa/Brazzaville", "Africa/Bujumbura", "Africa/Cairo", "Africa/Casablanca", "Africa/Ceuta", "Africa/Conakry", "Africa/Dakar", "Africa/Dar_es_Salaam", "Africa/Djibouti", "Africa/Douala", "Africa/El_Aaiun", "Africa/Freetown", "Africa/Gaborone", "Africa/Harare", "Africa/Johannesburg", "Africa/Juba", "Africa/Kampala", "Africa/Khartoum", "Africa/Kigali", "Africa/Kinshasa", "Africa/Lagos", "Africa/Libreville", "Africa/Lome", "Africa/Luanda", "Africa/Lubumbashi", "Africa/Lusaka", "Africa/Malabo", "Africa/Maputo", "Africa/Maseru", "Africa/Mbabane", "Africa/Mogadishu", "Africa/Monrovia", "Africa/Nairobi", "Africa/Ndjamena", "Africa/Niamey", "Africa/Nouakchott", "Africa/Ouagadougou", "Africa/Porto-Novo", "Africa/Sao_Tome", "Africa/Timbuktu", "Africa/Tripoli", "Africa/Tunis", "Africa/Windhoek", "America/Adak", "America/Anchorage", "America/Anguilla", "America/Antigua", "America/Araguaina", "America/Argentina/Buenos_Aires", "America/Argentina/Catamarca", "America/Argentina/ComodRivadavia", "America/Argentina/Cordoba", "America/Argentina/Jujuy", "America/Argentina/La_Rioja", "America/Argentina/Mendoza", "America/Argentina/Rio_Gallegos", "America/Argentina/Salta", "America/Argentina/San_Juan", "America/Argentina/San_Luis", "America/Argentina/Tucuman", "America/Argentina/Ushuaia", "America/Aruba", "America/Asuncion", "America/Atikokan", "America/Atka", "America/Bahia", "America/Bahia_Banderas", "America/Barbados", "America/Belem", "America/Belize", "America/Blanc-Sablon", "America/Boa_Vista", "America/Bogota", "America/Boise", "America/Buenos_Aires", "America/Cambridge_Bay", "America/Campo_Grande", "America/Cancun", "America/Caracas", "America/Catamarca", "America/Cayenne", "America/Cayman", "America/Chicago", "America/Chihuahua", "America/Ciudad_Juarez", "America/Coral_Harbour", "America/Cordoba", "America/Costa_Rica", "America/Creston", "America/Cuiaba", "America/Curacao", "America/Danmarkshavn", "America/Dawson", "America/Dawson_Creek", "America/Denver", "America/Detroit", "America/Dominica", "America/Edmonton", "America/Eirunepe", "America/El_Salvador", "America/Ensenada", "America/Fort_Nelson", "America/Fort_Wayne", "America/Fortaleza", "America/Glace_Bay", "America/Godthab", "America/Goose_Bay", "America/Grand_Turk", "America/Grenada", "America/Guadeloupe", "America/Guatemala", "America/Guayaquil", "America/Guyana", "America/Halifax", "America/Havana", "America/Hermosillo", "America/Indiana/Indianapolis", "America/Indiana/Knox", "America/Indiana/Marengo", "America/Indiana/Petersburg", "America/Indiana/Tell_City", "America/Indiana/Vevay", "America/Indiana/Vincennes", "America/Indiana/Winamac", "America/Indianapolis", "America/Inuvik", "America/Iqaluit", "America/Jamaica", "America/Jujuy", "America/Juneau", "America/Kentucky/Louisville", "America/Kentucky/Monticello", "America/Knox_IN", "America/Kralendijk", "America/La_Paz", "America/Lima", "America/Los_Angeles", "America/Louisville", "America/Lower_Princes", "America/Maceio", "America/Managua", "America/Manaus", "America/Marigot", "America/Martinique", "America/Matamoros", "America/Mazatlan", "America/Mendoza", "America/Menominee", "America/Merida", "America/Metlakatla", "America/Mexico_City", "America/Miquelon", "America/Moncton", "America/Monterrey", "America/Montevideo", "America/Montreal", "America/Montserrat", "America/Nassau", "America/New_York", "America/Nipigon", "America/Nome", "America/Noronha", "America/North_Dakota/Beulah", "America/North_Dakota/Center", "America/North_Dakota/New_Salem", "America/Nuuk", "America/Ojinaga", "America/Panama", "America/Pangnirtung", "America/Paramaribo", "America/Phoenix", "America/Port-au-Prince", "America/Port_of_Spain", "America/Porto_Acre", "America/Porto_Velho", "America/Puerto_Rico", "America/Punta_Arenas", "America/Rainy_River", "America/Rankin_Inlet", "America/Recife", "America/Regina", "America/Resolute", "America/Rio_Branco", "America/Rosario", "America/Santa_Isabel", "America/Santarem", "America/Santiago", "America/Santo_Domingo", "America/Sao_Paulo", "America/Scoresbysund", "America/Shiprock", "America/Sitka", "America/St_Barthelemy", "America/St_Johns", "America/St_Kitts", "America/St_Lucia", "America/St_Thomas", "America/St_Vincent", "America/Swift_Current", "America/Tegucigalpa", "America/Thule", "America/Thunder_Bay", "America/Tijuana", "America/Toronto", "America/Tortola", "America/Vancouver", "America/Virgin", "America/Whitehorse", "America/Winnipeg", "America/Yakutat", "America/Yellowknife", "Antarctica/Casey", "Antarctica/Davis", "Antarctica/DumontDUrville", "Antarctica/Macquarie", "Antarctica/Mawson", "Antarctica/McMurdo", "Antarctica/Palmer", "Antarctica/Rothera", "Antarctica/South_Pole", "Antarctica/Syowa", "Antarctica/Troll", "Antarctica/Vostok", "Arctic/Longyearbyen", "Asia/Aden", "Asia/Almaty", "Asia/Amman", "Asia/Anadyr", "Asia/Aqtau", "Asia/Aqtobe", "Asia/Ashgabat", "Asia/Ashkhabad", "Asia/Atyrau", "Asia/Baghdad", "Asia/Bahrain", "Asia/Baku", "Asia/Bangkok", "Asia/Barnaul", "Asia/Beirut", "Asia/Bishkek", "Asia/Brunei", "Asia/Calcutta", "Asia/Chita", "Asia/Choibalsan", "Asia/Chongqing", "Asia/Chungking", "Asia/Colombo", "Asia/Dacca", "Asia/Damascus", "Asia/Dhaka", "Asia/Dili", "Asia/Dubai", "Asia/Dushanbe", "Asia/Famagusta", "Asia/Gaza", "Asia/Harbin", "Asia/Hebron", "Asia/Ho_Chi_Minh", "Asia/Hong_Kong", "Asia/Hovd", "Asia/Irkutsk", "Asia/Istanbul", "Asia/Jakarta", "Asia/Jayapura", "Asia/Jerusalem", "Asia/Kabul", "Asia/Kamchatka", "Asia/Karachi", "Asia/Kashgar", "Asia/Kathmandu", "Asia/Katmandu", "Asia/Khandyga", "Asia/Kolkata", "Asia/Krasnoyarsk", "Asia/Kuala_Lumpur", "Asia/Kuching", "Asia/Kuwait", "Asia/Macao", "Asia/Macau", "Asia/Magadan", "Asia/Makassar", "Asia/Manila", "Asia/Muscat", "Asia/Nicosia", "Asia/Novokuznetsk", "Asia/Novosibirsk", "Asia/Omsk", "Asia/Oral", "Asia/Phnom_Penh", "Asia/Pontianak", "Asia/Pyongyang", "Asia/Qatar", "Asia/Qostanay", "Asia/Qyzylorda", "Asia/Rangoon", "Asia/Riyadh", "Asia/Saigon", "Asia/Sakhalin", "Asia/Samarkand", "Asia/Seoul", "Asia/Shanghai", "Asia/Singapore", "Asia/Srednekolymsk", "Asia/Taipei", "Asia/Tashkent", "Asia/Tbilisi", "Asia/Tehran", "Asia/Tel_Aviv", "Asia/Thimbu", "Asia/Thimphu", "Asia/Tokyo", "Asia/Tomsk", "Asia/Ujung_Pandang", "Asia/Ulaanbaatar", "Asia/Ulan_Bator", "Asia/Urumqi", "Asia/Ust-Nera", "Asia/Vientiane", "Asia/Vladivostok", "Asia/Yakutsk", "Asia/Yangon", "Asia/Yekaterinburg", "Asia/Yerevan", "Atlantic/Azores", "Atlantic/Bermuda", "Atlantic/Canary", "Atlantic/Cape_Verde", "Atlantic/Faeroe", "Atlantic/Faroe", "Atlantic/Jan_Mayen", "Atlantic/Madeira", "Atlantic/Reykjavik", "Atlantic/South_Georgia", "Atlantic/St_Helena", "Atlantic/Stanley", "Australia/ACT", "Australia/Adelaide", "Australia/Brisbane", "Australia/Broken_Hill", "Australia/Canberra", "Australia/Currie", "Australia/Darwin", "Australia/Eucla", "Australia/Hobart", "Australia/LHI", "Australia/Lindeman", "Australia/Lord_Howe", "Australia/Melbourne", "Australia/NSW", "Australia/North", "Australia/Perth", "Australia/Queensland", "Australia/South", "Australia/Sydney", "Australia/Tasmania", "Australia/Victoria", "Australia/West", "Australia/Yancowinna", "Brazil/Acre", "Brazil/DeNoronha", "Brazil/East", "Brazil/West", "CET", "CST6CDT", "Canada/Atlantic", "Canada/Central", "Canada/Eastern", "Canada/Mountain", "Canada/Newfoundland", "Canada/Pacific", "Canada/Saskatchewan", "Canada/Yukon", "Chile/Continental", "Chile/EasterIsland", "Cuba", "EET", "EST5EDT", "Egypt", "Eire", "Etc/GMT", "Etc/GMT+0", "Etc/GMT+1", "Etc/GMT+10", "Etc/GMT+11", "Etc/GMT+12", "Etc/GMT+2", "Etc/GMT+3", "Etc/GMT+4", "Etc/GMT+5", "Etc/GMT+6", "Etc/GMT+7", "Etc/GMT+8", "Etc/GMT+9", "Etc/GMT-0", "Etc/GMT-1", "Etc/GMT-10", "Etc/GMT-11", "Etc/GMT-12", "Etc/GMT-13", "Etc/GMT-14", "Etc/GMT-2", "Etc/GMT-3", "Etc/GMT-4", "Etc/GMT-5", "Etc/GMT-6", "Etc/GMT-7", "Etc/GMT-8", "Etc/GMT-9", "Etc/GMT0", "Etc/Greenwich", "Etc/UCT", "Etc/UTC", "Etc/Universal", "Etc/Zulu", "Europe/Amsterdam", "Europe/Andorra", "Europe/Astrakhan", "Europe/Athens", "Europe/Belfast", "Europe/Belgrade", "Europe/Berlin", "Europe/Bratislava", "Europe/Brussels", "Europe/Bucharest", "Europe/Budapest", "Europe/Busingen", "Europe/Chisinau", "Europe/Copenhagen", "Europe/Dublin", "Europe/Gibraltar", "Europe/Guernsey", "Europe/Helsinki", "Europe/Isle_of_Man", "Europe/Istanbul", "Europe/Jersey", "Europe/Kaliningrad", "Europe/Kiev", "Europe/Kirov", "Europe/Kyiv", "Europe/Lisbon", "Europe/Ljubljana", "Europe/London", "Europe/Luxembourg", "Europe/Madrid", "Europe/Malta", "Europe/Mariehamn", "Europe/Minsk", "Europe/Monaco", "Europe/Moscow", "Europe/Nicosia", "Europe/Oslo", "Europe/Paris", "Europe/Podgorica", "Europe/Prague", "Europe/Riga", "Europe/Rome", "Europe/Samara", "Europe/San_Marino", "Europe/Sarajevo", "Europe/Saratov", "Europe/Simferopol", "Europe/Skopje", "Europe/Sofia", "Europe/Stockholm", "Europe/Tallinn", "Europe/Tirane", "Europe/Tiraspol", "Europe/Ulyanovsk", "Europe/Uzhgorod", "Europe/Vaduz", "Europe/Vatican", "Europe/Vienna", "Europe/Vilnius", "Europe/Volgograd", "Europe/Warsaw", "Europe/Zagreb", "Europe/Zaporozhye", "Europe/Zurich", "GB", "GB-Eire", "GMT", "GMT0", "Greenwich", "Hongkong", "Iceland", "Indian/Antananarivo", "Indian/Chagos", "Indian/Christmas", "Indian/Cocos", "Indian/Comoro", "Indian/Kerguelen", "Indian/Mahe", "Indian/Maldives", "Indian/Mauritius", "Indian/Mayotte", "Indian/Reunion", "Iran", "Israel", "Jamaica", "Japan", "Kwajalein", "Libya", "MET", "MST7MDT", "Mexico/BajaNorte", "Mexico/BajaSur", "Mexico/General", "NZ", "NZ-CHAT", "Navajo", "PRC", "PST8PDT", "Pacific/Apia", "Pacific/Auckland", "Pacific/Bougainville", "Pacific/Chatham", "Pacific/Chuuk", "Pacific/Easter", "Pacific/Efate", "Pacific/Enderbury", "Pacific/Fakaofo", "Pacific/Fiji", "Pacific/Funafuti", "Pacific/Galapagos", "Pacific/Gambier", "Pacific/Guadalcanal", "Pacific/Guam", "Pacific/Honolulu", "Pacific/Johnston", "Pacific/Kanton", "Pacific/Kiritimati", "Pacific/Kosrae", "Pacific/Kwajalein", "Pacific/Majuro", "Pacific/Marquesas", "Pacific/Midway", "Pacific/Nauru", "Pacific/Niue", "Pacific/Norfolk", "Pacific/Noumea", "Pacific/Pago_Pago", "Pacific/Palau", "Pacific/Pitcairn", "Pacific/Pohnpei", "Pacific/Ponape", "Pacific/Port_Moresby", "Pacific/Rarotonga", "Pacific/Saipan", "Pacific/Samoa", "Pacific/Tahiti", "Pacific/Tarawa", "Pacific/Tongatapu", "Pacific/Truk", "Pacific/Wake", "Pacific/Wallis", "Pacific/Yap", "Poland", "Portugal", "ROK", "Singapore", "SystemV/AST4", "SystemV/AST4ADT", "SystemV/CST6", "SystemV/CST6CDT", "SystemV/EST5", "SystemV/EST5EDT", "SystemV/HST10", "SystemV/MST7", "SystemV/MST7MDT", "SystemV/PST8", "SystemV/PST8PDT", "SystemV/YST9", "SystemV/YST9YDT", "Turkey", "UCT", "US/Alaska", "US/Aleutian", "US/Arizona", "US/Central", "US/East-Indiana", "US/Eastern", "US/Hawaii", "US/Indiana-Starke", "US/Michigan", "US/Mountain", "US/Pacific", "US/Samoa", "UTC", "Universal", "W-SU", "WET", "Zulu", "EST", "HST", "MST", "ACT", "AET", "AGT", "ART", "AST", "BET", "BST", "CAT", "CNT", "CST", "CTT", "EAT", "ECT", "IET", "IST", "JST", "MIT", "NET", "NST", "PLT", "PNT", "PRT", "PST", "SST", "VST" ] }, "pageRootPaths": { "type": "array", "items": { "type": "string" } }, "assetRootPaths": { "type": "array", "items": { "type": "string" } }, "crawlAssets": { "type": "boolean" }, "crawlPages": { "type": "boolean" }, "pagePathInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "pagePathExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "pageNameInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "pageNameExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "assetPathInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "assetPathExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "assetTypeInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "assetTypeExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "assetNameInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "assetNameExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "pageComponents": { "type": "array", "items": { "type": "object" } }, "contentFragmentVariations": { "type": "array", "items": { "type": "object" } }, "cugExemptedPrincipals": { "type": "array", "items": { "type": "string" } } }, "required": [] }, "type": { "type": "string", "pattern": "AEM" }, "enableIdentityCrawler": { "type": "boolean" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Alfresco template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the Alfresco site ID, repository URL, user interface URL, authentication type, whether you use cloud or on-premises, and the type of content you want to crawl. You provide this as a part of the connection configuration or repository endpoint details. Also specify the type of data source as ALFRESCO, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Alfresco JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
siteId The identifier of the Alfresco site.
repoUrl The URL of your Alfresco repository. You can get the repository URL from your Alfresco administrator. For example, if you use Alfresco Cloud (PaaS), the repository URL could be https://company.alfrescocloud.com. Or, if you use Alfresco On-Premises, the repository URL could be https://company-alfresco-instance.company-domain.suffix:port.
webAppUrl The URL of your Alfresco user interface. You can get the Alfresco user interface URL from your Alfresco administrator. For example, the user interface URL could be https://example.com.
repositoryAdditionalProperties Additional properties to connect with the repository/data source endpoint.
authType The type of authentication that you use, whether OAuth2 or Basic.
type (deployment) The type of Alfresco that you use, whether PAAS or ON-PREM.
crawlType The type of content that you want to crawl, whether ASPECT (content marked with 'Aspects' in Alfresco), SITE_ID (content within a specific Alfresco site), or ALL_SITES (content across all your Alfresco sites).
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • document

  • comment

A list of objects that map the attributes or field names of your Alfresco documents and comments to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source.
aspectName

The name of a specific 'Aspect' that you want to index.

aspectProperties

A list of specific 'Aspect' content properties that you want to index.

enableFineGrainedControl

true to crawl 'Aspects'.

isCrawlComment

true to index comments.

  • inclusionFileNamePatterns

  • inclusionFileTypePatterns

  • inclusionFilePathPatterns

A list of regular expression patterns to include certain files in your Alfresco data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
  • exclusionFileNamePatterns

  • exclusionFileTypePatterns

  • exclusionFilePathPatterns

A list of regular expression patterns to exclude certain files in your Alfresco data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
type The type of data source. Specify ALFRESCO as your data source type.
secretArn

The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs that are required to connect to your Alfresco. The secret must contain a JSON structure with the following keys:

If using basic authentication:

{ "username": "user name", "password": "password" }

If using OAuth 2.0 authentication:

{ "clientId": "client ID", "clientSecret": "client secret", "tokenUrl": "token URL" }
syncMode

Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose between:

  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index.

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index.

enableIdentityCrawler true to use the Amazon Kendra identity crawler to sync identity/principal information on users and groups with access to certain documents. If you choose to turn identity crawler off, you must upload the identity/principal information using the PutPrincipalMapping API.
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "siteId": { "type": "string" }, "repoUrl": { "type": "string" }, "webAppUrl": { "type": "string" }, "repositoryAdditionalProperties": { "type": "object", "properties": { "authType": { "type": "string", "enum": [ "OAuth2", "Basic" ] }, "type": { "type": "string", "enum": [ "PAAS", "ON_PREM" ] }, "crawlType": { "type": "string", "enum": [ "ASPECT", "SITE_ID", "ALL_SITES" ] } } } } } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "comment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "aspectName": { "type": "string" }, "aspectProperties": { "type": "array" }, "enableFineGrainedControl": { "type": "boolean" }, "isCrawlComment": { "type": "boolean" }, "inclusionFileNamePatterns": { "type": "array" }, "exclusionFileNamePatterns": { "type": "array" }, "inclusionFileTypePatterns": { "type": "array" }, "exclusionFileTypePatterns": { "type": "array" }, "inclusionFilePathPatterns": { "type": "array" }, "exclusionFilePathPatterns": { "type": "array" } } }, "type": { "type": "string", "pattern": "ALFRESCO" }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL" ] }, "enableIdentityCrawler": { "type": "boolean" }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] } }, "required": [ "connectionConfiguration", "repositoryConfigurations", "additionalProperties", "type", "secretArn" ] }

Amazon S3 template schema

You include a JSON that contains the data source schema as part of the template configuration. You provide the name of the S3 bucket as a part of the connection configuration or repository endpoint details. Also specify the type of data source as S3, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See S3 JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
BucketName The name of your Amazon S3 bucket.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
additionalProperties Additional configuration options for your content in your data source
  • inclusionPatterns

  • exclusionPatterns

  • inclusionPrefixes

  • exclusionPrefixes

A list of regular expression patterns to include or exclude specific files in your Amazon S3 data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
aclConfigurationFilePath The file path that controls access to documents in an Amazon Kendra index.
metadataFilesPrefix The location within your bucket for metadata files.
syncMode Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose
  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index

type The type of data source. Specify S3 as your data source type.
version The version of the template that is supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "BucketName": { "type": "string" } }, "required": [ "BucketName" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING" ] }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ "document" ] }, "additionalProperties": { "type": "object", "properties": { "inclusionPatterns": { "type": "array" }, "exclusionPatterns": { "type": "array" }, "inclusionPrefixes": { "type": "array" }, "exclusionPrefixes": { "type": "array" }, "aclConfigurationFilePath": { "type": "string" }, "metadataFilesPrefix": { "type": "string" } } }, "syncMode": { "type": "string", "enum": [ "FULL_CRAWL", "FORCED_FULL_CRAWL" ] }, "type": { "type": "string", "pattern": "S3" }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] } }, "required": [ "connectionConfiguration", "type", "syncMode", "repositoryConfigurations" ] }

Amazon Kendra Web Crawler template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the seed or starting point URLs, or you can provide the sitemap URLs, as part of the connection configuration or repository endpoint details. Instead of manually listing all your URLs, you can provide the path to the Amazon S3 bucket that stores a text file for your list of seed URLs or sitemap XML files, which you can club together in a ZIP file in S3.

Also specify the type of data source as WEBCRAWLERV2, the website authentication credentials and authentication type if your websites require authentication, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own web pages, or web pages that you have authorization to index. To learn how to stop Amazon Kendra Web Crawler from indexing your websites, see Configuring the robots.txt file for Amazon Kendra Web Crawler.

You can use the template provided in this developer guide. See Amazon Kendra Web Crawler JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
siteMapUrls The list of sitemap URLs for the websites that you want to crawl. You can list up to three sitemap URLs.
s3SeedUrl The S3 path to the text file that stores the list of seed or starting point URLs. For example, s3://bucket-name/directory/. Each URL in the text file must be formatted on a separate line. You can list up to 100 seed URLs in a file.
s3SiteMapUrl The S3 path to the sitemap XML files. For example, s3://bucket-name/directory/. You can list up to three sitemap XML files. You can club together multiple sitemap files into a ZIP file and store the ZIP file in your Amazon S3 bucket.
seedUrlConnections The list of seed or starting point URLs for the websites that you want to crawl.You can list up to 100 seed URLs.
seedUrl The seed or starting point URL.
authentication The authentication type if your websites require the same authentication, otherwise specify NoAuthentication.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • webPage

  • attachment

A list of objects that map the attributes or field names of your web pages and web page files to Amazon Kendra index field names. For example, the HTML web page title tag can be mapped to the _document_title index field. For more information, see Mapping data source fields.
syncMode

Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose between:

  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index.

  • FULL_CRAWLto incrementally crawl only new, modified, and deleted content each time your data source syncs with your index.

additionalProperties Additional configuration options for your content in your data source.
rateLimit The maximum number of URLs crawled per website host per minute.
maxFileSize The maximum size (in MB) of a web page or attachment to crawl.
crawlDepth The number of levels from the seed URL to crawl. For example, the seed URL page is depth 1 and any hyperlinks on this page that are also crawled are depth 2.
maxLinksPerUrl The maximum number of URLs on a web page to include when crawling a website. This number is per web page. As a website's web pages are crawled, any URLs that the webpages link to also are crawled. URLs on a web page are crawled in order of appearance.
crawlSubDomain true to crawl the website domains with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled. If you don't set crawlSubDomain or crawlAllDomain to true, then Amazon Kendra only crawls the domains of the websites that you want to crawl.
crawlAllDomain true to crawl the website domains with subdomains and other domains the web pages link to. If you don't set crawlSubDomain or crawlAllDomain to true, then Amazon Kendra only crawls the domains of the websites that you want to crawl.
honorRobots true to respect the robots.txt directives of the websites that you want to crawl. These directives control how Amazon Kendra Web Crawler crawls the websites, whether Amazon Kendra can crawl only specific content or not crawl any content.
crawlAttachments true to crawl files that the web pages link to.
  • inclusionURLCrawlPatterns

  • inclusionURLIndexPatterns

A list of regular expression patterns to include crawling certain URLs and indexing any hyperlinks on these URL web pages. URLs that match the patterns are included in the index. URLs that don't match the patterns are excluded from the index. If a URL matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the URL/website's web pages aren't included in the index.
  • exclusionURLCrawlPatterns

  • exclusionURLIndexPatterns

A list of regular expression patterns to exclude crawling certain URLs and indexing any hyperlinks on these URL web pages. URLs that match the patterns are excluded from the index. URLs that don't match the patterns are included in the index. If a URL matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the URL/website's web pages aren't included in the index.
inclusionFileIndexPatterns A list of regular expression patterns to include certain web page files. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
exclusionFileIndexPatterns A list of regular expression patterns to exclude certain web page files. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
proxy Configuration information required to connect to your internal websites via a web proxy.
host The host name of the proxy sever you want to use to connect to internal websites. For example, the host name of https://a.example.com/page1.html is "a.example.com".
port The port number of the proxy sever you want to use to connect to internal websites. For example, 443 is the standard port for HTTPS.
secretArn (proxy) If web proxy credentials are required to connect to a website host, you can create an AWS Secrets Manager secret that stores the credentials. Provide the Amazon Resource Name (ARN) of the secret.
type The type of data source. Specify WEBCRAWLERV2 as your data source type.
secretArn

The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that's used if your websites require authentication to access the websites. You store the authentication credentials for the website in the secret that contains JSON key-value pairs.

If you use basic, or NTML/Kerberos, enter the user name and password. The JSON keys in the secret must be userName and password. NTLM authentication protocol includes password hashing, and Kerberos authentication protocol includes password encryption.

If you use SAML or form authentication, enter the user name and password, XPath for the user name field (and user name button if using SAML), XPaths for the password field and button, and the login page URL. The JSON keys in the secret must be userName, password, userNameFieldXpath, userNameButtonXpath, passwordFieldXpath, passwordButtonXpath, and loginPageUrl. You can find the XPaths (XML Path Language) of elements using your web browser's developer tools. XPaths usually follow this format: //tagname[@Attribute='Value'].

Amazon Kendra also checks if the endpoint information (seed URLs) included in the secret is the same the endpoint information specified in your data source endpoint configuration details.

version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "siteMapUrls": { "type": "array", "items":{ "type": "string", "pattern": "https://.*" } }, "s3SeedUrl": { "type": "string", "pattern": "s3:.*" }, "s3SiteMapUrl": { "type": "string", "pattern": "s3:.*" }, "seedUrlConnections": { "type": "array", "items": [ { "type": "object", "properties": { "seedUrl":{ "type": "string", "pattern": "https://.*" } }, "required": [ "seedUrl" ] } ] }, "authentication": { "type": "string", "enum": [ "NoAuthentication", "BasicAuth", "NTLM_Kerberos", "Form", "SAML" ] } } } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "webPage": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL" ] }, "additionalProperties": { "type": "object", "properties": { "rateLimit": { "type": "string", "default": "300" }, "maxFileSize": { "type": "string", "default": "50" }, "crawlDepth": { "type": "string", "default": "2" }, "maxLinksPerUrl": { "type": "string", "default": "100" }, "crawlSubDomain": { "type": "boolean", "default": false }, "crawlAllDomain": { "type": "boolean", "default": false }, "honorRobots": { "type": "boolean", "default": false }, "crawlAttachments": { "type": "boolean", "default": false }, "inclusionURLCrawlPatterns": { "type": "array", "items": { "type": "string" } }, "exclusionURLCrawlPatterns": { "type": "array", "items": { "type": "string" } }, "inclusionURLIndexPatterns": { "type": "array", "items": { "type": "string" } }, "exclusionURLIndexPatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileIndexPatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileIndexPatterns": { "type": "array", "items": { "type": "string" } }, "proxy": { "type": "object", "properties": { "host": { "type": "string" }, "port": { "type": "string" }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } } } }, "required": [ "rateLimit", "maxFileSize", "crawlDepth", "crawlSubDomain", "crawlAllDomain", "maxLinksPerUrl", "honorRobots" ] }, "type": { "type": "string", "pattern": "WEBCRAWLERV2" }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "type", "additionalProperties" ] }

Confluence template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the Confluence host URL, the hosting method, and the authentication type as a part of the connection configuration or repository endpoint details. Also specify the type of data source as CONFLUENCEV2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Confluence JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
hostUrl The URL for your Confluence instance. For example, https://example.confluence.com.
type The hosting method for your Confluence instance, whether SAAS and ON_PREM.
authType The authentication method for your Confluence instance, whether Basic, OAuth2, or Personal-token.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • space

  • page

  • blog

  • comment

  • attachment

A list of objects that map the attributes or field names of your Confluence spaces, pages, blogs, comments, and attachments to Amazon Kendra index field names. For more information, see Mapping data source fields. The Confluence data source field names must exist in your Confluence custom metadata.
additionalProperties Additional configuration options for your content in your data source.
  • inclusionSpaceKeyFilter

  • exclusionSpaceKeyFilter

  • pageTitleRegEX

  • blogTitleRegEX

  • commentTitleRegEX

  • attachmentTitleRegEX

  • inclusionFileTypePatterns

  • exclusionFileTypePatterns

  • inclusionUrlPatterns

  • exclusionUrlPatterns

  • fieldForUserId

A list of regular expression patterns to include and/or exclude certain files in your Confluence data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
proxyHost The host name of the web proxy you are using, without the http:// or https:// protocol.

proxyPort

The post number used by the host URL transport protocol. Must be a numeric value between 0 and 65535.
  • isCrawlPersonalSpace

  • isCrawlArchivedSpace

  • isCrawlArchivedPage

  • isCrawlPage

  • isCrawlBlog

  • isCrawlPageComment

  • isCrawlPageAttachment

  • isCrawlBlogComment

  • isCrawlBlogAttachment

true to index files in your Confluence personal spaces, pages, blogs, page comments, page attachments, blog comments, and blog attachments.
type The type of data source. Specify CONFLUENCEV2 as your data source type.
enableIdentityCrawler true to activate identity crawler. Identity crawler is activated by default. If identity crawler is deactivated, you must upload the principal information using the PutPrincipalMapping API. Crawling identity information on users and groups with access to certain documents is useful for user context filtering. Search results are filtered based on the user or their group access to documents. For more information, see Filtering on user context.
syncMode Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose between:
  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index

secretARN

The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key-value pairs required to connect to your Confluence instance.

If you use basic authentication, the secret must contain a JSON structure with the following keys:
{ "username": "Confluence account user name", "password": "Confluence API token" }
If you use OAuth 2.0 authentication, the secret must contain a JSON structure with the following keys:
{ "confluenceAppKey": "app key for your Confluence account", "confluenceAppSecret": "app secret from your Confluence token", "confluenceAccessToken": "access token created in Confluence", "confluenceRefreshToken": "refresh token created in Confluence" }
For Confluence Server only) If you use basic authentication, the secret is stored in a JSON structure with the following keys:
{ "hostUrl": "Confluence Server host URL", "username": "Confluence Server user name", "password": "Confluence Server password" }
(For Confluence Server only) If you use Personal Access Token authentication, the secret is stored in a JSON structure with the following keys:
{ "hostUrl": "Confluence Server host URL", "patToken": "Confluence token" }
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "hostUrl": { "type": "string", "pattern": "https:.*" }, "type": { "type": "string", "enum": [ "SAAS", "ON_PREM" ] }, "authType": { "type": "string", "enum": [ "Basic", "OAuth2", "Personal-token" ] } }, "required": [ "hostUrl", "type", "authType" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "space": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "page": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "blog": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "comment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "fieldForUserId": { "type": "string" }, "inclusionSpaceKeyFilter": { "type": "array", "items": { "type": "string" } }, "exclusionSpaceKeyFilter": { "type": "array", "items": { "type": "string" } }, "pageTitleRegEX": { "type": "array", "items": { "type": "string" } }, "blogTitleRegEX": { "type": "array", "items": { "type": "string" } }, "commentTitleRegEX": { "type": "array", "items": { "type": "string" } }, "attachmentTitleRegEX": { "type": "array", "items": { "type": "string" } }, "isCrawlPersonalSpace": { "type": "boolean" }, "isCrawlArchivedSpace": { "type": "boolean" }, "isCrawlArchivedPage": { "type": "boolean" }, "isCrawlPage": { "type": "boolean" }, "isCrawlBlog": { "type": "boolean" }, "isCrawlPageComment": { "type": "boolean" }, "isCrawlPageAttachment": { "type": "boolean" }, "isCrawlBlogComment": { "type": "boolean" }, "isCrawlBlogAttachment": { "type": "boolean" }, "inclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionUrlPatterns": { "type": "array", "items": { "type": "string" } }, "exclusionUrlPatterns": { "type": "array", "items": { "type": "string" } }, "proxyHost": { "type": "string" }, "proxyPort": { "type": "string" } }, "required": [] }, "type": { "type": "string", "pattern": "CONFLUENCEV2" }, "enableIdentityCrawler": { "type": "boolean" }, "syncMode": { "type": "string", "enum": [ "FULL_CRAWL", "FORCED_FULL_CRAWL" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Dropbox template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the Dropbox app key, app secret, and access token as part of your secret that stores your authentication credentials. Also specify the type of data source as DROPBOX, the type of access token you want to use (temporary or permanent), and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Dropbox JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source. This data source does not specify an endpoint in repositoryEndpointMetadata. Rather, the connection information is included in an AWS Secrets Manager secret that you provide the secretArn.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • file

  • paper

  • papert

  • shortcut

A list of objects that map the attributes or field names of your Dropbox files, Dropbox Paper, and shortcuts to Amazon Kendra index field names. For more information, see Mapping data source fields.
secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Dropbox. The secret must contain a JSON structure with the following keys:
{ "appKey": "Dropbox app key", "appSecret": "Dropbox app secret", "accesstoken": "temporary access token or refresh access token" }
additionalProperties Additional configuration options for your content in your data source.
  • inclusionFileNamePatterns

  • inclusionFileTypePatterns

A list of regular expression patterns to include certain file names and types in your Dropbox data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
  • exclusionFileNamePatterns

  • exclusionFileTypePatterns

A list of regular expression patterns to exclude certain file names and types in your Dropbox data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
  • crawlFile

  • crawlPaper

  • crawlPapert

  • crawlShortcut

true to index files in your Dropbox, Dropbox Paper documents, Dropbox Paper templates, and webpage shortcuts stored in your Dropbox.
type The type of data source. Specify DROPBOX as your data source type.
useChangeLog true to use the Dropbox change log to determine which documents require adding, updating, or deleting in the index. Depending on the change log's size, it may take longer for Amazon Kendra to use the change log than to scan all of your documents in your Dropbox.
tokenType Specify your access token type: permanent or temporary access token. It's recommended that you create a refresh access token that never expires in Dropbox rather that relying on a one-time access token that expires after 4 hours. You create an app and a refresh access token in the Dropbox developer console and provide the access token in your secret.
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { } } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "file": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "LONG", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "paper": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "LONG", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "papert": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "LONG", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "shortcut": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "LONG", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] } } }, "secretArn": { "type": "string" }, "additionalProperties": { "type": "object", "properties": { "inclusionFileNamePatterns": { "type": "array" }, "exclusionFileNamePatterns": { "type": "array" }, "inclusionFileTypePatterns": { "type": "array" }, "exclusionFileTypePatterns": { "type": "array" }, "crawlFile": { "type": "boolean" }, "crawlPaper": { "type": "boolean" }, "crawlPapert": { "type": "boolean" }, "crawlShortcut": { "type": "boolean" } } }, "type": { "type": "string", "pattern": "DROPBOX" }, "useChangeLog": { "type": "string", "enum": [ "true", "false" ] }, "tokenType": { "type": "string", "enum": [ "PERMANENT", "TEMPORARY" ] }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] } }, "additionalProperties": false, "required": [ "connectionConfiguration", "repositoryConfigurations", "additionalProperties", "useChangeLog", "secretArn", "type", "tokenType" ] }

Drupal template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the Drupal host URL and the authentication type as part of the connection configuration or repository endpoint details. Also specify the type of data source as DRUPAL, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Drupal JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
hostUrl The host url of your Drupal website. For example, https://<hostname>/<drupalsitename>.
repositoryConfigurations Configuration information for the content of the data source.
  • content

  • comment

  • attachment

A list of objects that map the attributes or field names of your Drupal files. For more information, see Mapping data source fields. The Drupal data source field names must exist in your Drupal custom metadata.
additionalProperties Additional configuration options for your content in your data source.
  • inclusionFileNamePatterns

  • articleTitleInclusionPatterns

  • pageTitleInclusionPatterns

  • customContentTitleInclusionPatterns

  • basicBlockTitleInclusionPatterns

  • customBlockTitleInclusionPatterns

A list of regular expression patterns to include certain files in your Drupal data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
  • exclusionFileNamePatterns

  • articleTitleExclusionPatterns

  • pageTitleExclusionPatterns

  • customContentTitleExclusionPatterns

  • basicBlockTitleExclusionPatterns

  • customBlockTitleExclusionPatterns

A list of regular expression patterns to exclude certain files in your Drupal data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
contentDefinitions
  • contentType

  • fieldDefinition

  • isCrawlComments

  • isCrawlFiles

  • isCrawlArticle

  • isCrawlBasicPage

  • isCrawlBasicBlock

  • isCrawlCustomContentTypesList

Specify the content types to crawl and whether to crawl comments and attachments for your selected content types.
type The type of data source. Specify DRUPAL as your data source type.
authType The type of authentication you are using, whether BASIC-AUTH or OAUTH2.
syncMode Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose
  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index

  • CHANGE_LOG to incrementally crawl only new and modified content each time your data source syncs with your index

enableIdentityCrawler true to activate identity crawler. Identity crawler is activated by default. If identity crawler is deactivated, you must upload the principal information using the PutPrincipalMapping API. Crawling identity information on users and groups with access to certain documents is useful for user context filtering. Search results are filtered based on the user or their group access to documents. For more information, see Filtering on user context.
secretARN The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key-value pairs required to connect to your Drupal. The secret must contain a JSON structure with the following keys:

If using basic authentication:

{ "user name": "user name", "passwords": "password" }

If using OAuth 2.0 authentication:

{ "Client ID": "client_id", "Client secret": "client_secret", "user name": "user name", "password": "password" }
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "hostUrl": { "type": "string", "pattern": "https:.*" } }, "required": [ "hostUrl" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "content": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "comment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "isCrawlArticle": { "type": "boolean" }, "isCrawlBasicPage": { "type": "boolean" }, "isCrawlBasicBlock": { "type": "boolean" }, "crawlCustomContentTypesList": { "type": "array", "items": { "type": "string" } }, "crawlCustomBlockTypesList": { "type": "array", "items": { "type": "string" } }, "filePath": { "anyOf": [ { "type": "string", "pattern": "s3:.*" }, { "type": "string", "pattern": "" } ] }, "inclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "articleTitleInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "articleTitleExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "pageTitleInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "pageTitleExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "customContentTitleInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "customContentTitleExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "basicBlockTitleInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "basicBlockTitleExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "customBlockTitleInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "customBlockTitleExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "contentDefinitions": { "type": "array", "items": { "properties": { "contentType": { "type": "string" }, "fieldDefinition": { "type": "array", "items": [ { "type": "object", "properties": { "machineName": { "type": "string" }, "type": { "type": "string" } }, "required": [ "machineName", "type" ] } ] }, "isCrawlComments": { "type": "boolean" }, "isCrawlFiles": { "type": "boolean" } } }, "required": [ "contentType", "fieldDefinition", "isCrawlComments", "isCrawlFiles" ] } }, "required": [] }, "type": { "type": "string", "pattern": "DRUPAL" }, "authType": { "type": "string", "enum": [ "BASIC-AUTH", "OAUTH2" ] }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "enableIdentityCrawler": { "type": "boolean" }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Gmail template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as GMAIL, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Gmail JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.
  • message

  • attachments

A list of objects that map the attributes or field names of your Gmail messages and attachments to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source.
  • inclusionLabelNamePatterns

  • exclusionLabelNamePatterns

  • inclusionAttachmentTypePatterns

  • exclusionAttachmentTypePatterns

  • inclusionAttachmentNamePatterns

  • exclusionAttachmentNamePatterns

  • inclusionSubjectFilter

  • exclusionSubjectFilter

  • isSubjectAnd

  • inclusionFromFilter

  • exclusionFromFilter

  • inclusionToFilter

  • exclusionToFilter

  • inclusionCcFilter

  • exclusionCcFilter

  • inclusionBccFilter

  • exclusionBccFilter

A list of regular expression patterns to include or exclude messages with specific subject names in your Gmail data source. Files that match the patterns are included in the index. If a file matches both an inclusion and an exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
beforeDateFilter Specify messages and attachments to be included before a certain date.
afterDateFilter Specify messages and attachments to be included after a certain date.
isCrawlAttachment A Boolean value to choose whether you want to crawl attachments. Messages are automatically crawled.
type The type of data source. Specify GMAIL as your data source type.
shouldCrawlDraftMessages A Boolean value to choose whether you want to crawl draft messages.
syncMode Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose
  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index

Important

Because there is no API to update permanently deleted Gmail messages, a New, modified, or deleted content sync:

  • Won’t remove messages that were permanently deleted from Gmail from your Amazon Kendra index

  • Won’t sync changes in Gmail email labels

To sync your Gmail data source label changes and permanently deleted email messages to your Amazon Kendra index, you must run full crawls periodically.

secretARN The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key-value pairs required to connect to your Gmail. The secret must contain a JSON structure with the following keys:
{ "adminAccountEmailId": "service account email", "clientEmailId": "user account email", "privateKey": "private key" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { } }, "repositoryConfigurations": { "type": "object", "properties": { "message": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "attachments": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING"] }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } } }, "required": [] }, "additionalProperties": { "type": "object", "properties": { "inclusionLabelNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionLabelNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionAttachmentTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionAttachmentTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionAttachmentNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionAttachmentNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionSubjectFilter": { "type": "array", "items": { "type": "string" } }, "exclusionSubjectFilter": { "type": "array", "items": { "type": "string" } }, "isSubjectAnd": { "type": "boolean" }, "inclusionFromFilter": { "type": "array", "items": { "type": "string" } }, "exclusionFromFilter": { "type": "array", "items": { "type": "string" } }, "inclusionToFilter": { "type": "array", "items": { "type": "string" } }, "exclusionToFilter": { "type": "array", "items": { "type": "string" } }, "inclusionCcFilter": { "type": "array", "items": { "type": "string" } }, "exclusionCcFilter": { "type": "array", "items": { "type": "string" } }, "inclusionBccFilter": { "type": "array", "items": { "type": "string" } }, "exclusionBccFilter": { "type": "array", "items": { "type": "string" } }, "beforeDateFilter": { "anyOf": [ { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$" }, { "type": "string", "pattern": "" } ] }, "afterDateFilter": { "anyOf": [ { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$" }, { "type": "string", "pattern": "" } ] }, "isCrawlAttachment": { "type": "boolean" }, "shouldCrawlDraftMessages": { "type": "boolean" } }, "required": [ "isCrawlAttachment", "shouldCrawlDraftMessages" ] }, "type" : { "type" : "string", "pattern": "GMAIL" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL" ] }, "secretArn": { "type": "string" }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] } }, "required": [ "connectionConfiguration", "repositoryConfigurations", "additionalProperties", "syncMode", "secretArn", "type" ] }

Google Drive template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as GOOGLEDRIVE2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Google Drive JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the data source.
repositoryEndpointMetadata The endpoint information for the data source. This data source does not specify an endpoint. You choose your authentication type: serviceAccount and OAuth2. The connection information is included in an AWS Secrets Manager secret that you provide the secretArn.
authType Choose between serviceAccount and OAuth2 based on your use case.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • file

  • comment

A list of objects that map the attributes or field names of your Google Drive to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source
  • maxFileSizeInMegaBytes

Specify a file size limit in MBs that Amazon Kendra should crawl.
  • iscrawlComment

true to index comments in your Google Drive data source.
  • isCrawlMyDriveAndSharedWithMe

true to index MyDrive and Shared With Me Drives in your Google Drive data source.
  • isCrawlSharedDrives

true to index Shared Drives in your Google Drive data source.
  • isCrawlAcl

true to crawl ACL information from your Google Drive data source.
  • excludeUserAccounts

  • excludeSharedDrives

  • excludeMimeTypes

  • exclusionFileTypePatterns

  • exclusionFileNamePatterns

  • exclusionFilePathFilter

A list of regular expression patterns to exclude certain files in your Google Drive data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
  • includeUserAccounts

  • includeSharedDrives

  • includeMimeTypes

  • inclusionFileTypePatterns

  • inclusionFileNamePatterns

  • inclusionFilePathFilter

A list of regular expression patterns to include certain files in your Google Drive data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
type The type of data source. Specify GOOOGLEDRIVEV2 as your data source type.
enableIdentityCrawler true to activate identity crawler. Identity crawler is activated by default. If identity crawler is deactivated, you must upload the principal information using the PutPrincipalMapping API. Crawling identity information on users and groups with access to certain documents is useful for user context filtering. Search results are filtered based on the user or their group access to documents. For more information, see Filtering on user context.
syncMode Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose between:
  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index

  • CHANGE_LOG to incrementally crawl only new and modified content each time your data source syncs with your index

secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Google Drive. The secret must contain a JSON structure with the following keys:

If using Google Service Account authentication:

{ "clientEmail": "user account email", "adminAccountEmail": "service account email", "privateKey": "private key" }

If using OAuth 2.0 authentication:

{ "clientID": "OAuth client ID", "clientSecret": "client secret", "refreshToken": "refresh token" }
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "authType": { "type": "string", "enum": [ "serviceAccount", "OAuth2" ] } }, "required": [ "authType" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "file": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "comment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "maxFileSizeInMegaBytes": { "type": "string" }, "isCrawlComment": { "type": "boolean" }, "isCrawlMyDriveAndSharedWithMe": { "type": "boolean" }, "isCrawlSharedDrives": { "type": "boolean" }, "isCrawlAcl": { "type": "boolean" }, "excludeUserAccounts": { "type": "array", "items": { "type": "string" } }, "excludeSharedDrives": { "type": "array", "items": { "type": "string" } }, "excludeMimeTypes": { "type": "array", "items": { "type": "string" } }, "includeUserAccounts": { "type": "array", "items": { "type": "string" } }, "includeSharedDrives": { "type": "array", "items": { "type": "string" } }, "includeMimeTypes": { "type": "array", "items": { "type": "string" } }, "includeTargetAudienceGroup": { "type": "array", "items": { "type": "string" } }, "inclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFilePathFilter": { "type": "array", "items": { "type": "string" } }, "exclusionFilePathFilter": { "type": "array", "items": { "type": "string" } } } }, "type": { "type": "string", "pattern": "GOOGLEDRIVEV2" }, "enableIdentityCrawler": { "type": "boolean" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Microsoft Exchange template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the tenant ID as as a part of the connection configuration or repository endpoint details. Also specify the type of data source as MSEXCHANGE, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Microsoft Exchange JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
tenantId The Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • email

  • attachment

  • calendar

  • contacts

  • notes

A list of objects that map the attributes or field names of your Microsoft Exchange data source to Amazon Kendra index fields. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for content in your data source
inclusionPatterns A list of regular expression patterns to include certain files in your Microsoft Exchange data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
exclusionPatterns A list of regular expression patterns to exclude certain files in your Microsoft Exchange data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
  • inclusionUsersList

  • inclusionUsersFileName

  • inclusionDomainUsers

A list of regular expression patterns to include certain users and user files in your Microsofot Exchange data source. Users that match the patterns are included in the index. Users that don't match the patterns are excluded from the index. If a user matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the user isn't included in the index.
  • exclusionUsersList

  • exclusionUsersFileName

  • exclusionDomainUsers

A list of regular expression patterns to exclude certain users and user files in your Microsoft Exchange data source. Users that match the patterns are excluded from the index. Users that don't match the patterns are included in the index. If a user matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the user isn't included in the index.
s3bucketName The name of your S3 bucket if that you want to use.
  • crawlCalendar

  • crawlNotes

  • crawlFolderAcl

  • crawlContacts

  • crawlFolderAcl

true to index this content in your Microsoft Exchange data source.
startCalendarDateTime You can configure a specific start date-time for your calendar content.
endCalendarDateTime You can configure a specific end date-time for calendar content.
subject You can configure a specific subject line for your mail content.
emailFrom You can configure a specific email for your 'From' or sender mail content.
emailTo You can configure a specific email for your 'To' or recipient mail content.
syncMode Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose between:
  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index

  • CHANGE_LOG to incrementally crawl only new and modified content each time your data source syncs with your index

type The type of data source. Specify MSEXCHANGE as your data source type.
secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Microsoft Exchange. This includes your client ID and your client secret that is generated when you create an OAuth application in the Azure portal.
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "tenantId": { "type": "string", "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", "minLength": 36, "maxLength": 36 } }, "required": ["tenantId"] } } }, "repositoryConfigurations": { "type": "object", "properties": { "email": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "DATE","LONG"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "calendar": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "contacts": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "notes": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": ["email" ] }, "additionalProperties": { "type": "object", "properties": { "inclusionPatterns": { "type": "array", "items": { "type": "string" } }, "exclusionPatterns": { "type": "array", "items": { "type": "string" } }, "inclusionUsersList": { "type": "array", "items": { "type": "string", "format": "email" } }, "exclusionUsersList": { "type": "array", "items": { "type": "string", "format": "email" } }, "s3bucketName": { "type": "string" }, "inclusionUsersFileName": { "type": "string" }, "exclusionUsersFileName": { "type": "string" }, "inclusionDomainUsers": { "type": "array", "items": { "type": "string" } }, "exclusionDomainUsers": { "type": "array", "items": { "type": "string" } }, "crawlCalendar": { "type": "boolean" }, "crawlNotes": { "type": "boolean" }, "crawlContacts": { "type": "boolean" }, "crawlFolderAcl": { "type": "boolean" }, "startCalendarDateTime": { "anyOf": [ { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$" }, { "type": "string", "pattern": "" } ] }, "endCalendarDateTime": { "anyOf": [ { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$" }, { "type": "string", "pattern": "" } ] }, "subject": { "type": "array", "items": { "type": "string" } }, "emailFrom": { "type": "array", "items": { "type": "string", "format": "email" } }, "emailTo": { "type": "array", "items": { "type": "string", "format": "email" } } }, "required": [ ] }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "type" : { "type" : "string", "pattern": "MSEXCHANGE" }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Microsoft OneDrive template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the tenant ID as part of the connection configuration or repository endpoint details. Also specify the type of data source as ONEDRIVEV2, and a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Microsoft OneDrive JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
tenantId The Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
file A list of objects that map the attributes or field names of your Microsoft OneDrive files to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source
  • userNameFilter

  • userFilterPath

  • inclusionFileTypePatterns

  • exclusionFileTypePatterns

  • inclusionFileNamePatterns

  • exclusionFileNamePatterns

  • inclusionFilePathPatterns

  • exclusionFilePathPatterns

  • inclusionOneNoteSectionNamePatterns

  • exclusionOneNoteSectionNamePatterns

  • inclusionOneNotePageNamePatterns

  • exclusionOneNotepageNamePatterns

You can choose to index specific files, OneNote sections, OneNote pages, and filter by user name.
isUserNameOnS3 true to provide a list of user names in a file stored in an Amazon S3.
type The type of data source. Specify ONEDRIVEV2 as your data source type.
enableIdentityCrawler true to activate identity crawler. Identity crawler is activated by default. If identity crawler is deactivated, you must upload the principal information using the PutPrincipalMapping API. Crawling identity information on users and groups with access to certain documents is useful for user context filtering. Search results are filtered based on the user or their group access to documents. For more information, see Filtering on user context.
type The type of data source. Specify ONEDRIVEV2 as your data source type.
syncMode Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose between:
  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index

  • CHANGE_LOG to incrementally crawl only new and modified content each time your data source syncs with your index

secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Microsoft OneDrive. The secret must contain a JSON structure with the following keys:
{ "clientId": "client ID", "clientSecret": "client secret" }
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "tenantId": { "type": "string", "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", "minLength": 36, "maxLength": 36 } }, "required": [ "tenantId" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "file": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "userNameFilter": { "type": "array", "items": { "type": "string" } }, "userFilterPath": { "type": "string" }, "isUserNameOnS3": { "type": "boolean" }, "inclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFilePathPatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFilePathPatterns": { "type": "array", "items": { "type": "string" } }, "inclusionOneNoteSectionNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionOneNoteSectionNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionOneNotePageNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionOneNotePageNamePatterns": { "type": "array", "items": { "type": "string" } } }, "required": [] }, "enableIdentityCrawler": { "type": "boolean" }, "type": { "type": "string", "pattern": "ONEDRIVEV2" }, "syncMode": { "type": "string", "enum": [ "FULL_CRAWL", "FORCED_FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Microsoft SharePoint template schema

You include a JSON that contains the data source schema as part of TemplateConfiguration object. You provide the SharePoint host URL, domain, and also a tenant ID if required as a part of the connection configuration or repository endpoint details. Also specify the type of data source as SHAREPOINTV2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See SharePoint JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source
repositoryEndpointMetadata The endpoint information for the data source
tenantId The tenant id of your SharePoint account.
domain The domain of your SharePoint account.
siteUrls The host URLs of your SharePoint account.
repositoryAdditionalProperties Additional properties to connect with the repository/data source endpoint.
s3bucketName The name of the Amazon S3 bucket that stores your Azure AD self-signed X.509 certificate.
s3certificateName The name of the Azure AD self-signed X.509 certificate stored in your Amazon S3 bucket.
authType The type of authentication you are using, whether OAuth2, OAuth2Certificate, OAuth2App, Basic.
version The SharePoint version you are using, whether Server or Online.
onPremVersion The SharePoint Server version you are using, whether 2013, 2016 2019, or SubscriptionEdition.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
additionalProperties Additional configuration options for your content in your data source.
  • eventTitleFilterRegEx

  • pageTitleFilterRegEx

  • linkTitleFilterRegEx

  • inclusionFilePath

  • exclusionFilePath

  • inclusionFileTypePatterns

  • exclusionFileTypePatterns

  • inclusionFileNamePatterns

  • exclusionFileNamePatterns

  • inclusionOneNoteSectionNamePatterns

  • exclusionOneNoteSectionNamePatterns

  • inclusionOneNotePageNamePatterns

  • exclusionOneNotePageNamePatterns

  • aclConfiguration

  • emailDomain

  • proxyHost

  • proxyPort

A list of regular expression patterns to include/exclude certain files in your Sharepoint data source. Files that match the patterns are included in the index. File that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
  • crawlFiles

  • crawlPages

  • crawlEvents

  • crawlComments

  • crawlLinks

  • crawlAttachments

  • crawlListData

  • crawlAcl

  • isCrawlLocalGroupMapping

  • isCrawlAdGroupMapping

Input TRUE to index.
type Specify SHAREPOINTV2 as your data source type
enableIdentityCrawler true to activate identity crawler. Identity crawler is activated by default. If identity crawler is deactivated, you must upload the principal information using the PutPrincipalMapping API. Crawling identity information on users and groups with access to certain documents is useful for user context filtering. Search results are filtered based on the user or their group access to documents. For more information, see Filtering on user context.
syncMode Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose between:
  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index

  • CHANGE_LOG to incrementally crawl only new and modified content each time your data source syncs with your index

secretARN The Amazon Resource Name (ARN) of a AWS Secrets Manager secret that contains the key-value pairs required to connect to your SharePoint. For information on these key-value pairs, see Connection instructions for SharePoint Online and SharePoint Server.
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "tenantId": { "type": "string", "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", "minLength": 36, "maxLength": 36 }, "domain": { "type": "string" }, "siteUrls": { "type": "array", "items": { "type": "string", "pattern": "https://.*" } }, "repositoryAdditionalProperties": { "type": "object", "properties": { "s3bucketName": { "type": "string" }, "s3certificateName": { "type": "string" }, "authType": { "type": "string", "enum": [ "OAuth2", "OAuth2Certificate", "OAuth2App", "Basic", "NTLM", "Kerberos" ] }, "version": { "type": "string", "enum": [ "Server", "Online" ] }, "onPremVersion": { "type": "string", "enum": [ "", "2013", "2016", "2019", "SubscriptionEdition" ] } }, "required": [ "authType", "version" ] } }, "required": [ "siteUrls", "domain", "repositoryAdditionalProperties" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "event": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "page": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "file": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "link": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "comment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "eventTitleFilterRegEx": { "type": "array", "items": { "type": "string" } }, "pageTitleFilterRegEx": { "type": "array", "items": { "type": "string" } }, "linkTitleFilterRegEx": { "type": "array", "items": { "type": "string" } }, "inclusionFilePath": { "type": "array", "items": { "type": "string" } }, "exclusionFilePath": { "type": "array", "items": { "type": "string" } }, "inclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionOneNoteSectionNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionOneNoteSectionNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionOneNotePageNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionOneNotePageNamePatterns": { "type": "array", "items": { "type": "string" } }, "crawlFiles": { "type": "boolean" }, "crawlPages": { "type": "boolean" }, "crawlEvents": { "type": "boolean" }, "crawlComments": { "type": "boolean" }, "crawlLinks": { "type": "boolean" }, "crawlAttachments": { "type": "boolean" }, "crawlListData": { "type": "boolean" }, "crawlAcl": { "type": "boolean" }, "aclConfiguration": { "type": "string", "enum": [ "ACLWithLDAPEmailFmt", "ACLWithManualEmailFmt", "ACLWithUsernameFmt" ] }, "emailDomain": { "type": "string" }, "isCrawlLocalGroupMapping": { "type": "boolean" }, "isCrawlAdGroupMapping": { "type": "boolean" }, "proxyHost": { "type": "string" }, "proxyPort": { "type": "string" } }, "required": [ ] }, "type": { "type": "string", "pattern": "SHAREPOINTV2" }, "enableIdentityCrawler": { "type": "boolean" }, "syncMode": { "type": "string", "enum": [ "FULL_CRAWL", "FORCED_FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "enableIdentityCrawler", "syncMode", "additionalProperties", "secretArn", "type" ] }

Microsoft Teams template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the tenant ID as a part of the connection configuration or repository endpoint details. Also specify the type of data source as MSTEAMS, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Microsoft Teams JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
tenantId The Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • chatMessage

  • chatAttachment

  • channelPost

  • channelWiki

  • channelAttachment

  • meetingChat

  • meetingFile

  • meetingNote

  • calendarMeeting

A list of objects that map the attributes or field names of your Microsoft Teams content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source.
paymentModel Specifies what type of payment model to use with your Microsoft Teams data source. Model A payment models are restricted to licensing and payment models that require security compliance. Model B payment models are suitable for licensing and payment models that do not require security compliance.
  • inclusionTeamNameFilter

  • inclusionChannelNameFilter

  • inclusionFileNamePatterns

  • inclusionFileTypePatterns

  • inclusionUserEmailFilter

  • inclusionOneNoteSectionNamePatterns

  • inclusionOneNotePageNamePatterns

A list of regular expression patterns to include certain content in your Microsoft Teams data source. Content that matches the patterns are included in the index. Content that doesn't match the patterns are excluded from the index. If content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
  • exclusionTeamNameFilter

  • exclusionChannelNameFilter

  • exclusionFileNamePatterns

  • exclusionFileTypePatterns

  • exclusionUserEmailFilter

  • exclusionOneNoteSectionNamePatterns

  • exclusionOneNotePageNamePatterns

A list of regular expression patterns to exclude certain content in your Microsoft Teams data source. Content that matches the patterns are excluded from the index. Content that doesn't match the patterns are included in the index. If content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
  • isCrawlChatMessage

  • isCrawlChatAttachment

  • isCrawlChannelPost

  • isCrawlChannelAttachment

  • isCrawlChannelWiki

  • isCrawlCalendarMeeting

  • isCrawlMeetingChat

  • isCrawlMeetingFile

  • isCrawlMeetingNote

true to index this content in your Microsoft Teams data source.
startCalendarDateTime You can configure a specific start date-time for your calendar content.
endCalendarDateTime You can configure a specific end date-time for calendar content.
type The type of data source. Specify MSTEAMS as your data source type.
enableIdentityCrawler true to activate identity crawler. Identity crawler is activated by default. If identity crawler is deactivated, you must upload the principal information using the PutPrincipalMapping API. Crawling identity information on users and groups with access to certain documents is useful for user context filtering. Search results are filtered based on the user or their group access to documents. For more information, see Filtering on user context.
syncMode Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose between:
  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index

  • CHANGE_LOG to incrementally crawl only new and modified content each time your data source syncs with your index

secretArn The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Microsoft Teams. This includes your client ID and client secret that is generated when you create an OAuth application in the Azure portal.
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "tenantId": { "type": "string", "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", "minLength": 36, "maxLength": 36 } }, "required": [ "tenantId" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "chatMessage": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "chatAttachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "channelPost": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "channelWiki": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "channelAttachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "meetingChat": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "meetingFile": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "meetingNote": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "calendarMeeting": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "paymentModel": { "type": "string", "enum": [ "A", "B", "Evaluation Mode" ] }, "inclusionTeamNameFilter": { "type": "array", "items": { "type": "string" } }, "exclusionTeamNameFilter": { "type": "array", "items": { "type": "string" } }, "inclusionChannelNameFilter": { "type": "array", "items": { "type": "string" } }, "exclusionChannelNameFilter": { "type": "array", "items": { "type": "string" } }, "inclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionUserEmailFilter": { "type": "array", "items": { "type": "string" } }, "inclusionOneNoteSectionNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionOneNoteSectionNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionOneNotePageNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionOneNotePageNamePatterns": { "type": "array", "items": { "type": "string" } }, "isCrawlChatMessage": { "type": "boolean" }, "isCrawlChatAttachment": { "type": "boolean" }, "isCrawlChannelPost": { "type": "boolean" }, "isCrawlChannelAttachment": { "type": "boolean" }, "isCrawlChannelWiki": { "type": "boolean" }, "isCrawlCalendarMeeting": { "type": "boolean" }, "isCrawlMeetingChat": { "type": "boolean" }, "isCrawlMeetingFile": { "type": "boolean" }, "isCrawlMeetingNote": { "type": "boolean" }, "startCalendarDateTime": { "anyOf": [ { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$" }, { "type": "string", "pattern": "" } ] }, "endCalendarDateTime": { "anyOf": [ { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$" }, { "type": "string", "pattern": "" } ] } }, "required": [] }, "type": { "type": "string", "pattern": "MSTEAMS" }, "enableIdentityCrawler": { "type": "boolean" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Microsoft Yammer template schema

You include a JSON that contains the data source schema as part of TemplateConfiguration object. Specify the type of data source as YAMMER, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the data source.
repositoryEndpointMetadata The endpoint information for the data source. This data source does not specify an endpoint in repositoryEndpointMetadata. Rather, the connection information is included in an AWS Secrets Manager secret that you provide the secretArn.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • community

  • user

  • message

  • attachment

A list of objects that map attributes or field names of Microsoft Yammer content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source
inclusionPatterns A list of regular expression patterns to include certain files in your Microsoft Yammer data source. Files that match the patterns are included in the index. File that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
exclusionPatterns A list of regular expression patterns to exclude certain files in your Microsoft Yammer data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
sinceDate You can choose to configure a sinceDate parameter so that the Microsoft Yammer connector crawls content based on a specific sinceDate.
communityNameFilter You can choose to index specific community content.
  • isCrawlMessage

  • isCrawlAttachment

  • isCrawlPrivateMessage

true to index messages, message attachments, and private messages.
type Specify YAMMER as your data source type.
secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Microsoft Yammer. This includes your Microsoft Yammer user name and password, and client ID and client secret that is generated when you create an OAuth application in the Azure portal.
useChangeLog true to use the Microsoft Yammer change log to determine which documents require adding, updating, or deleting in the index.
syncMode Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose between:
  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index

  • CHANGE_LOG to incrementally crawl only new and modified content each time your data source syncs with your index

enableIdentityCrawler true to activate identity crawler. Identity crawler is activated by default. If identity crawler is deactivated, you must upload the principal information using the PutPrincipalMapping API. Crawling identity information on users and groups with access to certain documents is useful for user context filtering. Search results are filtered based on the user or their group access to documents. For more information, see Filtering on user context.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { } } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "community": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "user": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "message": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "inclusionPatterns": { "type": "array" }, "exclusionPatterns": { "type": "array" }, "sinceDate": { "type": "string", "pattern": "^(19|2[0-9])[0-9]{2}-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])T(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])((\\+|-)(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]))?$" }, "communityNameFilter": { "type": "array", "items": { "type": "string" } }, "isCrawlMessage": { "type": "boolean" }, "isCrawlAttachment": { "type": "boolean" }, "isCrawlPrivateMessage": { "type": "boolean" } }, "required": [ "sinceDate" ] }, "type": { "type": "string", "pattern": "YAMMER" }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 }, "useChangeLog": { "type": "string", "enum": [ "true", "false" ] }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "enableIdentityCrawler": { "type": "boolean" }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] } }, "required": [ "connectionConfiguration", "repositoryConfigurations", "additionalProperties", "type", "secretArn", "syncMode" ] }

Salesforce template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the Salesforce host URL as a part of the connection configuration or repository endpoint details. Also specify the type of data source as SALESFORCEV2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Salesforce JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
hostUrl The URL of the Salesforce instance to be indexed.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • account

  • contact

  • campaign

  • case

  • product

  • lead

  • contract

  • partner

  • profile

  • idea

  • pricebook

  • task

  • solution

  • attachment

  • user

  • document

  • knowledgeArticles

  • group

  • opportunity

  • chatter

  • customEntity

A list of objects that map the attributes or field names of your Salesforce entities to Amazon Kendra index field names. For more information, see Mapping data source fields.
secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Salesforce. The secret must contain a JSON structure with the following keys:
{ "authenticationUrl": "OAUTH endpoint that Amazon Kendra connects to get an OAUTH token", "consumerKey": "Application public key generated when you created your Salesforce application", "consumerSecret": "Application private key generated when you created your Salesforce application", "password": "Password associated with the user logging in to the Salesforce instance", "securityToken": "Token associated with the user account logging in to the Salesforce instance", "username": "User name of the user logging in to the Salesforce instance" }
additionalProperties Additional configuration options for your content in your data source
  • accountFilter

  • contactFilter

  • caseFilter

  • campaignFilter

  • contractFilter

  • groupFilter

  • leadFilter

  • productFilter

  • opportunityFilter

  • partnerFilter

  • pricebookFilter

  • ideaFilter

  • profileFilter

  • taskFilter

  • solutionFilter

  • userFilter

  • chatterFilter

  • documentFilter

  • knowledgeArticleFilter

  • customEntities

A collection of strings that specifies which entities to filter.

inclusionPatterns

  • inclusionDocumentFileTypePatterns

  • inclusionDocumentFileNamePatterns

  • inclusionAccountFileTypePatterns

  • inclusionCampaignFileTypePatterns

  • inclusionDocumentFileNamePatterns

  • inclusionCampaignFileNamePatterns

  • inclusionCaseFileTypePatterns

  • inclusionCaseFileNamePatterns

  • inclusionContactFileTypePatterns

  • inclusionContractFileNamePatterns

  • inclusionLeadFileTypePatterns

  • inclusionLeadFileNamePatterns

  • inclusionOpportunityFileTypePatterns

  • inclusionOpportunityFileNamePatterns

  • inclusionSolutionFileTypePatterns

  • inclusionSolutionFileNamePatterns

  • inclusionTaskFileTypePatterns

  • inclusionTaskFileNamePatterns

  • inclusionGroupFileTypePatterns

  • inclusionGroupFileNamePatterns

  • inclusionChatterFileTypePatterns

  • inclusionChatterFileNamePatterns

  • inclusionCustomEntityFileTypePatterns

  • inclusionCustomEntityFileNamePatterns

A list of regular expression patterns to include certain files in your Salesforce data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.

exclusionPatterns

  • exclusionDocumentFileTypePatterns

  • exclusionDocumentFileNamePatterns

  • exclusionAccountFileTypePatterns

  • exclusionCampaignFileTypePatterns

  • exclusionCampaignFileNamePatterns

  • exclusionCaseFileTypePatterns

  • exclusionCaseFileNamePatterns

  • exclusionContactFileTypePatterns

  • exclusionContractFileNamePatterns

  • exclusionLeadFileTypePatterns

  • exclusionLeadFileNamePatterns

  • exclusionOpportunityFileTypePatterns

  • exclusionOpportunityFileNamePatterns

  • exclusionSolutionFileTypePatterns

  • exclusionSolutionFileNamePatterns

  • exclusionTaskFileTypePatterns

  • exclusionTaskFileNamePatterns

  • exclusionGroupFileTypePatterns

  • exclusionGroupFileNamePatterns

  • exclusionChatterFileTypePatterns

  • exclusionChatterFileNamePatterns

  • exclusionCustomEntityFileTypePatterns

  • exclusionCustomEntityFileNamePatterns

A list of regular expression patterns to exclude certain files in your Salesforce data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
  • isCrawlAccount

  • isCrawlContact

  • isCrawlCase

  • isCrawlCampaign

  • isCrawlProduct

  • isCrawlLead

  • isCrawlContract

  • isCrawlPartner

  • isCrawlProfile

  • isCrawlIdea

  • isCrawlPricebook

  • isCrawlDocument

  • crawlSharedDocument

  • isCrawlGroup

  • isCrawlOpportunity

  • isCrawlChatter

  • isCrawlUser

  • isCrawlSolution

  • isCrawlTask

  • isCrawlAccountAttachments

  • isCrawlContactAttachments

  • isCrawlCaseAttachments

  • isCrawlCampaignAttachments

  • isCrawlLeadAttachments

  • isCrawlContractAttachments

  • isCrawlGroupAttachments

  • isCrawlOpportunityAttachments

  • isCrawlChatterAttachments

  • isCrawlSolutionAttachments

  • isCrawlTaskAttachments

  • isCrawlCustomEntityAttachments

  • isCrawlKnowledgeArticles

    • isCrawlDraft

    • isCrawlPublish

    • isCrawlArchived

true to index corresponding files in your Salesforce account.
type The type of data source. Specify SALESFORCEV2 as your data source type.
enableIdentityCrawler true to activate identity crawler. Identity crawler is activated by default. If identity crawler is deactivated, you must upload the principal information using the PutPrincipalMapping API. Crawling identity information on users and groups with access to certain documents is useful for user context filtering. Search results are filtered based on the user or their group access to documents. For more information, see Filtering on user context.
syncMode Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose between:
  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index

  • CHANGE_LOG to incrementally crawl only new and modified content each time your data source syncs with your index

version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "hostUrl": { "type": "string", "pattern": "https:.*" } }, "required": [ "hostUrl" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "account": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "contact": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "campaign": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "case": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "product": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "lead": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "contract": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "partner": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "profile": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "idea": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "pricebook": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "task": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "solution": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "user": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "knowledgeArticles": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "group": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "opportunity": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "chatter": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "customEntity": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "accountFilter":{ "type": "array", "items": { "type": "string" } }, "contactFilter":{ "type": "array", "items": { "type": "string" } }, "caseFilter":{ "type": "array", "items": { "type": "string" } }, "campaignFilter":{ "type": "array", "items": { "type": "string" } }, "contractFilter":{ "type": "array", "items": { "type": "string" } }, "groupFilter":{ "type": "array", "items": { "type": "string" } }, "leadFilter":{ "type": "array", "items": { "type": "string" } }, "productFilter":{ "type": "array", "items": { "type": "string" } }, "opportunityFilter":{ "type": "array", "items": { "type": "string" } }, "partnerFilter":{ "type": "array", "items": { "type": "string" } }, "pricebookFilter":{ "type": "array", "items": { "type": "string" } }, "ideaFilter":{ "type": "array", "items": { "type": "string" } }, "profileFilter":{ "type": "array", "items": { "type": "string" } }, "taskFilter":{ "type": "array", "items": { "type": "string" } }, "solutionFilter":{ "type": "array", "items": { "type": "string" } }, "userFilter":{ "type": "array", "items": { "type": "string" } }, "chatterFilter":{ "type": "array", "items": { "type": "string" } }, "documentFilter":{ "type": "array", "items": { "type": "string" } }, "knowledgeArticleFilter":{ "type": "array", "items": { "type": "string" } }, "customEntities":{ "type": "array", "items": { "type": "string" } }, "isCrawlAccount": { "type": "boolean" }, "isCrawlContact": { "type": "boolean" }, "isCrawlCase": { "type": "boolean" }, "isCrawlCampaign": { "type": "boolean" }, "isCrawlProduct": { "type": "boolean" }, "isCrawlLead": { "type": "boolean" }, "isCrawlContract": { "type": "boolean" }, "isCrawlPartner": { "type": "boolean" }, "isCrawlProfile": { "type": "boolean" }, "isCrawlIdea": { "type": "boolean" }, "isCrawlPricebook": { "type": "boolean" }, "isCrawlDocument": { "type": "boolean" }, "crawlSharedDocument": { "type": "boolean" }, "isCrawlGroup": { "type": "boolean" }, "isCrawlOpportunity": { "type": "boolean" }, "isCrawlChatter": { "type": "boolean" }, "isCrawlUser": { "type": "boolean" }, "isCrawlSolution":{ "type": "boolean" }, "isCrawlTask":{ "type": "boolean" }, "isCrawlAccountAttachments": { "type": "boolean" }, "isCrawlContactAttachments": { "type": "boolean" }, "isCrawlCaseAttachments": { "type": "boolean" }, "isCrawlCampaignAttachments": { "type": "boolean" }, "isCrawlLeadAttachments": { "type": "boolean" }, "isCrawlContractAttachments": { "type": "boolean" }, "isCrawlGroupAttachments": { "type": "boolean" }, "isCrawlOpportunityAttachments": { "type": "boolean" }, "isCrawlChatterAttachments": { "type": "boolean" }, "isCrawlSolutionAttachments":{ "type": "boolean" }, "isCrawlTaskAttachments":{ "type": "boolean" }, "isCrawlCustomEntityAttachments":{ "type": "boolean" }, "isCrawlKnowledgeArticles": { "type": "object", "properties": { "isCrawlDraft": { "type": "boolean" }, "isCrawlPublish": { "type": "boolean" }, "isCrawlArchived": { "type": "boolean" } } }, "inclusionDocumentFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionDocumentFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionDocumentFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionDocumentFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionAccountFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionAccountFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionAccountFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionAccountFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionCampaignFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionCampaignFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionCampaignFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionCampaignFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionCaseFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionCaseFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionCaseFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionCaseFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionContactFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionContactFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionContactFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionContactFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionContractFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionContractFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionContractFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionContractFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionLeadFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionLeadFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionLeadFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionLeadFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionOpportunityFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionOpportunityFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionOpportunityFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionOpportunityFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionSolutionFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionSolutionFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionSolutionFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionSolutionFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionTaskFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionTaskFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionTaskFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionTaskFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionGroupFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionGroupFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionGroupFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionGroupFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionChatterFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionChatterFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionChatterFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionChatterFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionCustomEntityFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionCustomEntityFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionCustomEntityFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionCustomEntityFileNamePatterns":{ "type": "array", "items": { "type": "string" } } }, "required": [] }, "enableIdentityCrawler": { "type": "boolean" }, "type": { "type": "string", "pattern": "SALESFORCEV2" }, "syncMode": { "type": "string", "enum": [ "FULL_CRAWL", "FORCED_FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] >>>>>>> mainline }

ServiceNow template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the ServiceNow host URL, authentication type, and instance version as a part of the connection configuration or repository endpoint details. Also specify the type of data source as SERVICENOWV2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See ServiceNow JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
hostUrl The ServiceNow host URL. For example, your-domain.service-now.com.
authType The type of authentication you are using, whether basicAuth or OAuth2.
servicenowInstanceVersion The ServiceNow version you are using. You can choose between Tokyo, Sandiego, Rome, and Others.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • knowledgeArticle

  • attachment

  • serviceCatalog

  • incident

A list of objects that map the attributes or field names of your ServiceNow knowledge articles, attachments, service catalog, and incidents to Amazon Kendra index field names. For more information, see Mapping data source fields. The ServiceNow data source field names must exist in your ServiceNow custom metadata.
additional properties Additional configuration options for your content in your data source.
  • knowledgeArticleFilter

  • incidentQueryFilter

  • serviceCatalogQueryFilter

  • knowledgeArticleTitleRegExp

  • serviceCatalogTitleRegExp

  • incidentTitleRegExp

  • inclusionFileTypePatterns

  • exclusionFileTypePatterns

  • inclusionFileNamePatterns

  • exclusionFileNamePatterns

  • incidentStateType

A list of regular expression patterns to include and/or exclude certain files in your ServiceNow data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
  • isCrawlKnowledgeArticle

  • isCrawlKnowledgeArticleAttachment

  • includePublicArticlesOnly

  • isCrawlServiceCatalog

  • isCrawlServiceCatalogAttachment

  • isCrawlActiveServiceCatalog

  • isCrawlInactiveServiceCatalog

  • isCrawlIncident

  • isCrawlIncidentAttachment

  • isCrawlActiveIncident

  • isCrawlInactiveIncident

  • applyACLForKnowledgeArticle

  • applyACLForServiceCatalog

  • applyACLForIncident

true to index ServiceNow knowledge articles, service catalogs, incidents, and attachments.
type The type of data source. Specify SERVICENOWV2 as your data source type.
enableIdentityCrawler true to activate identity crawler. Identity crawler is activated by default. If identity crawler is deactivated, you must upload the principal information using the PutPrincipalMapping API. Crawling identity information on users and groups with access to certain documents is useful for user context filtering. Search results are filtered based on the user or their group access to documents. For more information, see Filtering on user context.
syncMode Specify whether Amazon Kendra should update your index by syncing all documents or only new, modified, and deleted documents. You can choose between:
  • FORCED_FULL_CRAWL to freshly re-crawl all content and replace existing content each time your data source syncs with your index

  • FULL_CRAWL to incrementally crawl only new, modified, and deleted content each time your data source syncs with your index

  • CHANGE_LOG to incrementally crawl only new and modified content each time your data source syncs with your index

secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your ServiceNow. The secret must contain a JSON structure with the following keys:
{ "username": "user name", "password": "password" }
If you use OAuth2 authentication, your secret must contain a JSON structure with the following keys:
{ "username": "user name", "password": "password", "clientId": "client id", "clientSecret": "client secret" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "hostUrl": { "type": "string", "pattern": "^(?!(^(https?|ftp|file):\/\/))[a-z0-9-]+.service-now.com$", "minLength": 1, "maxLength": 2048 }, "authType": { "type": "string", "enum": [ "basicAuth", "OAuth2" ] }, "servicenowInstanceVersion": { "type": "string", "enum": [ "Tokyo", "Sandiego", "Rome", "Others" ] } }, "required": [ "hostUrl", "authType", "servicenowInstanceVersion" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "knowledgeArticle": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "LONG", "DATE", "STRING_LIST" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "serviceCatalog": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "incident": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "isCrawlKnowledgeArticle": { "type": "boolean" }, "isCrawlKnowledgeArticleAttachment": { "type": "boolean" }, "includePublicArticlesOnly": { "type": "boolean" }, "knowledgeArticleFilter": { "type": "string" }, "incidentQueryFilter": { "type": "string" }, "serviceCatalogQueryFilter": { "type": "string" }, "isCrawlServiceCatalog": { "type": "boolean" }, "isCrawlServiceCatalogAttachment": { "type": "boolean" }, "isCrawlActiveServiceCatalog": { "type": "boolean" }, "isCrawlInactiveServiceCatalog": { "type": "boolean" }, "isCrawlIncident": { "type": "boolean" }, "isCrawlIncidentAttachment": { "type": "boolean" }, "isCrawlActiveIncident": { "type": "boolean" }, "isCrawlInactiveIncident": { "type": "boolean" }, "applyACLForKnowledgeArticle": { "type": "boolean" }, "applyACLForServiceCatalog": { "type": "boolean" }, "applyACLForIncident": { "type": "boolean" }, "incidentStateType": { "type": "array", "items": { "type": "string", "enum": [ "Open", "Open - Unassigned", "Resolved", "All" ] } }, "knowledgeArticleTitleRegExp": { "type": "string" }, "serviceCatalogTitleRegExp": { "type": "string" }, "incidentTitleRegExp": { "type": "string" }, "inclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } } }, "required": [] }, "type": { "type": "string", "pattern": "SERVICENOWV2" }, "enableIdentityCrawler": { "type": "boolean" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Zendesk template schema

You include a JSON that contains the data source schema as part of TemplateConfiguration object. You provide the host URL as a part of the connection configuration or repository endpoint details. Also specify the type of data source as ZENDESK, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Zendesk JSON schema.

The following provides information on important JSON keys to configure.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
hostURL The Zendesk host URL. For example, https://yoursubdomain.zendesk.com.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • ticket

  • ticketComment

  • ticketCommentAttachment

  • article

  • articleComment

  • articleAttachment

  • communityTopic

  • communityPostComment

A list of objects that map attributes or field names of Zendesk tickets to Amazon Kendra index field names. For more information, see Mapping data source fields.
secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Zendesk. The secret must contain a JSON structure with the following keys: host URL, client ID, client secret, user name, and password.
additionalProperties Additional configuration options for your content in your data source
organizationNameFilter You can choose to index tickets that exist within a specific Organization.
sinceDate You can choose to configure a sinceDate parameter so that the Zendesk connector crawls content based on a specific sinceDate.
inclusionPatterns A list of regular expression patterns to include certain files in your Zendesk data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
exclusionPatterns A list of regular expression patterns to exclude certain files in your Zendesk data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
  • isCrawlTicket

  • isCrawlTicketComment

  • isCrawlTicketCommentAttachment

  • isCrawlArticle

  • isCrawlArticleComment

  • isCrawlArticleAttachment

  • isCrawlCommunityTopic

  • isCrawlCommunityPost

  • isCrawlCommunityPostComment

Input true to index the content.
type Specify ZENDESK as your data source type.
useChangeLog Input trueto use the Zendesk change log to determine which documents require updating in the index. Depending on the change log's size, it might be faster to scan the documents in Zendesk. If you are syncing your Zendesk data source with your index for the first time, all documents are scanned.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "hostUrl": { "type": "string", "pattern": "https:.*" } }, "required": [ "hostUrl" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "ticket": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "ticketComment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "ticketCommentAttachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "article": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "communityPostComment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "articleComment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "articleAttachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "communityTopic": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] } } }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 }, "additionalProperties": { "type": "object", "properties": { "organizationNameFilter": { "type": "array" }, "sinceDate": { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$" }, "inclusionPatterns": { "type": "array" }, "exclusionPatterns": { "type": "array" }, "isCrawTicket": { "type": "string" }, "isCrawTicketComment": { "type": "string" }, "isCrawTicketCommentAttachment": { "type": "string" }, "isCrawlArticle": { "type": "string" }, "isCrawlArticleAttachment": { "type": "string" }, "isCrawlArticleComment": { "type": "string" }, "isCrawlCommunityTopic": { "type": "string" }, "isCrawlCommunityPost": { "type": "string" }, "isCrawlCommunityPostComment": { "type": "string" } } }, "type": { "type": "string", "pattern": "ZENDESK" }, "useChangeLog": { "type": "string", "enum": ["true", "false"] } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "additionalProperties": false, "required": [ "connectionConfiguration", "repositoryConfigurations", "additionalProperties", "useChangeLog", "secretArn", "type" ] }