Data source template schemas - Amazon Kendra

Data source template schemas

The following are template schemas for data sources where templates are supported.

Adobe Experience Manager template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the Adobe Experience Manager host URL, the authentication type, and whether you use Adobe Experience Manager (AEM) as a Cloud Service or AEM On-Premise as part of the connection configuration or repository endpoint details. Also, specify the type of data source as AEM, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. For more information, see Adobe Experience Manager JSON schema.

The following table describes the parameters of the AEM JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
aemUrl The Adobe Experience Manager host URL. For example, if you use AEM On-Premise, you include the hostname and port: https://hostname:port. Or, if you use AEM as a Cloud Service, you can use the author URL: https://author-xxxxxx-xxxxxxx.adobeaemcloud.com.
authType The type of authentication you use, whether Basic or OAuth2.
deploymentType The type of Adobe Experience Manager that you use, either CLOUD or ON_PREMISE.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • page

  • asset

A list of objects that map the attributes or field names of your Adobe Experience Manager pages and assets to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source.
timeZoneId

If you use AEM On-Premise and the time zone of your server is different than the time zone of the Amazon Kendra AEM connector or index, you can specify the server time zone to align with the AEM connector or index.

The default time zone for AEM On-Premise is the time zone of the Amazon Kendra AEM connector or index. The default time zone for AEM as a Cloud Service is Greenwich Mean Time.

  • pageRootPaths

  • assetRootPaths

A list of root paths for pages and assets. For example, the root path for a page could be /content/sub and the root path for an asset could be /content/sub/asset1.
crawlAssets true to crawl assets.
crawlPages true to crawl pages.
  • pagePathInclusionPatterns

  • pageNameInclusionPatterns

  • assetPathInclusionPatterns

  • assetTypeInclusionPatterns

  • assetNameInclusionPatterns

A list of regular expression patterns to include certain pages and assets in your Adobe Experience Manager data source. Pages and assets that match the patterns are included in the index. Pages and assets that don't match the patterns are excluded from the index. If a page or asset matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
  • pagePathExclusionPatterns

  • pageNameExclusionPatterns

  • assetPathExclusionPatterns

  • assetTypeInclusionPatterns

  • assetNameInclusionPatterns

A list of regular expression patterns to exclude certain pages and assets in your Adobe Experience Manager data source. Pages and assets that match the patterns are excluded from the index. Pages and assets that don't match the patterns are included in the index. If a page or asset matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
pageComponents A list of names for the specific page components that you want to index.
contentFragmentVariations A list of names for the specific saved variations of Adobe Experience Manager Content Fragments that you want to index.
type The type of data source. Specify AEM as your data source type.
enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Adobe Experience Manager. For information on these key-value pairs, see Connection instructions for Adobe Experience Manager.
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "aemUrl": { "type": "string", "pattern": "https:.*" }, "authType": { "type": "string", "enum": ["Basic", "OAuth2"] }, "deploymentType": { "type": "string", "enum": ["CLOUD","ON_PREMISE"] } }, "required": [ "aemUrl", "authType", "deploymentType" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "page": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "asset": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "timeZoneId": { "type": "string", "enum": [ "Africa/Abidjan", "Africa/Accra", "Africa/Addis_Ababa", "Africa/Algiers", "Africa/Asmara", "Africa/Asmera", "Africa/Bamako", "Africa/Bangui", "Africa/Banjul", "Africa/Bissau", "Africa/Blantyre", "Africa/Brazzaville", "Africa/Bujumbura", "Africa/Cairo", "Africa/Casablanca", "Africa/Ceuta", "Africa/Conakry", "Africa/Dakar", "Africa/Dar_es_Salaam", "Africa/Djibouti", "Africa/Douala", "Africa/El_Aaiun", "Africa/Freetown", "Africa/Gaborone", "Africa/Harare", "Africa/Johannesburg", "Africa/Juba", "Africa/Kampala", "Africa/Khartoum", "Africa/Kigali", "Africa/Kinshasa", "Africa/Lagos", "Africa/Libreville", "Africa/Lome", "Africa/Luanda", "Africa/Lubumbashi", "Africa/Lusaka", "Africa/Malabo", "Africa/Maputo", "Africa/Maseru", "Africa/Mbabane", "Africa/Mogadishu", "Africa/Monrovia", "Africa/Nairobi", "Africa/Ndjamena", "Africa/Niamey", "Africa/Nouakchott", "Africa/Ouagadougou", "Africa/Porto-Novo", "Africa/Sao_Tome", "Africa/Timbuktu", "Africa/Tripoli", "Africa/Tunis", "Africa/Windhoek", "America/Adak", "America/Anchorage", "America/Anguilla", "America/Antigua", "America/Araguaina", "America/Argentina/Buenos_Aires", "America/Argentina/Catamarca", "America/Argentina/ComodRivadavia", "America/Argentina/Cordoba", "America/Argentina/Jujuy", "America/Argentina/La_Rioja", "America/Argentina/Mendoza", "America/Argentina/Rio_Gallegos", "America/Argentina/Salta", "America/Argentina/San_Juan", "America/Argentina/San_Luis", "America/Argentina/Tucuman", "America/Argentina/Ushuaia", "America/Aruba", "America/Asuncion", "America/Atikokan", "America/Atka", "America/Bahia", "America/Bahia_Banderas", "America/Barbados", "America/Belem", "America/Belize", "America/Blanc-Sablon", "America/Boa_Vista", "America/Bogota", "America/Boise", "America/Buenos_Aires", "America/Cambridge_Bay", "America/Campo_Grande", "America/Cancun", "America/Caracas", "America/Catamarca", "America/Cayenne", "America/Cayman", "America/Chicago", "America/Chihuahua", "America/Ciudad_Juarez", "America/Coral_Harbour", "America/Cordoba", "America/Costa_Rica", "America/Creston", "America/Cuiaba", "America/Curacao", "America/Danmarkshavn", "America/Dawson", "America/Dawson_Creek", "America/Denver", "America/Detroit", "America/Dominica", "America/Edmonton", "America/Eirunepe", "America/El_Salvador", "America/Ensenada", "America/Fort_Nelson", "America/Fort_Wayne", "America/Fortaleza", "America/Glace_Bay", "America/Godthab", "America/Goose_Bay", "America/Grand_Turk", "America/Grenada", "America/Guadeloupe", "America/Guatemala", "America/Guayaquil", "America/Guyana", "America/Halifax", "America/Havana", "America/Hermosillo", "America/Indiana/Indianapolis", "America/Indiana/Knox", "America/Indiana/Marengo", "America/Indiana/Petersburg", "America/Indiana/Tell_City", "America/Indiana/Vevay", "America/Indiana/Vincennes", "America/Indiana/Winamac", "America/Indianapolis", "America/Inuvik", "America/Iqaluit", "America/Jamaica", "America/Jujuy", "America/Juneau", "America/Kentucky/Louisville", "America/Kentucky/Monticello", "America/Knox_IN", "America/Kralendijk", "America/La_Paz", "America/Lima", "America/Los_Angeles", "America/Louisville", "America/Lower_Princes", "America/Maceio", "America/Managua", "America/Manaus", "America/Marigot", "America/Martinique", "America/Matamoros", "America/Mazatlan", "America/Mendoza", "America/Menominee", "America/Merida", "America/Metlakatla", "America/Mexico_City", "America/Miquelon", "America/Moncton", "America/Monterrey", "America/Montevideo", "America/Montreal", "America/Montserrat", "America/Nassau", "America/New_York", "America/Nipigon", "America/Nome", "America/Noronha", "America/North_Dakota/Beulah", "America/North_Dakota/Center", "America/North_Dakota/New_Salem", "America/Nuuk", "America/Ojinaga", "America/Panama", "America/Pangnirtung", "America/Paramaribo", "America/Phoenix", "America/Port-au-Prince", "America/Port_of_Spain", "America/Porto_Acre", "America/Porto_Velho", "America/Puerto_Rico", "America/Punta_Arenas", "America/Rainy_River", "America/Rankin_Inlet", "America/Recife", "America/Regina", "America/Resolute", "America/Rio_Branco", "America/Rosario", "America/Santa_Isabel", "America/Santarem", "America/Santiago", "America/Santo_Domingo", "America/Sao_Paulo", "America/Scoresbysund", "America/Shiprock", "America/Sitka", "America/St_Barthelemy", "America/St_Johns", "America/St_Kitts", "America/St_Lucia", "America/St_Thomas", "America/St_Vincent", "America/Swift_Current", "America/Tegucigalpa", "America/Thule", "America/Thunder_Bay", "America/Tijuana", "America/Toronto", "America/Tortola", "America/Vancouver", "America/Virgin", "America/Whitehorse", "America/Winnipeg", "America/Yakutat", "America/Yellowknife", "Antarctica/Casey", "Antarctica/Davis", "Antarctica/DumontDUrville", "Antarctica/Macquarie", "Antarctica/Mawson", "Antarctica/McMurdo", "Antarctica/Palmer", "Antarctica/Rothera", "Antarctica/South_Pole", "Antarctica/Syowa", "Antarctica/Troll", "Antarctica/Vostok", "Arctic/Longyearbyen", "Asia/Aden", "Asia/Almaty", "Asia/Amman", "Asia/Anadyr", "Asia/Aqtau", "Asia/Aqtobe", "Asia/Ashgabat", "Asia/Ashkhabad", "Asia/Atyrau", "Asia/Baghdad", "Asia/Bahrain", "Asia/Baku", "Asia/Bangkok", "Asia/Barnaul", "Asia/Beirut", "Asia/Bishkek", "Asia/Brunei", "Asia/Calcutta", "Asia/Chita", "Asia/Choibalsan", "Asia/Chongqing", "Asia/Chungking", "Asia/Colombo", "Asia/Dacca", "Asia/Damascus", "Asia/Dhaka", "Asia/Dili", "Asia/Dubai", "Asia/Dushanbe", "Asia/Famagusta", "Asia/Gaza", "Asia/Harbin", "Asia/Hebron", "Asia/Ho_Chi_Minh", "Asia/Hong_Kong", "Asia/Hovd", "Asia/Irkutsk", "Asia/Istanbul", "Asia/Jakarta", "Asia/Jayapura", "Asia/Jerusalem", "Asia/Kabul", "Asia/Kamchatka", "Asia/Karachi", "Asia/Kashgar", "Asia/Kathmandu", "Asia/Katmandu", "Asia/Khandyga", "Asia/Kolkata", "Asia/Krasnoyarsk", "Asia/Kuala_Lumpur", "Asia/Kuching", "Asia/Kuwait", "Asia/Macao", "Asia/Macau", "Asia/Magadan", "Asia/Makassar", "Asia/Manila", "Asia/Muscat", "Asia/Nicosia", "Asia/Novokuznetsk", "Asia/Novosibirsk", "Asia/Omsk", "Asia/Oral", "Asia/Phnom_Penh", "Asia/Pontianak", "Asia/Pyongyang", "Asia/Qatar", "Asia/Qostanay", "Asia/Qyzylorda", "Asia/Rangoon", "Asia/Riyadh", "Asia/Saigon", "Asia/Sakhalin", "Asia/Samarkand", "Asia/Seoul", "Asia/Shanghai", "Asia/Singapore", "Asia/Srednekolymsk", "Asia/Taipei", "Asia/Tashkent", "Asia/Tbilisi", "Asia/Tehran", "Asia/Tel_Aviv", "Asia/Thimbu", "Asia/Thimphu", "Asia/Tokyo", "Asia/Tomsk", "Asia/Ujung_Pandang", "Asia/Ulaanbaatar", "Asia/Ulan_Bator", "Asia/Urumqi", "Asia/Ust-Nera", "Asia/Vientiane", "Asia/Vladivostok", "Asia/Yakutsk", "Asia/Yangon", "Asia/Yekaterinburg", "Asia/Yerevan", "Atlantic/Azores", "Atlantic/Bermuda", "Atlantic/Canary", "Atlantic/Cape_Verde", "Atlantic/Faeroe", "Atlantic/Faroe", "Atlantic/Jan_Mayen", "Atlantic/Madeira", "Atlantic/Reykjavik", "Atlantic/South_Georgia", "Atlantic/St_Helena", "Atlantic/Stanley", "Australia/ACT", "Australia/Adelaide", "Australia/Brisbane", "Australia/Broken_Hill", "Australia/Canberra", "Australia/Currie", "Australia/Darwin", "Australia/Eucla", "Australia/Hobart", "Australia/LHI", "Australia/Lindeman", "Australia/Lord_Howe", "Australia/Melbourne", "Australia/NSW", "Australia/North", "Australia/Perth", "Australia/Queensland", "Australia/South", "Australia/Sydney", "Australia/Tasmania", "Australia/Victoria", "Australia/West", "Australia/Yancowinna", "Brazil/Acre", "Brazil/DeNoronha", "Brazil/East", "Brazil/West", "CET", "CST6CDT", "Canada/Atlantic", "Canada/Central", "Canada/Eastern", "Canada/Mountain", "Canada/Newfoundland", "Canada/Pacific", "Canada/Saskatchewan", "Canada/Yukon", "Chile/Continental", "Chile/EasterIsland", "Cuba", "EET", "EST5EDT", "Egypt", "Eire", "Etc/GMT", "Etc/GMT+0", "Etc/GMT+1", "Etc/GMT+10", "Etc/GMT+11", "Etc/GMT+12", "Etc/GMT+2", "Etc/GMT+3", "Etc/GMT+4", "Etc/GMT+5", "Etc/GMT+6", "Etc/GMT+7", "Etc/GMT+8", "Etc/GMT+9", "Etc/GMT-0", "Etc/GMT-1", "Etc/GMT-10", "Etc/GMT-11", "Etc/GMT-12", "Etc/GMT-13", "Etc/GMT-14", "Etc/GMT-2", "Etc/GMT-3", "Etc/GMT-4", "Etc/GMT-5", "Etc/GMT-6", "Etc/GMT-7", "Etc/GMT-8", "Etc/GMT-9", "Etc/GMT0", "Etc/Greenwich", "Etc/UCT", "Etc/UTC", "Etc/Universal", "Etc/Zulu", "Europe/Amsterdam", "Europe/Andorra", "Europe/Astrakhan", "Europe/Athens", "Europe/Belfast", "Europe/Belgrade", "Europe/Berlin", "Europe/Bratislava", "Europe/Brussels", "Europe/Bucharest", "Europe/Budapest", "Europe/Busingen", "Europe/Chisinau", "Europe/Copenhagen", "Europe/Dublin", "Europe/Gibraltar", "Europe/Guernsey", "Europe/Helsinki", "Europe/Isle_of_Man", "Europe/Istanbul", "Europe/Jersey", "Europe/Kaliningrad", "Europe/Kiev", "Europe/Kirov", "Europe/Kyiv", "Europe/Lisbon", "Europe/Ljubljana", "Europe/London", "Europe/Luxembourg", "Europe/Madrid", "Europe/Malta", "Europe/Mariehamn", "Europe/Minsk", "Europe/Monaco", "Europe/Moscow", "Europe/Nicosia", "Europe/Oslo", "Europe/Paris", "Europe/Podgorica", "Europe/Prague", "Europe/Riga", "Europe/Rome", "Europe/Samara", "Europe/San_Marino", "Europe/Sarajevo", "Europe/Saratov", "Europe/Simferopol", "Europe/Skopje", "Europe/Sofia", "Europe/Stockholm", "Europe/Tallinn", "Europe/Tirane", "Europe/Tiraspol", "Europe/Ulyanovsk", "Europe/Uzhgorod", "Europe/Vaduz", "Europe/Vatican", "Europe/Vienna", "Europe/Vilnius", "Europe/Volgograd", "Europe/Warsaw", "Europe/Zagreb", "Europe/Zaporozhye", "Europe/Zurich", "GB", "GB-Eire", "GMT", "GMT0", "Greenwich", "Hongkong", "Iceland", "Indian/Antananarivo", "Indian/Chagos", "Indian/Christmas", "Indian/Cocos", "Indian/Comoro", "Indian/Kerguelen", "Indian/Mahe", "Indian/Maldives", "Indian/Mauritius", "Indian/Mayotte", "Indian/Reunion", "Iran", "Israel", "Jamaica", "Japan", "Kwajalein", "Libya", "MET", "MST7MDT", "Mexico/BajaNorte", "Mexico/BajaSur", "Mexico/General", "NZ", "NZ-CHAT", "Navajo", "PRC", "PST8PDT", "Pacific/Apia", "Pacific/Auckland", "Pacific/Bougainville", "Pacific/Chatham", "Pacific/Chuuk", "Pacific/Easter", "Pacific/Efate", "Pacific/Enderbury", "Pacific/Fakaofo", "Pacific/Fiji", "Pacific/Funafuti", "Pacific/Galapagos", "Pacific/Gambier", "Pacific/Guadalcanal", "Pacific/Guam", "Pacific/Honolulu", "Pacific/Johnston", "Pacific/Kanton", "Pacific/Kiritimati", "Pacific/Kosrae", "Pacific/Kwajalein", "Pacific/Majuro", "Pacific/Marquesas", "Pacific/Midway", "Pacific/Nauru", "Pacific/Niue", "Pacific/Norfolk", "Pacific/Noumea", "Pacific/Pago_Pago", "Pacific/Palau", "Pacific/Pitcairn", "Pacific/Pohnpei", "Pacific/Ponape", "Pacific/Port_Moresby", "Pacific/Rarotonga", "Pacific/Saipan", "Pacific/Samoa", "Pacific/Tahiti", "Pacific/Tarawa", "Pacific/Tongatapu", "Pacific/Truk", "Pacific/Wake", "Pacific/Wallis", "Pacific/Yap", "Poland", "Portugal", "ROK", "Singapore", "SystemV/AST4", "SystemV/AST4ADT", "SystemV/CST6", "SystemV/CST6CDT", "SystemV/EST5", "SystemV/EST5EDT", "SystemV/HST10", "SystemV/MST7", "SystemV/MST7MDT", "SystemV/PST8", "SystemV/PST8PDT", "SystemV/YST9", "SystemV/YST9YDT", "Turkey", "UCT", "US/Alaska", "US/Aleutian", "US/Arizona", "US/Central", "US/East-Indiana", "US/Eastern", "US/Hawaii", "US/Indiana-Starke", "US/Michigan", "US/Mountain", "US/Pacific", "US/Samoa", "UTC", "Universal", "W-SU", "WET", "Zulu", "EST", "HST", "MST", "ACT", "AET", "AGT", "ART", "AST", "BET", "BST", "CAT", "CNT", "CST", "CTT", "EAT", "ECT", "IET", "IST", "JST", "MIT", "NET", "NST", "PLT", "PNT", "PRT", "PST", "SST", "VST" ] }, "pageRootPaths": { "type": "array", "items": { "type": "string" } }, "assetRootPaths": { "type": "array", "items": { "type": "string" } }, "crawlAssets": { "type": "boolean" }, "crawlPages": { "type": "boolean" }, "pagePathInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "pagePathExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "pageNameInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "pageNameExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "assetPathInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "assetPathExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "assetTypeInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "assetTypeExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "assetNameInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "assetNameExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "pageComponents": { "type": "array", "items": { "type": "object" } }, "contentFragmentVariations": { "type": "array", "items": { "type": "object" } }, "cugExemptedPrincipals": { "type": "array", "items": { "type": "string" } } }, "required": [] }, "type": { "type": "string", "pattern": "AEM" }, "enableIdentityCrawler": { "type": "boolean" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Amazon FSx (Windows) template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the file system ID as part of the connection configuration or repository endpoint details. You must also specify the type of data source as FSX, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Amazon FSx (Windows) JSON schema.

The following table describes the parameters of the Amazon FSx (Windows) JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
fileSystemId The identifier of the Amazon FSx file system. You can find your file system ID on the File Systems dashboard in the Amazon FSx console.
fileSystemType The Amazon FSx file system type. To use Windows File Server as your type of file system, specify WINDOWS.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
All A list of objects that map attributes or field names of your files in your Amazon FSx data source to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source.
isCrawlAcl true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see User context filtering.
inclusionPatterns A list of regular expression patterns to include certain files in your Amazon FSx data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
exclusionPatterns A list of regular expression patterns to exclude certain files in your Amazon FSx data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

type The type of data source. For Windows file system data sources, specify FSX.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "fileSystemId": { "type": "string", "pattern": "fs-.*" }, "fileSystemType": { "type": "string", "pattern": "WINDOWS" } }, "required": ["fileSystemId", "fileSystemType"] } } }, "repositoryConfigurations": { "type": "object", "properties": { "All": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": ["fieldMappings"] } }, "required": ["All"] }, "additionalProperties": { "type": "object", "properties": { "isCrawlAcl": { "type": "boolean" }, "exclusionPatterns": { "type": "array", "items": { "type": "string" } }, "inclusionPatterns": { "type": "array", "items": { "type": "string" } } }, "required": [] }, "enableIdentityCrawler": { "type": "boolean" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL" ] }, "type" : { "type" : "string", "pattern": "FSX" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "enableIdentityCrawler", "additionalProperties", "type" ] }

Amazon FSx (NetApp ONTAP) template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the file system ID and the storage virtual machine (SVM) as part of the connection configuration or repository endpoint details. You must also specify the type of data source as FSXONTAP, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Amazon FSx (NetApp ONTAP) JSON schema.

The following table describes the parameters of the Amazon FSx (NetApp ONTAP) JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
fileSystemId The identifier of the Amazon FSx file system. You can find your file system ID on the File Systems dashboard in the Amazon FSx console. For information about how to create a file system in the Amazon FSx console for NetApp ONTAP, see Getting Started Guide for NetApp ONTAP in the FSx for ONTAP User Guide.
fileSystemType The Amazon FSx file system type. To use NetApp ONTAP as your type of file system, specify ONTAP.
svmId The identifier of storage virtual machine (SVM) used with your Amazon FSx file system for NetApp ONTAP. You can find your SVM ID by going to the File Systems dashboard in the Amazon FSx console, selecting your file system ID, and then selecting Storage virtual machines. For information about how to create a file system in the Amazon FSx console for NetApp ONTAP, see Getting Started Guide for NetApp ONTAP in the FSx for ONTAP User Guide.
protocolType Whether you use the Common Internet File System (CIFS) protocol for Windows, or the Network File System (NFS) protocol for Linux.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
file A list of objects that map attributes or field names of your files in your Amazon FSx data source to Amazon Kendra index field names. For more information, see Mapping data source fields. The data source field names must exist in your files custom metadata.
additionalProperties Additional configuration options for your content in your data source.
crawlAcl true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see User context filtering.
inclusionPatterns A list of regular expression patterns to include certain files in your Amazon FSx data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
exclusionPatterns A list of regular expression patterns to exclude certain files in your Amazon FSx data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
type The type of data source. For NetApp ONTAP file system data sources, specify FSXONTAP.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn

The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Amazon FSx file system. The secret must contain a JSON structure with the following keys:

{ "username": "user@corp.example.com", "password": "password" }

If you use the NFS protocol for your Amazon FSx file system, the secret is stored in a JSON structure with the following keys:

{ "leftId": "left ID", "rightId": "right ID", "preSharedKey": "pre-shared key" }
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "fileSystemId": { "type": "string", "pattern": "^(fs-[0-9a-f]{8,21})$" }, "fileSystemType": { "type": "string", "enum": ["ONTAP"] }, "svmId": { "type": "string", "pattern": "^(svm-[0-9a-f]{17,21})$" }, "protocolType": { "type": "string", "enum": [ "CIFS", "NFS" ] } }, "required": [ "fileSystemId", "fileSystemType" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "file": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string", "pattern": "^([a-zA-Z_]{1,20})$" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string", "pattern": "^([a-zA-Z_]{1,20})$" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ], "maxItems": 50 } }, "required": [ "fieldMappings" ] } }, "required": [ "file" ] }, "additionalProperties": { "type": "object", "properties": { "crawlAcl": { "type": "boolean" }, "inclusionPatterns": { "type": "array", "items": { "type": "string", "maxLength": 30 }, "maxItems": 100 }, "exclusionPatterns": { "type": "array", "items": { "type": "string", "maxLength": 30 }, "maxItems": 100 } } }, "type": { "type": "string", "pattern": "FSXONTAP" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL" ] }, "secretArn": { "type": "string", "pattern": "arn:aws:secretsmanager:.*" } }, "required": [ "connectionConfiguration", "repositoryConfigurations", "additionalProperties", "secretArn", "type" ] }

Alfresco template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the Alfresco site ID, repository URL, user interface URL, authentication type, whether you use cloud or on-premises, and the type of content you want to crawl. You provide this as a part of the connection configuration or repository endpoint details. Also specify the type of data source as ALFRESCO, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Alfresco JSON schema.

The following table describes the parameters of the Alfresco JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
siteId The identifier of the Alfresco site.
repoUrl The URL of your Alfresco repository. You can get the repository URL from your Alfresco administrator. For example, if you use Alfresco Cloud (PaaS), the repository URL could be https://company.alfrescocloud.com. Or, if you use Alfresco On-Premises, the repository URL could be https://company-alfresco-instance.company-domain.suffix:port.
webAppUrl The URL of your Alfresco user interface. You can get the Alfresco user interface URL from your Alfresco administrator. For example, the user interface URL could be https://example.com.
repositoryAdditionalProperties Additional properties to connect with the repository/data source endpoint.
authType The type of authentication that you use, whether OAuth2 or Basic.
type (deployment) The type of Alfresco that you use, whether PAAS or ON-PREM.
crawlType The type of content that you want to crawl, whether ASPECT (content marked with 'Aspects' in Alfresco), SITE_ID (content within a specific Alfresco site), or ALL_SITES (content across all your Alfresco sites).
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • document

  • comment

A list of objects that map the attributes or field names of your Alfresco documents and comments to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source.
aspectName

The name of a specific 'Aspect' that you want to index.

aspectProperties

A list of specific 'Aspect' content properties that you want to index.

enableFineGrainedControl

true to crawl 'Aspects'.

isCrawlComment

true to crawl comments.

  • inclusionFileNamePatterns

  • inclusionFileTypePatterns

  • inclusionFilePathPatterns

A list of regular expression patterns to include certain files in your Alfresco data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
  • exclusionFileNamePatterns

  • exclusionFileTypePatterns

  • exclusionFilePathPatterns

A list of regular expression patterns to exclude certain files in your Alfresco data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
type The type of data source. Specify ALFRESCO as your data source type.
secretArn

The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs that are required to connect to your Alfresco. The secret must contain a JSON structure with the following keys:

If using basic authentication:

{ "username": "user name", "password": "password" }

If using OAuth 2.0 authentication:

{ "clientId": "client ID", "clientSecret": "client secret", "tokenUrl": "token URL" }
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "siteId": { "type": "string" }, "repoUrl": { "type": "string" }, "webAppUrl": { "type": "string" }, "repositoryAdditionalProperties": { "type": "object", "properties": { "authType": { "type": "string", "enum": [ "OAuth2", "Basic" ] }, "type": { "type": "string", "enum": [ "PAAS", "ON_PREM" ] }, "crawlType": { "type": "string", "enum": [ "ASPECT", "SITE_ID", "ALL_SITES" ] } } } } } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "comment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "aspectName": { "type": "string" }, "aspectProperties": { "type": "array" }, "enableFineGrainedControl": { "type": "boolean" }, "isCrawlComment": { "type": "boolean" }, "inclusionFileNamePatterns": { "type": "array" }, "exclusionFileNamePatterns": { "type": "array" }, "inclusionFileTypePatterns": { "type": "array" }, "exclusionFileTypePatterns": { "type": "array" }, "inclusionFilePathPatterns": { "type": "array" }, "exclusionFilePathPatterns": { "type": "array" } } }, "type": { "type": "string", "pattern": "ALFRESCO" }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL" ] }, "enableIdentityCrawler": { "type": "boolean" }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] } }, "required": [ "connectionConfiguration", "repositoryConfigurations", "additionalProperties", "type", "secretArn" ] }

Aurora (MySQL) template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as mysql, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Aurora (MySQL) JSON schema.

The following table describes the parameters of the Aurora (MySQL) JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata Required configuration information for connecting your data source.
  • dbType—The type of Java database that you use, whether mysql, db2, postgresql, oracle, or sqlserver.

  • dbHost—The database host name.

  • dbPort—The database port.

  • dbInstance—The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document

A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.
primaryKey Provide the primary key for the database table. This identifies a table within your database.
titleColumn Provide the name of the document title column within your database table.
bodyColumn Provide the name of the document title column within your database table.
sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
timestampColumn Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.
timestampFormat Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.
timezone Enter the name of the column which contains time zones for the content to be crawled.
changeDetectingColumns Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns
allowedUsersColumns Enter the name of the column which contains User IDs to be allowed access to content.
allowedGroupsColumn Enter the name of the column which contains User IDs to be allowed access to content.
sourceURIColumn Enter the name of the column which contains Source URLs to be indexed.
isSslEnabled Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
type The type of data source. Specify JDBC as your data source type.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:
{ "user name": "database user name", "password": "password" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "dbType": { "type": "string", "enum": [ "mysql", "db2", "postgresql", "oracle", "sqlserver" ] }, "dbHost": { "type": "string" }, "dbPort": { "type": "string" }, "dbInstance": { "type": "string" } }, "required": [ "dbType", "dbHost", "dbPort", "dbInstance" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string" }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ ] }, "additionalProperties": { "type": "object", "properties": { "primaryKey": { "type": "string" }, "titleColumn": { "type": "string" }, "bodyColumn": { "type": "string" }, "sqlQuery": { "type": "string", "not": { "pattern": ";+" } }, "timestampColumn": { "type": "string" }, "timestampFormat": { "type": "string" }, "timezone": { "type": "string" }, "changeDetectingColumns": { "type": "array", "items": { "type": "string" } }, "allowedUsersColumn": { "type": "string" }, "allowedGroupsColumn": { "type": "string" }, "sourceURIColumn": { "type": "string" }, "isSslEnabled": { "type": "boolean" } }, "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"] }, "type" : { "type" : "string", "pattern": "JDBC" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Aurora (PostgreSQL) template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as postgresql, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Aurora (PostgreSQL) JSON schema.

The following table describes the parameters of the Aurora (PostgreSQL) JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata Required configuration information for connecting your data source.
  • dbType—The type of Java database that you use, whether mysql, db2, postgresql, oracle, or sqlserver.

  • dbHost—The database host name.

  • dbPort—The database port.

  • dbInstance—The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document

A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.
primaryKey Provide the primary key for the database table. This identifies a table within your database.
titleColumn Provide the name of the document title column within your database table.
bodyColumn Provide the name of the document title column within your database table.
sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
timestampColumn Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.
timestampFormat Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.
timezone Enter the name of the column which contains time zones for the content to be crawled.
changeDetectingColumns Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns
allowedUsersColumns Enter the name of the column which contains User IDs to be allowed access to content.
allowedGroupsColumn Enter the name of the column which contains User IDs to be allowed access to content.
sourceURIColumn Enter the name of the column which contains Source URLs to be indexed.
isSslEnabled Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
type The type of data source. Specify JDBC as your data source type.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:
{ "user name": "database user name", "password": "password" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "dbType": { "type": "string", "enum": [ "mysql", "db2", "postgresql", "oracle", "sqlserver" ] }, "dbHost": { "type": "string" }, "dbPort": { "type": "string" }, "dbInstance": { "type": "string" } }, "required": [ "dbType", "dbHost", "dbPort", "dbInstance" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string" }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ ] }, "additionalProperties": { "type": "object", "properties": { "primaryKey": { "type": "string" }, "titleColumn": { "type": "string" }, "bodyColumn": { "type": "string" }, "sqlQuery": { "type": "string", "not": { "pattern": ";+" } }, "timestampColumn": { "type": "string" }, "timestampFormat": { "type": "string" }, "timezone": { "type": "string" }, "changeDetectingColumns": { "type": "array", "items": { "type": "string" } }, "allowedUsersColumn": { "type": "string" }, "allowedGroupsColumn": { "type": "string" }, "sourceURIColumn": { "type": "string" }, "isSslEnabled": { "type": "boolean" } }, "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"] }, "type" : { "type" : "string", "pattern": "JDBC" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Amazon RDS (Microsoft SQL Server) template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as sqlserver, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Amazon RDS (Microsoft SQL Server) JSON schema.

The following table describes the parameters of the Amazon RDS (Microsoft SQL Server) JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata Required configuration information for connecting your data source.
  • dbType—The type of Java database that you use, whether mysql, db2, postgresql, oracle, or sqlserver.

  • dbHost—The database host name.

  • dbPort—The database port.

  • dbInstance—The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document

A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.
primaryKey Provide the primary key for the database table. This identifies a table within your database.
titleColumn Provide the name of the document title column within your database table.
bodyColumn Provide the name of the document title column within your database table.
sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
timestampColumn Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.
timestampFormat Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.
timezone Enter the name of the column which contains time zones for the content to be crawled.
changeDetectingColumns Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns
allowedUsersColumns Enter the name of the column which contains User IDs to be allowed access to content.
allowedGroupsColumn Enter the name of the column which contains User IDs to be allowed access to content.
sourceURIColumn Enter the name of the column which contains Source URLs to be indexed.
isSslEnabled Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
type The type of data source. Specify JDBC as your data source type.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:
{ "user name": "database user name", "password": "password" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "dbType": { "type": "string", "enum": [ "mysql", "db2", "postgresql", "oracle", "sqlserver" ] }, "dbHost": { "type": "string" }, "dbPort": { "type": "string" }, "dbInstance": { "type": "string" } }, "required": [ "dbType", "dbHost", "dbPort", "dbInstance" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string" }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ ] }, "additionalProperties": { "type": "object", "properties": { "primaryKey": { "type": "string" }, "titleColumn": { "type": "string" }, "bodyColumn": { "type": "string" }, "sqlQuery": { "type": "string", "not": { "pattern": ";+" } }, "timestampColumn": { "type": "string" }, "timestampFormat": { "type": "string" }, "timezone": { "type": "string" }, "changeDetectingColumns": { "type": "array", "items": { "type": "string" } }, "allowedUsersColumn": { "type": "string" }, "allowedGroupsColumn": { "type": "string" }, "sourceURIColumn": { "type": "string" }, "isSslEnabled": { "type": "boolean" } }, "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"] }, "type" : { "type" : "string", "pattern": "JDBC" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Amazon RDS (MySQL) template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as mysql, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Amazon RDS (MySQL) JSON schema.

The following table describes the parameters of the Amazon RDS (MySQL) JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata Required configuration information for connecting your data source.
  • dbType—The type of Java database that you use, whether mysql, db2, postgresql, oracle, or sqlserver.

  • dbHost—The database host name.

  • dbPort—The database port.

  • dbInstance—The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document

A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.
primaryKey Provide the primary key for the database table. This identifies a table within your database.
titleColumn Provide the name of the document title column within your database table.
bodyColumn Provide the name of the document title column within your database table.
sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
timestampColumn Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.
timestampFormat Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.
timezone Enter the name of the column which contains time zones for the content to be crawled.
changeDetectingColumns Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns
allowedUsersColumns Enter the name of the column which contains User IDs to be allowed access to content.
allowedGroupsColumn Enter the name of the column which contains User IDs to be allowed access to content.
sourceURIColumn Enter the name of the column which contains Source URLs to be indexed.
isSslEnabled Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
type The type of data source. Specify JDBC as your data source type.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:
{ "user name": "database user name", "password": "password" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "dbType": { "type": "string", "enum": [ "mysql", "db2", "postgresql", "oracle", "sqlserver" ] }, "dbHost": { "type": "string" }, "dbPort": { "type": "string" }, "dbInstance": { "type": "string" } }, "required": [ "dbType", "dbHost", "dbPort", "dbInstance" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string" }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ ] }, "additionalProperties": { "type": "object", "properties": { "primaryKey": { "type": "string" }, "titleColumn": { "type": "string" }, "bodyColumn": { "type": "string" }, "sqlQuery": { "type": "string", "not": { "pattern": ";+" } }, "timestampColumn": { "type": "string" }, "timestampFormat": { "type": "string" }, "timezone": { "type": "string" }, "changeDetectingColumns": { "type": "array", "items": { "type": "string" } }, "allowedUsersColumn": { "type": "string" }, "allowedGroupsColumn": { "type": "string" }, "sourceURIColumn": { "type": "string" }, "isSslEnabled": { "type": "boolean" } }, "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"] }, "type" : { "type" : "string", "pattern": "JDBC" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Amazon RDS (Oracle) template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as oracle, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Amazon RDS (Oracle) JSON schema.

The following table describes the parameters of the Amazon RDS (Oracle) JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata Required configuration information for connecting your data source.
  • dbType—The type of Java database that you use, whether mysql, db2, postgresql, oracle, or sqlserver.

  • dbHost—The database host name.

  • dbPort—The database port.

  • dbInstance—The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document

A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.
primaryKey Provide the primary key for the database table. This identifies a table within your database.
titleColumn Provide the name of the document title column within your database table.
bodyColumn Provide the name of the document title column within your database table.
sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
timestampColumn Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.
timestampFormat Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.
timezone Enter the name of the column which contains time zones for the content to be crawled.
changeDetectingColumns Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns
allowedUsersColumns Enter the name of the column which contains User IDs to be allowed access to content.
allowedGroupsColumn Enter the name of the column which contains User IDs to be allowed access to content.
sourceURIColumn Enter the name of the column which contains Source URLs to be indexed.
isSslEnabled Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
type The type of data source. Specify JDBC as your data source type.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:
{ "user name": "database user name", "password": "password" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "dbType": { "type": "string", "enum": [ "mysql", "db2", "postgresql", "oracle", "sqlserver" ] }, "dbHost": { "type": "string" }, "dbPort": { "type": "string" }, "dbInstance": { "type": "string" } }, "required": [ "dbType", "dbHost", "dbPort", "dbInstance" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string" }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ ] }, "additionalProperties": { "type": "object", "properties": { "primaryKey": { "type": "string" }, "titleColumn": { "type": "string" }, "bodyColumn": { "type": "string" }, "sqlQuery": { "type": "string", "not": { "pattern": ";+" } }, "timestampColumn": { "type": "string" }, "timestampFormat": { "type": "string" }, "timezone": { "type": "string" }, "changeDetectingColumns": { "type": "array", "items": { "type": "string" } }, "allowedUsersColumn": { "type": "string" }, "allowedGroupsColumn": { "type": "string" }, "sourceURIColumn": { "type": "string" }, "isSslEnabled": { "type": "boolean" } }, "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"] }, "type" : { "type" : "string", "pattern": "JDBC" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Amazon RDS (PostgreSQL) template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as postgresql, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Amazon RDS (PostgreSQL) JSON schema.

The following table describes the parameters of the Amazon RDS (PostgreSQL) JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata Required configuration information for connecting your data source.
  • dbType—The type of Java database that you use, whether mysql, db2, postgresql, oracle, or sqlserver.

  • dbHost—The database host name.

  • dbPort—The database port.

  • dbInstance—The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document

A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.
primaryKey Provide the primary key for the database table. This identifies a table within your database.
titleColumn Provide the name of the document title column within your database table.
bodyColumn Provide the name of the document title column within your database table.
sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
timestampColumn Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.
timestampFormat Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.
timezone Enter the name of the column which contains time zones for the content to be crawled.
changeDetectingColumns Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns
allowedUsersColumns Enter the name of the column which contains User IDs to be allowed access to content.
allowedGroupsColumn Enter the name of the column which contains User IDs to be allowed access to content.
sourceURIColumn Enter the name of the column which contains Source URLs to be indexed.
isSslEnabled Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
type The type of data source. Specify JDBC as your data source type.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:
{ "user name": "database user name", "password": "password" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "dbType": { "type": "string", "enum": [ "mysql", "db2", "postgresql", "oracle", "sqlserver" ] }, "dbHost": { "type": "string" }, "dbPort": { "type": "string" }, "dbInstance": { "type": "string" } }, "required": [ "dbType", "dbHost", "dbPort", "dbInstance" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string" }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ ] }, "additionalProperties": { "type": "object", "properties": { "primaryKey": { "type": "string" }, "titleColumn": { "type": "string" }, "bodyColumn": { "type": "string" }, "sqlQuery": { "type": "string", "not": { "pattern": ";+" } }, "timestampColumn": { "type": "string" }, "timestampFormat": { "type": "string" }, "timezone": { "type": "string" }, "changeDetectingColumns": { "type": "array", "items": { "type": "string" } }, "allowedUsersColumn": { "type": "string" }, "allowedGroupsColumn": { "type": "string" }, "sourceURIColumn": { "type": "string" }, "isSslEnabled": { "type": "boolean" } }, "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"] }, "type" : { "type" : "string", "pattern": "JDBC" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Amazon S3 template schema

You include a JSON that contains the data source schema as part of the template configuration. You provide the name of the S3 bucket as a part of the connection configuration or repository endpoint details. Also specify the type of data source as S3, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See S3 JSON schema.

The following table describes the parameters of the Amazon S3 JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
BucketName The name of your Amazon S3 bucket.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
additionalProperties Additional configuration options for your content in your data source
  • inclusionPatterns

  • exclusionPatterns

  • inclusionPrefixes

  • exclusionPrefixes

A list of regular expression patterns to include or exclude specific files in your Amazon S3 data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
aclConfigurationFilePath The file path that controls access to documents in an Amazon Kendra index.
metadataFilesPrefix The location within your bucket for metadata files.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

type The type of data source. Specify S3 as your data source type.
version The version of the template that is supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "BucketName": { "type": "string" } }, "required": [ "BucketName" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING" ] }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ "document" ] }, "additionalProperties": { "type": "object", "properties": { "inclusionPatterns": { "type": "array" }, "exclusionPatterns": { "type": "array" }, "inclusionPrefixes": { "type": "array" }, "exclusionPrefixes": { "type": "array" }, "aclConfigurationFilePath": { "type": "string" }, "metadataFilesPrefix": { "type": "string" } } }, "syncMode": { "type": "string", "enum": [ "FULL_CRAWL", "FORCED_FULL_CRAWL" ] }, "type": { "type": "string", "pattern": "S3" }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] } }, "required": [ "connectionConfiguration", "type", "syncMode", "repositoryConfigurations" ] }

Amazon Kendra Web Crawler template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object.

You provide the seed or starting point URLs, or you can provide the sitemap URLs, as part of the connection configuration or repository endpoint details. Instead of manually listing all your URLs, you can provide the path to the Amazon S3 bucket that stores a text file for your list of seed URLs or sitemap XML files, which you can club together in a ZIP file in S3.

You also specify the type of data source as WEBCRAWLERV2, the website authentication credentials and authentication type if your websites require authentication, and other necessary configurations.

You then specify TEMPLATE as the Type when you call CreateDataSource.

Important

Web Crawler v2.0 connector creation is not supported by AWS CloudFormation. Use the Web Crawler v1.0 connector if you need AWS CloudFormation support.

When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own web pages, or web pages that you have authorization to index. To learn how to stop Amazon Kendra Web Crawler from indexing your websites, see Configuring the robots.txt file for Amazon Kendra Web Crawler.

You can use the template provided in this developer guide. See Amazon Kendra Web Crawler JSON schema.

The following table describes the parameters of the Amazon Kendra Web Crawler JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
siteMapUrls The list of sitemap URLs for the websites that you want to crawl. You can list up to three sitemap URLs.
s3SeedUrl The S3 path to the text file that stores the list of seed or starting point URLs. For example, s3://bucket-name/directory/. Each URL in the text file must be formatted on a separate line. You can list up to 100 seed URLs in a file.
s3SiteMapUrl The S3 path to the sitemap XML files. For example, s3://bucket-name/directory/. You can list up to three sitemap XML files. You can club together multiple sitemap files into a ZIP file and store the ZIP file in your Amazon S3 bucket.
seedUrlConnections The list of seed or starting point URLs for the websites that you want to crawl.You can list up to 100 seed URLs.
seedUrl The seed or starting point URL.
authentication The authentication type if your websites require the same authentication, otherwise specify NoAuthentication.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • webPage

  • attachment

A list of objects that map the attributes or field names of your web pages and web page files to Amazon Kendra index field names. For example, the HTML web page title tag can be mapped to the _document_title index field. For more information, see Mapping data source fields.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

additionalProperties Additional configuration options for your content in your data source.
rateLimit The maximum number of URLs crawled per website host per minute.
maxFileSize The maximum size (in MB) of a web page or attachment to crawl.
crawlDepth The number of levels from the seed URL to crawl. For example, the seed URL page is depth 1 and any hyperlinks on this page that are also crawled are depth 2.
maxLinksPerUrl The maximum number of URLs on a web page to include when crawling a website. This number is per web page. As a website's web pages are crawled, any URLs that the webpages link to also are crawled. URLs on a web page are crawled in order of appearance.
crawlSubDomain true to crawl the website domains with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled. If you don't set crawlSubDomain or crawlAllDomain to true, then Amazon Kendra only crawls the domains of the websites that you want to crawl.
crawlAllDomain true to crawl the website domains with subdomains and other domains the web pages link to. If you don't set crawlSubDomain or crawlAllDomain to true, then Amazon Kendra only crawls the domains of the websites that you want to crawl.
honorRobots true to respect the robots.txt directives of the websites that you want to crawl. These directives control how Amazon Kendra Web Crawler crawls the websites, whether Amazon Kendra can crawl only specific content or not crawl any content.
crawlAttachments true to crawl files that the web pages link to.
  • inclusionURLCrawlPatterns

  • inclusionURLIndexPatterns

A list of regular expression patterns to include crawling certain URLs and indexing any hyperlinks on these URL web pages. URLs that match the patterns are included in the index. URLs that don't match the patterns are excluded from the index. If a URL matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the URL/website's web pages aren't included in the index.
  • exclusionURLCrawlPatterns

  • exclusionURLIndexPatterns

A list of regular expression patterns to exclude crawling certain URLs and indexing any hyperlinks on these URL web pages. URLs that match the patterns are excluded from the index. URLs that don't match the patterns are included in the index. If a URL matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the URL/website's web pages aren't included in the index.
inclusionFileIndexPatterns A list of regular expression patterns to include certain web page files. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
exclusionFileIndexPatterns A list of regular expression patterns to exclude certain web page files. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
proxy Configuration information required to connect to your internal websites via a web proxy.
host The host name of the proxy sever you want to use to connect to internal websites. For example, the host name of https://a.example.com/page1.html is "a.example.com".
port The port number of the proxy sever you want to use to connect to internal websites. For example, 443 is the standard port for HTTPS.
secretArn (proxy) If web proxy credentials are required to connect to a website host, you can create an AWS Secrets Manager secret that stores the credentials. Provide the Amazon Resource Name (ARN) of the secret.
type The type of data source. Specify WEBCRAWLERV2 as your data source type.
secretArn

The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that's used if your websites require authentication to access the websites. You store the authentication credentials for the website in the secret that contains JSON key-value pairs.

If you use basic, or NTML/Kerberos, enter the user name and password. The JSON keys in the secret must be userName and password. NTLM authentication protocol includes password hashing, and Kerberos authentication protocol includes password encryption.

If you use SAML or form authentication, enter the user name and password, XPath for the user name field (and user name button if using SAML), XPaths for the password field and button, and the login page URL. The JSON keys in the secret must be userName, password, userNameFieldXpath, userNameButtonXpath, passwordFieldXpath, passwordButtonXpath, and loginPageUrl. You can find the XPaths (XML Path Language) of elements using your web browser's developer tools. XPaths usually follow this format: //tagname[@Attribute='Value'].

Amazon Kendra also checks if the endpoint information (seed URLs) included in the secret is the same the endpoint information specified in your data source endpoint configuration details.

version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "siteMapUrls": { "type": "array", "items":{ "type": "string", "pattern": "https://.*" } }, "s3SeedUrl": { "type": "string", "pattern": "s3:.*" }, "s3SiteMapUrl": { "type": "string", "pattern": "s3:.*" }, "seedUrlConnections": { "type": "array", "items": [ { "type": "object", "properties": { "seedUrl":{ "type": "string", "pattern": "https://.*" } }, "required": [ "seedUrl" ] } ] }, "authentication": { "type": "string", "enum": [ "NoAuthentication", "BasicAuth", "NTLM_Kerberos", "Form", "SAML" ] } } } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "webPage": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL" ] }, "additionalProperties": { "type": "object", "properties": { "rateLimit": { "type": "string", "default": "300" }, "maxFileSize": { "type": "string", "default": "50" }, "crawlDepth": { "type": "string", "default": "2" }, "maxLinksPerUrl": { "type": "string", "default": "100" }, "crawlSubDomain": { "type": "boolean", "default": false }, "crawlAllDomain": { "type": "boolean", "default": false }, "honorRobots": { "type": "boolean", "default": false }, "crawlAttachments": { "type": "boolean", "default": false }, "inclusionURLCrawlPatterns": { "type": "array", "items": { "type": "string" } }, "exclusionURLCrawlPatterns": { "type": "array", "items": { "type": "string" } }, "inclusionURLIndexPatterns": { "type": "array", "items": { "type": "string" } }, "exclusionURLIndexPatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileIndexPatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileIndexPatterns": { "type": "array", "items": { "type": "string" } }, "proxy": { "type": "object", "properties": { "host": { "type": "string" }, "port": { "type": "string" }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } } } }, "required": [ "rateLimit", "maxFileSize", "crawlDepth", "crawlSubDomain", "crawlAllDomain", "maxLinksPerUrl", "honorRobots" ] }, "type": { "type": "string", "pattern": "WEBCRAWLERV2" }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "type", "additionalProperties" ] }

Confluence template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the Confluence host URL, the hosting method, and the authentication type as a part of the connection configuration or repository endpoint details. Also specify the type of data source as CONFLUENCEV2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Confluence JSON schema.

The following table describes the parameters of the Confluence JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
hostUrl The URL for your Confluence instance. For example, https://example.confluence.com.
type The hosting method for your Confluence instance, whether SAAS and ON_PREM.
authType The authentication method for your Confluence instance, whether Basic, OAuth2, or Personal-token.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • space

  • page

  • blog

  • comment

  • attachment

A list of objects that map the attributes or field names of your Confluence spaces, pages, blogs, comments, and attachments to Amazon Kendra index field names. For more information, see Mapping data source fields. The Confluence data source field names must exist in your Confluence custom metadata.
additionalProperties Additional configuration options for your content in your data source.
isCrawlAcl true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see User context filtering.
fieldForUserId Specify email if you want to use the user email for the user ID. email is used by default and is currently the only supported user ID type.
  • inclusionSpaceKeyFilter

  • exclusionSpaceKeyFilter

  • pageTitleRegEX

  • blogTitleRegEX

  • commentTitleRegEX

  • attachmentTitleRegEX

  • inclusionFileTypePatterns

  • exclusionFileTypePatterns

  • inclusionUrlPatterns

  • exclusionUrlPatterns

A list of regular expression patterns to include and/or exclude certain files in your Confluence data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
proxyHost The host name of the web proxy that you use, without the http:// or https:// protocol.

proxyPort

The port number used by the host URL transport protocol. Must be a numeric value between 0 and 65535.
  • isCrawlPersonalSpace

  • isCrawlArchivedSpace

  • isCrawlArchivedPage

  • isCrawlPage

  • isCrawlBlog

  • isCrawlPageComment

  • isCrawlPageAttachment

  • isCrawlBlogComment

  • isCrawlBlogAttachment

true to crawl files in your Confluence personal spaces, pages, blogs, page comments, page attachments, blog comments, and blog attachments.
maxFileSizeInMegaBytes Specify the file size limit in MBs that Amazon Kendra can crawl. Amazon Kendra crawls only the files within the size limit you define. The default file size is 50MB. The maximum file size should be greater than 0MB and less than or equal to 50MB.
type The type of data source. Specify CONFLUENCEV2 as your data source type.
enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Confluence. For information on these key-value pairs, see Connection instructions for Confluence.
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "hostUrl": { "type": "string", "pattern": "https:.*" }, "type": { "type": "string", "enum": [ "SAAS", "ON_PREM" ] }, "authType": { "type": "string", "enum": [ "Basic", "OAuth2", "Personal-token" ] } }, "required": [ "hostUrl", "type", "authType" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "space": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "page": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "blog": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "comment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "usersAclS3FilePath": { "type": "string" }, "isCrawlAcl": { "type": "boolean" }, "fieldForUserId": { "type": "string" }, "inclusionSpaceKeyFilter": { "type": "array", "items": { "type": "string" } }, "exclusionSpaceKeyFilter": { "type": "array", "items": { "type": "string" } }, "pageTitleRegEX": { "type": "array", "items": { "type": "string" } }, "blogTitleRegEX": { "type": "array", "items": { "type": "string" } }, "commentTitleRegEX": { "type": "array", "items": { "type": "string" } }, "attachmentTitleRegEX": { "type": "array", "items": { "type": "string" } }, "isCrawlPersonalSpace": { "type": "boolean" }, "isCrawlArchivedSpace": { "type": "boolean" }, "isCrawlArchivedPage": { "type": "boolean" }, "isCrawlPage": { "type": "boolean" }, "isCrawlBlog": { "type": "boolean" }, "isCrawlPageComment": { "type": "boolean" }, "isCrawlPageAttachment": { "type": "boolean" }, "isCrawlBlogComment": { "type": "boolean" }, "isCrawlBlogAttachment": { "type": "boolean" }, "maxFileSizeInMegaBytes": { "type":"string" }, "inclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionUrlPatterns": { "type": "array", "items": { "type": "string" } }, "exclusionUrlPatterns": { "type": "array", "items": { "type": "string" } }, "proxyHost": { "type": "string" }, "proxyPort": { "type": "string" } }, "required": [] }, "type": { "type": "string", "pattern": "CONFLUENCEV2" }, "enableIdentityCrawler": { "type": "boolean" }, "syncMode": { "type": "string", "enum": [ "FULL_CRAWL", "FORCED_FULL_CRAWL" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Dropbox template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the Dropbox app key, app secret, and access token as part of your secret that stores your authentication credentials. Also specify the type of data source as DROPBOX, the type of access token you want to use (temporary or permanent), and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Dropbox JSON schema.

The following table describes the parameters of the Dropbox JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source. This data source does not specify an endpoint in repositoryEndpointMetadata. Rather, the connection information is included in an AWS Secrets Manager secret that you provide the secretArn.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • file

  • paper

  • papert

  • shortcut

A list of objects that map the attributes or field names of your Dropbox files, Dropbox Paper, and shortcuts to Amazon Kendra index field names. For more information, see Mapping data source fields.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Dropbox. The secret must contain a JSON structure with the following keys:
{ "appKey": "Dropbox app key", "appSecret": "Dropbox app secret", "accesstoken": "temporary access token or refresh access token" }
additionalProperties Additional configuration options for your content in your data source.
isCrawlAcl true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see User context filtering.
  • inclusionFileNamePatterns

  • inclusionFileTypePatterns

A list of regular expression patterns to include certain file names and types in your Dropbox data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
  • exclusionFileNamePatterns

  • exclusionFileTypePatterns

A list of regular expression patterns to exclude certain file names and types in your Dropbox data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
  • crawlFile

  • crawlPaper

  • crawlPapert

  • crawlShortcut

true to crawl files in your Dropbox, Dropbox Paper documents, Dropbox Paper templates, and web page shortcuts stored in your Dropbox.
type The type of data source. Specify DROPBOX as your data source type.
tokenType Specify your access token type: permanent or temporary access token. It's recommended that you create a refresh access token that never expires in Dropbox rather that relying on a one-time access token that expires after 4 hours. You create an app and a refresh access token in the Dropbox developer console and provide the access token in your secret.
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { } } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "file": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "LONG", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "paper": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "LONG", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "papert": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "LONG", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "shortcut": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "LONG", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] } } }, "syncMode": { "type": "string", "enum": [ "FULL_CRAWL", "FORCED_FULL_CRAWL", "CHANGE_LOG" ] }, "enableIdentityCrawler": { "type": "boolean" }, "secretArn": { "type": "string" }, "additionalProperties": { "type": "object", "properties": { "isCrawlAcl": { "type": "boolean" }, "inclusionFileNamePatterns": { "type": "array" }, "exclusionFileNamePatterns": { "type": "array" }, "inclusionFileTypePatterns": { "type": "array" }, "exclusionFileTypePatterns": { "type": "array" }, "crawlFile": { "type": "boolean" }, "crawlPaper": { "type": "boolean" }, "crawlPapert": { "type": "boolean" }, "crawlShortcut": { "type": "boolean" } } }, "type": { "type": "string", "pattern": "DROPBOX" }, "tokenType": { "type": "string", "enum": [ "PERMANENT", "TEMPORARY" ] }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] } }, "additionalProperties": false, "required": [ "connectionConfiguration", "repositoryConfigurations", "additionalProperties", "syncMode", "enableIdentityCrawler", "secretArn", "type", "tokenType" ] }

Drupal template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the Drupal host URL and the authentication type as part of the connection configuration or repository endpoint details. Also specify the type of data source as DRUPAL, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Drupal JSON schema.

The following table describes the parameters of the Drupal JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
hostUrl The host url of your Drupal website. For example, https://<hostname>/<drupalsitename>.
repositoryConfigurations Configuration information for the content of the data source.
  • content

  • comment

  • attachment

A list of objects that map the attributes or field names of your Drupal files. For more information, see Mapping data source fields. The Drupal data source field names must exist in your Drupal custom metadata.
additionalProperties Additional configuration options for your content in your data source.
  • inclusionFileNamePatterns

  • articleTitleInclusionPatterns

  • pageTitleInclusionPatterns

  • customContentTitleInclusionPatterns

  • basicBlockTitleInclusionPatterns

  • customBlockTitleInclusionPatterns

A list of regular expression patterns to include certain files in your Drupal data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
  • exclusionFileNamePatterns

  • articleTitleExclusionPatterns

  • pageTitleExclusionPatterns

  • customContentTitleExclusionPatterns

  • basicBlockTitleExclusionPatterns

  • customBlockTitleExclusionPatterns

A list of regular expression patterns to exclude certain files in your Drupal data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
contentDefinitions
  • contentType

  • fieldDefinition

  • isCrawlComments

  • isCrawlFiles

  • isCrawlArticle

  • isCrawlBasicPage

  • isCrawlBasicBlock

  • isCrawlCustomContentTypesList

Specify the content types to crawl and whether to crawl comments and attachments for your selected content types.
type The type of data source. Specify DRUPAL as your data source type.
authType The type of authentication that you use, whether BASIC-AUTH or OAUTH2.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Drupal. The secret must contain a JSON structure with the following keys:

If using basic authentication:

{ "username": "user name", "passwords": "password" }

If using OAuth 2.0 authentication:

{ "username": "user name", "password": "password", "clientId": "client id", "clientSecret": "client secret" }
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "hostUrl": { "type": "string", "pattern": "https:.*" } }, "required": [ "hostUrl" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "content": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "comment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "isCrawlArticle": { "type": "boolean" }, "isCrawlBasicPage": { "type": "boolean" }, "isCrawlBasicBlock": { "type": "boolean" }, "crawlCustomContentTypesList": { "type": "array", "items": { "type": "string" } }, "crawlCustomBlockTypesList": { "type": "array", "items": { "type": "string" } }, "filePath": { "anyOf": [ { "type": "string", "pattern": "s3:.*" }, { "type": "string", "pattern": "" } ] }, "inclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "articleTitleInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "articleTitleExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "pageTitleInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "pageTitleExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "customContentTitleInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "customContentTitleExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "basicBlockTitleInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "basicBlockTitleExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "customBlockTitleInclusionPatterns": { "type": "array", "items": { "type": "string" } }, "customBlockTitleExclusionPatterns": { "type": "array", "items": { "type": "string" } }, "contentDefinitions": { "type": "array", "items": { "properties": { "contentType": { "type": "string" }, "fieldDefinition": { "type": "array", "items": [ { "type": "object", "properties": { "machineName": { "type": "string" }, "type": { "type": "string" } }, "required": [ "machineName", "type" ] } ] }, "isCrawlComments": { "type": "boolean" }, "isCrawlFiles": { "type": "boolean" } } }, "required": [ "contentType", "fieldDefinition", "isCrawlComments", "isCrawlFiles" ] } }, "required": [] }, "type": { "type": "string", "pattern": "DRUPAL" }, "authType": { "type": "string", "enum": [ "BASIC-AUTH", "OAUTH2" ] }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "enableIdentityCrawler": { "type": "boolean" }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

GitHub template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the GitHub host URL, the organization name, and whether you use GitHub cloud or GitHub on-premises as part of the connection configuration or repository endpoint details. Also specify the type of data source as GITHUB, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See GitHub JSON schema.

The following table describes the parameters of the GitHub JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
type Specify the type as either SAAS or ON_PREMISE.
hostUrl The GitHub host URL. For example, if you use GitHub SaaS/Enterprise Cloud: https://api.github.com. Or, if you use GitHub on-premises/Enterprise Server: https://on-prem-host-url/api/v3/.
organizationName You can find your organization name when you log in to GitHub desktop and go to Your organizations under your profile picture dropdown.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • ghRepository

  • ghCommit

  • ghIssueDocument

  • ghIssueComment

  • ghIssueAttachment

  • ghPRDocument

  • ghPRComment

  • ghPRAttachment

A list of objects that map the attributes or field names of your GitHub content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source.
isCrawlAcl true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access and search. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see User context filtering.
fieldForUserId Specify the type of user ID that you want to use for ACL crawling. Specify either email if you want to use the user email for the user ID, or username if you want to use the user name for the user ID. If you don't specify an option then email is used by default.
repositoryFilter A list of names of the specific repositories and branch names you want to index.
crawlRepository true to crawl repositories.
crawlRepositoryDocuments true to crawl repository documents.
crawlIssue true to crawl issues.
crawlIssueComment true to crawl issue comments.
crawlIssueCommentAttachment true to crawl issue comment attachments.
crawlPullRequest true to crawl pull requests.
crawlPullRequestComment true to crawl pull request comments.
crawlPullRequestCommentAttachment true to crawl pull request comment attachments.
  • inclusionFolderNamePatterns

  • inclusionFileTypePatterns

  • inclusionFileNamePatterns

A list of regular expression patterns to include certain content in your GitHub data source. Content that matches the patterns are included in the index. Content that doesn't match the patterns are excluded from the index. If any content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
  • exclusionFolderNamePatterns

  • exclusionFileTypePatterns

  • exclusionFileNamePatterns

A list of regular expression patterns to exclude certain content in your GitHub data source. Content that matches the patterns are excluded from the index. Content that doesn't match the patterns are included in the index. If any content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
type The type of data source. Specify GITHUB as your data source type.
enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn

The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your GitHub. The secret must contain a JSON structure with the following keys:

{ "personalToken": "token" }
version The version of this template that's currently supported.

The following is the GitHub JSON schema:

{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "type": { "type": "string" }, "hostUrl": { "type": "string", "pattern": "https://.*" }, "organizationName": { "type": "string" } }, "required": [ "type", "hostUrl", "organizationName" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "ghRepository": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "ghCommit": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "ghIssueDocument": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "ghIssueComment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "ghIssueAttachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "ghPRDocument": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "ghPRComment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "ghPRAttachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "isCrawlAcl": { "type": "boolean" }, "fieldForUserId": { "type": "string" }, "crawlRepository": { "type": "boolean" }, "crawlRepositoryDocuments": { "type": "boolean" }, "crawlIssue": { "type": "boolean" }, "crawlIssueComment": { "type": "boolean" }, "crawlIssueCommentAttachment": { "type": "boolean" }, "crawlPullRequest": { "type": "boolean" }, "crawlPullRequestComment": { "type": "boolean" }, "crawlPullRequestCommentAttachment": { "type": "boolean" }, "repositoryFilter": { "type": "array", "items": [ { "type": "object", "properties": { "repositoryName": { "type": "string" }, "branchNameList": { "type": "array", "items": { "type": "string" } } } } ] }, "inclusionFolderNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFolderNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } } }, "required": [] }, "type": { "type": "string", "pattern": "GITHUB" }, "syncMode": { "type": "string", "enum": [ "FULL_CRAWL", "FORCED_FULL_CRAWL", "CHANGE_LOG" ] }, "enableIdentityCrawler": { "type": "boolean" }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "enableIdentityCrawler" ] }

Gmail template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as GMAIL, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Gmail JSON schema.

The following table describes the parameters of the Gmail JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source. This data source does not specify an endpoint in repositoryEndpointMetadata. Rather, the connection information is included in an AWS Secrets Manager secret that you provide the secretArn.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.
  • message

  • attachments

A list of objects that map the attributes or field names of your Gmail messages and attachments to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source.
  • inclusionLabelNamePatterns

  • exclusionLabelNamePatterns

  • inclusionAttachmentTypePatterns

  • exclusionAttachmentTypePatterns

  • inclusionAttachmentNamePatterns

  • exclusionAttachmentNamePatterns

  • inclusionSubjectFilter

  • exclusionSubjectFilter

  • isSubjectAnd

  • inclusionFromFilter

  • exclusionFromFilter

  • inclusionToFilter

  • exclusionToFilter

  • inclusionCcFilter

  • exclusionCcFilter

  • inclusionBccFilter

  • exclusionBccFilter

A list of regular expression patterns to include or exclude messages with specific subject names in your Gmail data source. Files that match the patterns are included in the index. If a file matches both an inclusion and an exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
beforeDateFilter Specify messages and attachments to be included before a certain date.
afterDateFilter Specify messages and attachments to be included after a certain date.
isCrawlAttachment A Boolean value to choose whether you want to crawl attachments. Messages are automatically crawled.
type The type of data source. Specify GMAIL as your data source type.
shouldCrawlDraftMessages A Boolean value to choose whether you want to crawl draft messages.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

Important

Because there is no API to update permanently deleted Gmail messages, any new, modified, or deleted content sync:

  • Won't remove messages that were permanently deleted from Gmail from your Amazon Kendra index

  • Won't sync changes in Gmail email labels

To sync your Gmail data source label changes and permanently deleted email messages to your Amazon Kendra index, you must run full crawls periodically.

secretARN The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key-value pairs required to connect to your Gmail. The secret must contain a JSON structure with the following keys:
{ "adminAccountEmailId": "service account email", "clientEmailId": "user account email", "privateKey": "private key" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { } }, "repositoryConfigurations": { "type": "object", "properties": { "message": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "attachments": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING"] }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } } }, "required": [] }, "additionalProperties": { "type": "object", "properties": { "inclusionLabelNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionLabelNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionAttachmentTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionAttachmentTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionAttachmentNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionAttachmentNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionSubjectFilter": { "type": "array", "items": { "type": "string" } }, "exclusionSubjectFilter": { "type": "array", "items": { "type": "string" } }, "isSubjectAnd": { "type": "boolean" }, "inclusionFromFilter": { "type": "array", "items": { "type": "string" } }, "exclusionFromFilter": { "type": "array", "items": { "type": "string" } }, "inclusionToFilter": { "type": "array", "items": { "type": "string" } }, "exclusionToFilter": { "type": "array", "items": { "type": "string" } }, "inclusionCcFilter": { "type": "array", "items": { "type": "string" } }, "exclusionCcFilter": { "type": "array", "items": { "type": "string" } }, "inclusionBccFilter": { "type": "array", "items": { "type": "string" } }, "exclusionBccFilter": { "type": "array", "items": { "type": "string" } }, "beforeDateFilter": { "anyOf": [ { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$" }, { "type": "string", "pattern": "" } ] }, "afterDateFilter": { "anyOf": [ { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$" }, { "type": "string", "pattern": "" } ] }, "isCrawlAttachment": { "type": "boolean" }, "shouldCrawlDraftMessages": { "type": "boolean" } }, "required": [ "isCrawlAttachment", "shouldCrawlDraftMessages" ] }, "type" : { "type" : "string", "pattern": "GMAIL" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL" ] }, "secretArn": { "type": "string" }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] } }, "required": [ "connectionConfiguration", "repositoryConfigurations", "additionalProperties", "syncMode", "secretArn", "type" ] }

Google Drive template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as GOOGLEDRIVE2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Google Drive JSON schema.

The following table describes the parameters of the Google Drive JSON schema.

Configuration Description
connectionConfiguration Configuration information for the data source.
repositoryEndpointMetadata The endpoint information for the data source. This data source does not specify an endpoint. You choose your authentication type: serviceAccount and OAuth2. The connection information is included in an AWS Secrets Manager secret that you provide the secretArn.
authType Choose between serviceAccount and OAuth2 based on your use case.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • file

  • comment

A list of objects that map the attributes or field names of your Google Drive to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source
  • maxFileSizeInMegaBytes

Specify a file size limit in MBs that Amazon Kendra should crawl.
  • iscrawlComment

true to crawl comments in your Google Drive data source.
  • isCrawlMyDriveAndSharedWithMe

true to crawl MyDrive and Shared With Me Drives in your Google Drive data source.
  • isCrawlSharedDrives

true to crawl Shared Drives in your Google Drive data source.
isCrawlAcl true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access and search. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see User context filtering.
  • excludeUserAccounts

  • excludeSharedDrives

  • excludeMimeTypes

  • exclusionFileTypePatterns

  • exclusionFileNamePatterns

  • exclusionFilePathFilter

A list of regular expression patterns to exclude certain files in your Google Drive data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
  • includeUserAccounts

  • includeSharedDrives

  • includeMimeTypes

  • inclusionFileTypePatterns

  • inclusionFileNamePatterns

  • inclusionFilePathFilter

A list of regular expression patterns to include certain files in your Google Drive data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
type The type of data source. Specify GOOOGLEDRIVEV2 as your data source type.
enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Google Drive. The secret must contain a JSON structure with the following keys:

If using Google Service Account authentication:

{ "clientEmail": "user account email", "adminAccountEmail": "service account email", "privateKey": "private key" }

If using OAuth 2.0 authentication:

{ "clientID": "OAuth client ID", "clientSecret": "client secret", "refreshToken": "refresh token" }
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "authType": { "type": "string", "enum": [ "serviceAccount", "OAuth2" ] } }, "required": [ "authType" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "file": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "comment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "maxFileSizeInMegaBytes": { "type": "string" }, "isCrawlComment": { "type": "boolean" }, "isCrawlMyDriveAndSharedWithMe": { "type": "boolean" }, "isCrawlSharedDrives": { "type": "boolean" }, "isCrawlAcl": { "type": "boolean" }, "excludeUserAccounts": { "type": "array", "items": { "type": "string" } }, "excludeSharedDrives": { "type": "array", "items": { "type": "string" } }, "excludeMimeTypes": { "type": "array", "items": { "type": "string" } }, "includeUserAccounts": { "type": "array", "items": { "type": "string" } }, "includeSharedDrives": { "type": "array", "items": { "type": "string" } }, "includeMimeTypes": { "type": "array", "items": { "type": "string" } }, "includeTargetAudienceGroup": { "type": "array", "items": { "type": "string" } }, "inclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFilePathFilter": { "type": "array", "items": { "type": "string" } }, "exclusionFilePathFilter": { "type": "array", "items": { "type": "string" } } } }, "type": { "type": "string", "pattern": "GOOGLEDRIVEV2" }, "enableIdentityCrawler": { "type": "boolean" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

IBM DB2 template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as db2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See IBM DB2 JSON schema.

The following table describes the parameters of the IBM DB2 JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata Required configuration information for connecting your data source.
  • dbType—The type of Java database that you use, whether mysql, db2, postgresql, oracle, or sqlserver.

  • dbHost—The database host name.

  • dbPort—The database port.

  • dbInstance—The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document

A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.
primaryKey Provide the primary key for the database table. This identifies a table within your database.
titleColumn Provide the name of the document title column within your database table.
bodyColumn Provide the name of the document title column within your database table.
sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
timestampColumn Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.
timestampFormat Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.
timezone Enter the name of the column which contains time zones for the content to be crawled.
changeDetectingColumns Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns
allowedUsersColumns Enter the name of the column which contains User IDs to be allowed access to content.
allowedGroupsColumn Enter the name of the column which contains User IDs to be allowed access to content.
sourceURIColumn Enter the name of the column which contains Source URLs to be indexed.
isSslEnabled Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
type The type of data source. Specify JDBC as your data source type.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:
{ "user name": "database user name", "password": "password" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "dbType": { "type": "string", "enum": [ "mysql", "db2", "postgresql", "oracle", "sqlserver" ] }, "dbHost": { "type": "string" }, "dbPort": { "type": "string" }, "dbInstance": { "type": "string" } }, "required": [ "dbType", "dbHost", "dbPort", "dbInstance" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string" }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ ] }, "additionalProperties": { "type": "object", "properties": { "primaryKey": { "type": "string" }, "titleColumn": { "type": "string" }, "bodyColumn": { "type": "string" }, "sqlQuery": { "type": "string", "not": { "pattern": ";+" } }, "timestampColumn": { "type": "string" }, "timestampFormat": { "type": "string" }, "timezone": { "type": "string" }, "changeDetectingColumns": { "type": "array", "items": { "type": "string" } }, "allowedUsersColumn": { "type": "string" }, "allowedGroupsColumn": { "type": "string" }, "sourceURIColumn": { "type": "string" }, "isSslEnabled": { "type": "boolean" } }, "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"] }, "type" : { "type" : "string", "pattern": "JDBC" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Microsoft Exchange template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the tenant ID as as a part of the connection configuration or repository endpoint details. Also specify the type of data source as MSEXCHANGE, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Microsoft Exchange JSON schema.

The following table describes the parameters of the Microsoft Exchange JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
tenantId The Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • email

  • attachment

  • calendar

  • contacts

  • notes

A list of objects that map the attributes or field names of your Microsoft Exchange data source to Amazon Kendra index fields. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for content in your data source
inclusionPatterns A list of regular expression patterns to include certain files in your Microsoft Exchange data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
exclusionPatterns A list of regular expression patterns to exclude certain files in your Microsoft Exchange data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
  • inclusionUsersList

  • inclusionUsersFileName

  • inclusionDomainUsers

A list of regular expression patterns to include certain users and user files in your Microsofot Exchange data source. Users that match the patterns are included in the index. Users that don't match the patterns are excluded from the index. If a user matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the user isn't included in the index.
  • exclusionUsersList

  • exclusionUsersFileName

  • exclusionDomainUsers

A list of regular expression patterns to exclude certain users and user files in your Microsoft Exchange data source. Users that match the patterns are excluded from the index. Users that don't match the patterns are included in the index. If a user matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the user isn't included in the index.
s3bucketName The name of your S3 bucket if that you want to use.
  • crawlCalendar

  • crawlNotes

  • crawlContacts

  • crawlFolderAcl

true to crawl these types of content and access control information your Microsoft Exchange data source.
startCalendarDateTime You can configure a specific start date-time for your calendar content.
endCalendarDateTime You can configure a specific end date-time for calendar content.
subject You can configure a specific subject line for your mail content.
emailFrom You can configure a specific email for your 'From' or sender mail content.
emailTo You can configure a specific email for your 'To' or recipient mail content.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

type The type of data source. Specify MSEXCHANGE as your data source type.
secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Microsoft Exchange. This includes your client ID and your client secret that is generated when you create an OAuth application in the Azure portal.
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "tenantId": { "type": "string", "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", "minLength": 36, "maxLength": 36 } }, "required": ["tenantId"] } } }, "repositoryConfigurations": { "type": "object", "properties": { "email": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "DATE","LONG"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "calendar": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "contacts": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "notes": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": ["email" ] }, "additionalProperties": { "type": "object", "properties": { "inclusionPatterns": { "type": "array", "items": { "type": "string" } }, "exclusionPatterns": { "type": "array", "items": { "type": "string" } }, "inclusionUsersList": { "type": "array", "items": { "type": "string", "format": "email" } }, "exclusionUsersList": { "type": "array", "items": { "type": "string", "format": "email" } }, "s3bucketName": { "type": "string" }, "inclusionUsersFileName": { "type": "string" }, "exclusionUsersFileName": { "type": "string" }, "inclusionDomainUsers": { "type": "array", "items": { "type": "string" } }, "exclusionDomainUsers": { "type": "array", "items": { "type": "string" } }, "crawlCalendar": { "type": "boolean" }, "crawlNotes": { "type": "boolean" }, "crawlContacts": { "type": "boolean" }, "crawlFolderAcl": { "type": "boolean" }, "startCalendarDateTime": { "anyOf": [ { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$" }, { "type": "string", "pattern": "" } ] }, "endCalendarDateTime": { "anyOf": [ { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$" }, { "type": "string", "pattern": "" } ] }, "subject": { "type": "array", "items": { "type": "string" } }, "emailFrom": { "type": "array", "items": { "type": "string", "format": "email" } }, "emailTo": { "type": "array", "items": { "type": "string", "format": "email" } } }, "required": [ ] }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "type" : { "type" : "string", "pattern": "MSEXCHANGE" }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Microsoft OneDrive template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the tenant ID as part of the connection configuration or repository endpoint details. Also specify the type of data source as ONEDRIVEV2, and a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Microsoft OneDrive JSON schema.

The following table describes the parameters of the Microsoft OneDrive JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
tenantId The Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
file A list of objects that map the attributes or field names of your Microsoft OneDrive files to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source
  • userNameFilter

  • userFilterPath

  • inclusionFileTypePatterns

  • exclusionFileTypePatterns

  • inclusionFileNamePatterns

  • exclusionFileNamePatterns

  • inclusionFilePathPatterns

  • exclusionFilePathPatterns

  • inclusionOneNoteSectionNamePatterns

  • exclusionOneNoteSectionNamePatterns

  • inclusionOneNotePageNamePatterns

  • exclusionOneNotepageNamePatterns

You can choose to index specific files, OneNote sections, OneNote pages, and filter by user name.
isUserNameOnS3 true to provide a list of user names in a file stored in an Amazon S3.
type The type of data source. Specify ONEDRIVEV2 as your data source type.
enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
type The type of data source. Specify ONEDRIVEV2 as your data source type.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Microsoft OneDrive. The secret must contain a JSON structure with the following keys:
{ "clientId": "client ID", "clientSecret": "client secret" }
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "tenantId": { "type": "string", "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", "minLength": 36, "maxLength": 36 } }, "required": [ "tenantId" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "file": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "userNameFilter": { "type": "array", "items": { "type": "string" } }, "userFilterPath": { "type": "string" }, "isUserNameOnS3": { "type": "boolean" }, "inclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFilePathPatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFilePathPatterns": { "type": "array", "items": { "type": "string" } }, "inclusionOneNoteSectionNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionOneNoteSectionNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionOneNotePageNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionOneNotePageNamePatterns": { "type": "array", "items": { "type": "string" } } }, "required": [] }, "enableIdentityCrawler": { "type": "boolean" }, "type": { "type": "string", "pattern": "ONEDRIVEV2" }, "syncMode": { "type": "string", "enum": [ "FULL_CRAWL", "FORCED_FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Microsoft SharePoint template schema

You include a JSON that contains the data source schema as part of TemplateConfiguration object. You provide the SharePoint site URL/URLs, domain, and also a tenant ID if required as a part of the connection configuration or repository endpoint details. Also specify the type of data source as SHAREPOINTV2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See SharePoint JSON schema.

The following table describes the parameters of the Microsoft SharePoint JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source
repositoryEndpointMetadata The endpoint information for the data source
tenantId The tenant id of your SharePoint account.
domain The domain of your SharePoint account.
siteUrls The host URLs of your SharePoint account.
repositoryAdditionalProperties Additional properties to connect with the repository/data source endpoint.
s3bucketName The name of the Amazon S3 bucket that stores your Azure AD self-signed X.509 certificate.
s3certificateName The name of the Azure AD self-signed X.509 certificate stored in your Amazon S3 bucket.
authType The type of authentication that you use, whether OAuth2, OAuth2Certificate, OAuth2App, Basic, OAuth2_RefreshToken, NTLM, or Kerberos.
version The SharePoint version that you use, whether Server or Online.
onPremVersion The SharePoint Server version that you use, whether 2013, 2016 2019, or SubscriptionEdition.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • event

  • page

  • file

  • link

  • attachment

  • comment

A list of objects that map the attributes or field names of your SharePoint content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source.
  • eventTitleFilterRegEx

  • pageTitleFilterRegEx

  • linkTitleFilterRegEx

  • inclusionFilePath

  • exclusionFilePath

  • inclusionFileTypePatterns

  • exclusionFileTypePatterns

  • inclusionFileNamePatterns

  • exclusionFileNamePatterns

  • inclusionOneNoteSectionNamePatterns

  • exclusionOneNoteSectionNamePatterns

  • inclusionOneNotePageNamePatterns

  • exclusionOneNotePageNamePatterns

A list of regular expression patterns to include/exclude certain content in your SharePoint data source. Content itmes that match the inclusion patterns are included in the index. Content items that don't match the inclusion patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
  • crawlFiles

  • crawlPages

  • crawlEvents

  • crawlComments

  • crawlLinks

  • crawlAttachments

true to crawl these types of content.
crawlAcl true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents that users and groups can access and search. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see User context filtering.
fieldForUserId Specify either email if you want to use the user email for the user ID, or userPrincipalName if you want to use a user name for the user ID. If you don't specify an option then email is used by default.
aclConfiguration Specify either ACLWithLDAPEmailFmt, ACLWithManualEmailFmt, or ACLWithUsernameFmtM.
emailDomain The domain of the email. For example, "amazon.com".
  • isCrawlLocalGroupMapping

  • isCrawlAdGroupMapping

true to crawl group mapping information.
proxyHost The host name of the web proxy that you use, without the http:// or https:// protocol.
proxyPort The port number used by the host URL transport protocol. Must be a numeric value between 0 and 65535.
type Specify SHAREPOINTV2 as your data source type
enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your SharePoint. For information on these key-value pairs, see Connection instructions for SharePoint Online and SharePoint Server.
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "tenantId": { "type": "string", "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", "minLength": 36, "maxLength": 36 }, "domain": { "type": "string" }, "siteUrls": { "type": "array", "items": { "type": "string", "pattern": "https://.*" } }, "repositoryAdditionalProperties": { "type": "object", "properties": { "s3bucketName": { "type": "string" }, "s3certificateName": { "type": "string" }, "authType": { "type": "string", "enum": [ "OAuth2", "OAuth2Certificate", "OAuth2App", "Basic", "OAuth2_RefreshToken", "NTLM", "Kerberos" ] }, "version": { "type": "string", "enum": [ "Server", "Online" ] }, "onPremVersion": { "type": "string", "enum": [ "", "2013", "2016", "2019", "SubscriptionEdition" ] } }, "required": [ "authType", "version" ] } }, "required": [ "siteUrls", "domain", "repositoryAdditionalProperties" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "event": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "page": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "file": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "link": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "comment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "eventTitleFilterRegEx": { "type": "array", "items": { "type": "string" } }, "pageTitleFilterRegEx": { "type": "array", "items": { "type": "string" } }, "linkTitleFilterRegEx": { "type": "array", "items": { "type": "string" } }, "inclusionFilePath": { "type": "array", "items": { "type": "string" } }, "exclusionFilePath": { "type": "array", "items": { "type": "string" } }, "inclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionOneNoteSectionNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionOneNoteSectionNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionOneNotePageNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionOneNotePageNamePatterns": { "type": "array", "items": { "type": "string" } }, "crawlFiles": { "type": "boolean" }, "crawlPages": { "type": "boolean" }, "crawlEvents": { "type": "boolean" }, "crawlComments": { "type": "boolean" }, "crawlLinks": { "type": "boolean" }, "crawlAttachments": { "type": "boolean" }, "crawlListData": { "type": "boolean" }, "crawlAcl": { "type": "boolean" }, "fieldForUserId": { "type": "string" }, "aclConfiguration": { "type": "string", "enum": [ "ACLWithLDAPEmailFmt", "ACLWithManualEmailFmt", "ACLWithUsernameFmt" ] }, "emailDomain": { "type": "string" }, "isCrawlLocalGroupMapping": { "type": "boolean" }, "isCrawlAdGroupMapping": { "type": "boolean" }, "proxyHost": { "type": "string" }, "proxyPort": { "type": "string" } }, "required": [ ] }, "type": { "type": "string", "pattern": "SHAREPOINTV2" }, "enableIdentityCrawler": { "type": "boolean" }, "syncMode": { "type": "string", "enum": [ "FULL_CRAWL", "FORCED_FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "enableIdentityCrawler", "syncMode", "additionalProperties", "secretArn", "type" ] }

Microsoft SQL Server template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as sqlserver, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Microsoft SQL Server JSON schema.

The following table describes the parameters of the Micorosft SQL Server JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata Required configuration information for connecting your data source.
  • dbType—The type of Java database that you use, whether mysql, db2, postgresql, oracle, or sqlserver.

  • dbHost—The database host name.

  • dbPort—The database port.

  • dbInstance—The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document

A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.
primaryKey Provide the primary key for the database table. This identifies a table within your database.
titleColumn Provide the name of the document title column within your database table.
bodyColumn Provide the name of the document title column within your database table.
sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
timestampColumn Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.
timestampFormat Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.
timezone Enter the name of the column which contains time zones for the content to be crawled.
changeDetectingColumns Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns
allowedUsersColumns Enter the name of the column which contains User IDs to be allowed access to content.
allowedGroupsColumn Enter the name of the column which contains User IDs to be allowed access to content.
sourceURIColumn Enter the name of the column which contains Source URLs to be indexed.
isSslEnabled Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
type The type of data source. Specify JDBC as your data source type.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:
{ "user name": "database user name", "password": "password" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "dbType": { "type": "string", "enum": [ "mysql", "db2", "postgresql", "oracle", "sqlserver" ] }, "dbHost": { "type": "string" }, "dbPort": { "type": "string" }, "dbInstance": { "type": "string" } }, "required": [ "dbType", "dbHost", "dbPort", "dbInstance" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string" }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ ] }, "additionalProperties": { "type": "object", "properties": { "primaryKey": { "type": "string" }, "titleColumn": { "type": "string" }, "bodyColumn": { "type": "string" }, "sqlQuery": { "type": "string", "not": { "pattern": ";+" } }, "timestampColumn": { "type": "string" }, "timestampFormat": { "type": "string" }, "timezone": { "type": "string" }, "changeDetectingColumns": { "type": "array", "items": { "type": "string" } }, "allowedUsersColumn": { "type": "string" }, "allowedGroupsColumn": { "type": "string" }, "sourceURIColumn": { "type": "string" }, "isSslEnabled": { "type": "boolean" } }, "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"] }, "type" : { "type" : "string", "pattern": "JDBC" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Microsoft Teams template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the tenant ID as a part of the connection configuration or repository endpoint details. Also specify the type of data source as MSTEAMS, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Microsoft Teams JSON schema.

The following table describes the parameters of the Microsoft Teams JSON schema.

Configuration Description
connectionConfiguration Configuration information for endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
tenantId The Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • chatMessage

  • chatAttachment

  • channelPost

  • channelWiki

  • channelAttachment

  • meetingChat

  • meetingFile

  • meetingNote

  • calendarMeeting

A list of objects that map the attributes or field names of your Microsoft Teams content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source.
paymentModel Specifies what type of payment model to use with your Microsoft Teams data source. Model A payment models are restricted to licensing and payment models that require security compliance. Model B payment models are suitable for licensing and payment models that do not require security compliance.
  • inclusionTeamNameFilter

  • inclusionChannelNameFilter

  • inclusionFileNamePatterns

  • inclusionFileTypePatterns

  • inclusionUserEmailFilter

  • inclusionOneNoteSectionNamePatterns

  • inclusionOneNotePageNamePatterns

A list of regular expression patterns to include certain content in your Microsoft Teams data source. Content that matches the patterns are included in the index. Content that doesn't match the patterns are excluded from the index. If content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
  • exclusionTeamNameFilter

  • exclusionChannelNameFilter

  • exclusionFileNamePatterns

  • exclusionFileTypePatterns

  • exclusionUserEmailFilter

  • exclusionOneNoteSectionNamePatterns

  • exclusionOneNotePageNamePatterns

A list of regular expression patterns to exclude certain content in your Microsoft Teams data source. Content that matches the patterns are excluded from the index. Content that doesn't match the patterns are included in the index. If content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
  • isCrawlChatMessage

  • isCrawlChatAttachment

  • isCrawlChannelPost

  • isCrawlChannelAttachment

  • isCrawlChannelWiki

  • isCrawlCalendarMeeting

  • isCrawlMeetingChat

  • isCrawlMeetingFile

  • isCrawlMeetingNote

true to crawl these types of content in your Microsoft Teams data source.
startCalendarDateTime You can configure a specific start date-time for your calendar content.
endCalendarDateTime You can configure a specific end date-time for calendar content.
type The type of data source. Specify MSTEAMS as your data source type.
enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Microsoft Teams. This includes your client ID and client secret that is generated when you create an OAuth application in the Azure portal.
version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "tenantId": { "type": "string", "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", "minLength": 36, "maxLength": 36 } }, "required": [ "tenantId" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "chatMessage": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "chatAttachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "channelPost": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "channelWiki": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "channelAttachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "meetingChat": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "meetingFile": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "meetingNote": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "calendarMeeting": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "paymentModel": { "type": "string", "enum": [ "A", "B", "Evaluation Mode" ] }, "inclusionTeamNameFilter": { "type": "array", "items": { "type": "string" } }, "exclusionTeamNameFilter": { "type": "array", "items": { "type": "string" } }, "inclusionChannelNameFilter": { "type": "array", "items": { "type": "string" } }, "exclusionChannelNameFilter": { "type": "array", "items": { "type": "string" } }, "inclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionUserEmailFilter": { "type": "array", "items": { "type": "string" } }, "inclusionOneNoteSectionNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionOneNoteSectionNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionOneNotePageNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionOneNotePageNamePatterns": { "type": "array", "items": { "type": "string" } }, "isCrawlChatMessage": { "type": "boolean" }, "isCrawlChatAttachment": { "type": "boolean" }, "isCrawlChannelPost": { "type": "boolean" }, "isCrawlChannelAttachment": { "type": "boolean" }, "isCrawlChannelWiki": { "type": "boolean" }, "isCrawlCalendarMeeting": { "type": "boolean" }, "isCrawlMeetingChat": { "type": "boolean" }, "isCrawlMeetingFile": { "type": "boolean" }, "isCrawlMeetingNote": { "type": "boolean" }, "startCalendarDateTime": { "anyOf": [ { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$" }, { "type": "string", "pattern": "" } ] }, "endCalendarDateTime": { "anyOf": [ { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$" }, { "type": "string", "pattern": "" } ] } }, "required": [] }, "type": { "type": "string", "pattern": "MSTEAMS" }, "enableIdentityCrawler": { "type": "boolean" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Microsoft Yammer template schema

You include a JSON that contains the data source schema as part of TemplateConfiguration object. Specify the type of data source as YAMMER, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide.

The following table describes the parameters of the Microsoft Yammer JSON schema.

Configuration Description
connectionConfiguration Configuration information for the data source.
repositoryEndpointMetadata The endpoint information for the data source. This data source does not specify an endpoint in repositoryEndpointMetadata. Rather, the connection information is included in an AWS Secrets Manager secret that you provide the secretArn.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • community

  • user

  • message

  • attachment

A list of objects that map attributes or field names of Microsoft Yammer content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source
inclusionPatterns A list of regular expression patterns to include certain files in your Microsoft Yammer data source. Files that match the patterns are included in the index. File that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
exclusionPatterns A list of regular expression patterns to exclude certain files in your Microsoft Yammer data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
sinceDate You can choose to configure a sinceDate parameter so that the Microsoft Yammer connector crawls content based on a specific sinceDate.
communityNameFilter You can choose to index specific community content.
  • isCrawlMessage

  • isCrawlAttachment

  • isCrawlPrivateMessage

true to crawl messages, message attachments, and private messages.
type Specify YAMMER as your data source type.
secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Microsoft Yammer. This includes your Microsoft Yammer user name and password, and client ID and client secret that is generated when you create an OAuth application in the Azure portal.
useChangeLog true to use the Microsoft Yammer change log to determine which documents require updating in the index.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { } } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "community": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "user": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "message": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "inclusionPatterns": { "type": "array" }, "exclusionPatterns": { "type": "array" }, "sinceDate": { "type": "string", "pattern": "^(19|2[0-9])[0-9]{2}-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])T(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])((\\+|-)(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]))?$" }, "communityNameFilter": { "type": "array", "items": { "type": "string" } }, "isCrawlMessage": { "type": "boolean" }, "isCrawlAttachment": { "type": "boolean" }, "isCrawlPrivateMessage": { "type": "boolean" } }, "required": [ "sinceDate" ] }, "type": { "type": "string", "pattern": "YAMMER" }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 }, "useChangeLog": { "type": "string", "enum": [ "true", "false" ] }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "enableIdentityCrawler": { "type": "boolean" }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] } }, "required": [ "connectionConfiguration", "repositoryConfigurations", "additionalProperties", "type", "secretArn", "syncMode" ] }

MySQL template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as mysql, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See MySQL JSON schema.

The following table describes the parameters of the MySQL JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata Required configuration information for connecting your data source.
  • dbType—The type of Java database that you use, whether mysql, db2, postgresql, oracle, or sqlserver.

  • dbHost—The database host name.

  • dbPort—The database port.

  • dbInstance—The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document

A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.
primaryKey Provide the primary key for the database table. This identifies a table within your database.
titleColumn Provide the name of the document title column within your database table.
bodyColumn Provide the name of the document title column within your database table.
sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
timestampColumn Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.
timestampFormat Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.
timezone Enter the name of the column which contains time zones for the content to be crawled.
changeDetectingColumns Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns
allowedUsersColumns Enter the name of the column which contains User IDs to be allowed access to content.
allowedGroupsColumn Enter the name of the column which contains User IDs to be allowed access to content.
sourceURIColumn Enter the name of the column which contains Source URLs to be indexed.
isSslEnabled Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
type The type of data source. Specify JDBC as your data source type.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:
{ "user name": "database user name", "password": "password" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "dbType": { "type": "string", "enum": [ "mysql", "db2", "postgresql", "oracle", "sqlserver" ] }, "dbHost": { "type": "string" }, "dbPort": { "type": "string" }, "dbInstance": { "type": "string" } }, "required": [ "dbType", "dbHost", "dbPort", "dbInstance" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string" }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ ] }, "additionalProperties": { "type": "object", "properties": { "primaryKey": { "type": "string" }, "titleColumn": { "type": "string" }, "bodyColumn": { "type": "string" }, "sqlQuery": { "type": "string", "not": { "pattern": ";+" } }, "timestampColumn": { "type": "string" }, "timestampFormat": { "type": "string" }, "timezone": { "type": "string" }, "changeDetectingColumns": { "type": "array", "items": { "type": "string" } }, "allowedUsersColumn": { "type": "string" }, "allowedGroupsColumn": { "type": "string" }, "sourceURIColumn": { "type": "string" }, "isSslEnabled": { "type": "boolean" } }, "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"] }, "type" : { "type" : "string", "pattern": "JDBC" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Oracle Database template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as oracle, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Oracle Database JSON schema.

The following table describes the parameters of the Oracle Database JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata Required configuration information for connecting your data source.
  • dbType—The type of Java database that you use, whether mysql, db2, postgresql, oracle, or sqlserver.

  • dbHost—The database host name.

  • dbPort—The database port.

  • dbInstance—The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document

A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.
primaryKey Provide the primary key for the database table. This identifies a table within your database.
titleColumn Provide the name of the document title column within your database table.
bodyColumn Provide the name of the document title column within your database table.
sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
timestampColumn Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.
timestampFormat Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.
timezone Enter the name of the column which contains time zones for the content to be crawled.
changeDetectingColumns Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns
allowedUsersColumns Enter the name of the column which contains User IDs to be allowed access to content.
allowedGroupsColumn Enter the name of the column which contains User IDs to be allowed access to content.
sourceURIColumn Enter the name of the column which contains Source URLs to be indexed.
isSslEnabled Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
type The type of data source. Specify JDBC as your data source type.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:
{ "user name": "database user name", "password": "password" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "dbType": { "type": "string", "enum": [ "mysql", "db2", "postgresql", "oracle", "sqlserver" ] }, "dbHost": { "type": "string" }, "dbPort": { "type": "string" }, "dbInstance": { "type": "string" } }, "required": [ "dbType", "dbHost", "dbPort", "dbInstance" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string" }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ ] }, "additionalProperties": { "type": "object", "properties": { "primaryKey": { "type": "string" }, "titleColumn": { "type": "string" }, "bodyColumn": { "type": "string" }, "sqlQuery": { "type": "string", "not": { "pattern": ";+" } }, "timestampColumn": { "type": "string" }, "timestampFormat": { "type": "string" }, "timezone": { "type": "string" }, "changeDetectingColumns": { "type": "array", "items": { "type": "string" } }, "allowedUsersColumn": { "type": "string" }, "allowedGroupsColumn": { "type": "string" }, "sourceURIColumn": { "type": "string" }, "isSslEnabled": { "type": "boolean" } }, "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"] }, "type" : { "type" : "string", "pattern": "JDBC" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

PostgreSQL template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as postgresql, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See PostgreSQL JSON schema.

The following table describes the parameters of the PostgreSQL JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata Required configuration information for connecting your data source.
  • dbType—The type of Java database that you use, whether mysql, db2, postgresql, oracle, or sqlserver.

  • dbHost—The database host name.

  • dbPort—The database port.

  • dbInstance—The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document

A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.
additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.
primaryKey Provide the primary key for the database table. This identifies a table within your database.
titleColumn Provide the name of the document title column within your database table.
bodyColumn Provide the name of the document title column within your database table.
sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
timestampColumn Enter the name of the column which contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.
timestampFormat Enter the name of the column which contains time stamp formats to use to detect content changes and re-sync your content.
timezone Enter the name of the column which contains time zones for the content to be crawled.
changeDetectingColumns Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns
allowedUsersColumns Enter the name of the column which contains User IDs to be allowed access to content.
allowedGroupsColumn Enter the name of the column which contains User IDs to be allowed access to content.
sourceURIColumn Enter the name of the column which contains Source URLs to be indexed.
isSslEnabled Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32KB. Amazon Kendra will crawl all database content that matches your query.
type The type of data source. Specify JDBC as your data source type.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:
{ "user name": "database user name", "password": "password" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "dbType": { "type": "string", "enum": [ "mysql", "db2", "postgresql", "oracle", "sqlserver" ] }, "dbHost": { "type": "string" }, "dbPort": { "type": "string" }, "dbInstance": { "type": "string" } }, "required": [ "dbType", "dbHost", "dbPort", "dbInstance" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string" }, "dataSourceFieldName": { "type": "string" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ ] }, "additionalProperties": { "type": "object", "properties": { "primaryKey": { "type": "string" }, "titleColumn": { "type": "string" }, "bodyColumn": { "type": "string" }, "sqlQuery": { "type": "string", "not": { "pattern": ";+" } }, "timestampColumn": { "type": "string" }, "timestampFormat": { "type": "string" }, "timezone": { "type": "string" }, "changeDetectingColumns": { "type": "array", "items": { "type": "string" } }, "allowedUsersColumn": { "type": "string" }, "allowedGroupsColumn": { "type": "string" }, "sourceURIColumn": { "type": "string" }, "isSslEnabled": { "type": "boolean" } }, "required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"] }, "type" : { "type" : "string", "pattern": "JDBC" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Salesforce template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the Salesforce host URL as a part of the connection configuration or repository endpoint details. Also specify the type of data source as SALESFORCEV2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Salesforce JSON schema.

The following table describes the parameters of the Salesforce JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
hostUrl The URL of the Salesforce instance to be indexed.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • account

  • contact

  • campaign

  • case

  • product

  • lead

  • contract

  • partner

  • profile

  • idea

  • pricebook

  • task

  • solution

  • attachment

  • user

  • document

  • knowledgeArticles

  • group

  • opportunity

  • chatter

  • customEntity

A list of objects that map the attributes or field names of your Salesforce entities to Amazon Kendra index field names. For more information, see Mapping data source fields.
secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Salesforce. The secret must contain a JSON structure with the following keys:
{ "authenticationUrl": "OAUTH endpoint that Amazon Kendra connects to get an OAUTH token", "consumerKey": "Application public key generated when you created your Salesforce application", "consumerSecret": "Application private key generated when you created your Salesforce application", "password": "Password associated with the user logging in to the Salesforce instance", "securityToken": "Token associated with the user account logging in to the Salesforce instance", "username": "User name of the user logging in to the Salesforce instance" }
additionalProperties Additional configuration options for your content in your data source
  • accountFilter

  • contactFilter

  • caseFilter

  • campaignFilter

  • contractFilter

  • groupFilter

  • leadFilter

  • productFilter

  • opportunityFilter

  • partnerFilter

  • pricebookFilter

  • ideaFilter

  • profileFilter

  • taskFilter

  • solutionFilter

  • userFilter

  • chatterFilter

  • documentFilter

  • knowledgeArticleFilter

  • customEntities

A collection of strings that specifies which entities to filter.

inclusionPatterns

  • inclusionDocumentFileTypePatterns

  • inclusionDocumentFileNamePatterns

  • inclusionAccountFileTypePatterns

  • inclusionCampaignFileTypePatterns

  • inclusionDocumentFileNamePatterns

  • inclusionCampaignFileNamePatterns

  • inclusionCaseFileTypePatterns

  • inclusionCaseFileNamePatterns

  • inclusionContactFileTypePatterns

  • inclusionContractFileNamePatterns

  • inclusionLeadFileTypePatterns

  • inclusionLeadFileNamePatterns

  • inclusionOpportunityFileTypePatterns

  • inclusionOpportunityFileNamePatterns

  • inclusionSolutionFileTypePatterns

  • inclusionSolutionFileNamePatterns

  • inclusionTaskFileTypePatterns

  • inclusionTaskFileNamePatterns

  • inclusionGroupFileTypePatterns

  • inclusionGroupFileNamePatterns

  • inclusionChatterFileTypePatterns

  • inclusionChatterFileNamePatterns

  • inclusionCustomEntityFileTypePatterns

  • inclusionCustomEntityFileNamePatterns

A list of regular expression patterns to include certain files in your Salesforce data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.

exclusionPatterns

  • exclusionDocumentFileTypePatterns

  • exclusionDocumentFileNamePatterns

  • exclusionAccountFileTypePatterns

  • exclusionCampaignFileTypePatterns

  • exclusionCampaignFileNamePatterns

  • exclusionCaseFileTypePatterns

  • exclusionCaseFileNamePatterns

  • exclusionContactFileTypePatterns

  • exclusionContractFileNamePatterns

  • exclusionLeadFileTypePatterns

  • exclusionLeadFileNamePatterns

  • exclusionOpportunityFileTypePatterns

  • exclusionOpportunityFileNamePatterns

  • exclusionSolutionFileTypePatterns

  • exclusionSolutionFileNamePatterns

  • exclusionTaskFileTypePatterns

  • exclusionTaskFileNamePatterns

  • exclusionGroupFileTypePatterns

  • exclusionGroupFileNamePatterns

  • exclusionChatterFileTypePatterns

  • exclusionChatterFileNamePatterns

  • exclusionCustomEntityFileTypePatterns

  • exclusionCustomEntityFileNamePatterns

A list of regular expression patterns to exclude certain files in your Salesforce data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
  • isCrawlAccount

  • isCrawlContact

  • isCrawlCase

  • isCrawlCampaign

  • isCrawlProduct

  • isCrawlLead

  • isCrawlContract

  • isCrawlPartner

  • isCrawlProfile

  • isCrawlIdea

  • isCrawlPricebook

  • isCrawlDocument

  • crawlSharedDocument

  • isCrawlGroup

  • isCrawlOpportunity

  • isCrawlChatter

  • isCrawlUser

  • isCrawlSolution

  • isCrawlTask

  • isCrawlAccountAttachments

  • isCrawlContactAttachments

  • isCrawlCaseAttachments

  • isCrawlCampaignAttachments

  • isCrawlLeadAttachments

  • isCrawlContractAttachments

  • isCrawlGroupAttachments

  • isCrawlOpportunityAttachments

  • isCrawlChatterAttachments

  • isCrawlSolutionAttachments

  • isCrawlTaskAttachments

  • isCrawlCustomEntityAttachments

  • isCrawlKnowledgeArticles

    • isCrawlDraft

    • isCrawlPublish

    • isCrawlArchived

true to crawl these types of files in your Salesforce account.
type The type of data source. Specify SALESFORCEV2 as your data source type.
enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

version The version of this template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "hostUrl": { "type": "string", "pattern": "https:.*" } }, "required": [ "hostUrl" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "account": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "contact": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "campaign": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "case": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "product": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "lead": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "contract": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "partner": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "profile": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "idea": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "pricebook": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "task": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "solution": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "user": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "document": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "knowledgeArticles": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "group": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "opportunity": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE", "LONG" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "chatter": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "customEntity": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "STRING_LIST", "DATE" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "accountFilter":{ "type": "array", "items": { "type": "string" } }, "contactFilter":{ "type": "array", "items": { "type": "string" } }, "caseFilter":{ "type": "array", "items": { "type": "string" } }, "campaignFilter":{ "type": "array", "items": { "type": "string" } }, "contractFilter":{ "type": "array", "items": { "type": "string" } }, "groupFilter":{ "type": "array", "items": { "type": "string" } }, "leadFilter":{ "type": "array", "items": { "type": "string" } }, "productFilter":{ "type": "array", "items": { "type": "string" } }, "opportunityFilter":{ "type": "array", "items": { "type": "string" } }, "partnerFilter":{ "type": "array", "items": { "type": "string" } }, "pricebookFilter":{ "type": "array", "items": { "type": "string" } }, "ideaFilter":{ "type": "array", "items": { "type": "string" } }, "profileFilter":{ "type": "array", "items": { "type": "string" } }, "taskFilter":{ "type": "array", "items": { "type": "string" } }, "solutionFilter":{ "type": "array", "items": { "type": "string" } }, "userFilter":{ "type": "array", "items": { "type": "string" } }, "chatterFilter":{ "type": "array", "items": { "type": "string" } }, "documentFilter":{ "type": "array", "items": { "type": "string" } }, "knowledgeArticleFilter":{ "type": "array", "items": { "type": "string" } }, "customEntities":{ "type": "array", "items": { "type": "string" } }, "isCrawlAccount": { "type": "boolean" }, "isCrawlContact": { "type": "boolean" }, "isCrawlCase": { "type": "boolean" }, "isCrawlCampaign": { "type": "boolean" }, "isCrawlProduct": { "type": "boolean" }, "isCrawlLead": { "type": "boolean" }, "isCrawlContract": { "type": "boolean" }, "isCrawlPartner": { "type": "boolean" }, "isCrawlProfile": { "type": "boolean" }, "isCrawlIdea": { "type": "boolean" }, "isCrawlPricebook": { "type": "boolean" }, "isCrawlDocument": { "type": "boolean" }, "crawlSharedDocument": { "type": "boolean" }, "isCrawlGroup": { "type": "boolean" }, "isCrawlOpportunity": { "type": "boolean" }, "isCrawlChatter": { "type": "boolean" }, "isCrawlUser": { "type": "boolean" }, "isCrawlSolution":{ "type": "boolean" }, "isCrawlTask":{ "type": "boolean" }, "isCrawlAccountAttachments": { "type": "boolean" }, "isCrawlContactAttachments": { "type": "boolean" }, "isCrawlCaseAttachments": { "type": "boolean" }, "isCrawlCampaignAttachments": { "type": "boolean" }, "isCrawlLeadAttachments": { "type": "boolean" }, "isCrawlContractAttachments": { "type": "boolean" }, "isCrawlGroupAttachments": { "type": "boolean" }, "isCrawlOpportunityAttachments": { "type": "boolean" }, "isCrawlChatterAttachments": { "type": "boolean" }, "isCrawlSolutionAttachments":{ "type": "boolean" }, "isCrawlTaskAttachments":{ "type": "boolean" }, "isCrawlCustomEntityAttachments":{ "type": "boolean" }, "isCrawlKnowledgeArticles": { "type": "object", "properties": { "isCrawlDraft": { "type": "boolean" }, "isCrawlPublish": { "type": "boolean" }, "isCrawlArchived": { "type": "boolean" } } }, "inclusionDocumentFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionDocumentFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionDocumentFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionDocumentFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionAccountFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionAccountFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionAccountFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionAccountFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionCampaignFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionCampaignFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionCampaignFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionCampaignFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionCaseFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionCaseFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionCaseFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionCaseFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionContactFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionContactFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionContactFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionContactFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionContractFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionContractFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionContractFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionContractFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionLeadFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionLeadFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionLeadFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionLeadFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionOpportunityFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionOpportunityFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionOpportunityFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionOpportunityFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionSolutionFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionSolutionFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionSolutionFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionSolutionFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionTaskFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionTaskFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionTaskFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionTaskFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionGroupFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionGroupFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionGroupFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionGroupFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionChatterFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionChatterFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionChatterFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionChatterFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionCustomEntityFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionCustomEntityFileTypePatterns":{ "type": "array", "items": { "type": "string" } }, "inclusionCustomEntityFileNamePatterns":{ "type": "array", "items": { "type": "string" } }, "exclusionCustomEntityFileNamePatterns":{ "type": "array", "items": { "type": "string" } } }, "required": [] }, "enableIdentityCrawler": { "type": "boolean" }, "type": { "type": "string", "pattern": "SALESFORCEV2" }, "syncMode": { "type": "string", "enum": [ "FULL_CRAWL", "FORCED_FULL_CRAWL", "CHANGE_LOG" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

ServiceNow template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the ServiceNow host URL, authentication type, and instance version as a part of the connection configuration or repository endpoint details. Also specify the type of data source as SERVICENOWV2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See ServiceNow JSON schema.

The following table describes the parameters of the ServiceNow JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
hostUrl The ServiceNow host URL. For example, your-domain.service-now.com.
authType The type of authentication that you use, whether basicAuth or OAuth2.
servicenowInstanceVersion The ServiceNow version that you use. You can choose between Tokyo, Sandiego, Rome, and Others.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • knowledgeArticle

  • attachment

  • serviceCatalog

  • incident

A list of objects that map the attributes or field names of your ServiceNow knowledge articles, attachments, service catalog, and incidents to Amazon Kendra index field names. For more information, see Mapping data source fields. The ServiceNow data source field names must exist in your ServiceNow custom metadata.
additional properties Additional configuration options for your content in your data source.
maxFileSizeInMegaBytes Specify the file size limit in MBs that Amazon Kendra will crawl. Amazon Kendra will crawl only the files within the size limit you define. The default file size is 50MB. The maximum file size should be greater than 0MB and less than or equal to 50MB.
  • knowledgeArticleFilter

  • incidentQueryFilter

  • serviceCatalogQueryFilter

  • knowledgeArticleTitleRegExp

  • serviceCatalogTitleRegExp

  • incidentTitleRegExp

  • inclusionFileTypePatterns

  • exclusionFileTypePatterns

  • inclusionFileNamePatterns

  • exclusionFileNamePatterns

  • incidentStateType

A list of regular expression patterns to include and/or exclude certain files in your ServiceNow data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.
  • isCrawlKnowledgeArticle

  • isCrawlKnowledgeArticleAttachment

  • includePublicArticlesOnly

  • isCrawlServiceCatalog

  • isCrawlServiceCatalogAttachment

  • isCrawlActiveServiceCatalog

  • isCrawlInactiveServiceCatalog

  • isCrawlIncident

  • isCrawlIncidentAttachment

  • isCrawlActiveIncident

  • isCrawlInactiveIncident

  • applyACLForKnowledgeArticle

  • applyACLForServiceCatalog

  • applyACLForIncident

true to crawl ServiceNow knowledge articles, service catalogs, incidents, and attachments.
type The type of data source. Specify SERVICENOWV2 as your data source type.
enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your ServiceNow. The secret must contain a JSON structure with the following keys:
{ "username": "user name", "password": "password" }
If you use OAuth2 authentication, your secret must contain a JSON structure with the following keys:
{ "username": "user name", "password": "password", "clientId": "client id", "clientSecret": "client secret" }
version The version of the template that is currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "hostUrl": { "type": "string", "pattern": "^(?!(^(https?|ftp|file):\/\/))[a-z0-9-]+(.service-now.com|.servicenowservices.com)$", "minLength": 1, "maxLength": 2048 }, "authType": { "type": "string", "enum": [ "basicAuth", "OAuth2" ] }, "servicenowInstanceVersion": { "type": "string", "enum": [ "Tokyo", "Sandiego", "Rome", "Others" ] } }, "required": [ "hostUrl", "authType", "servicenowInstanceVersion" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "knowledgeArticle": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "attachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "LONG", "DATE", "STRING_LIST" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "serviceCatalog": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] }, "incident": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": [ "STRING", "DATE", "STRING_LIST" ] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } } }, "additionalProperties": { "type": "object", "properties": { "maxFileSizeInMegaBytes": { "type": "string" }, "isCrawlKnowledgeArticle": { "type": "boolean" }, "isCrawlKnowledgeArticleAttachment": { "type": "boolean" }, "includePublicArticlesOnly": { "type": "boolean" }, "knowledgeArticleFilter": { "type": "string" }, "incidentQueryFilter": { "type": "string" }, "serviceCatalogQueryFilter": { "type": "string" }, "isCrawlServiceCatalog": { "type": "boolean" }, "isCrawlServiceCatalogAttachment": { "type": "boolean" }, "isCrawlActiveServiceCatalog": { "type": "boolean" }, "isCrawlInactiveServiceCatalog": { "type": "boolean" }, "isCrawlIncident": { "type": "boolean" }, "isCrawlIncidentAttachment": { "type": "boolean" }, "isCrawlActiveIncident": { "type": "boolean" }, "isCrawlInactiveIncident": { "type": "boolean" }, "applyACLForKnowledgeArticle": { "type": "boolean" }, "applyACLForServiceCatalog": { "type": "boolean" }, "applyACLForIncident": { "type": "boolean" }, "incidentStateType": { "type": "array", "items": { "type": "string", "enum": [ "Open", "Open - Unassigned", "Resolved", "All" ] } }, "knowledgeArticleTitleRegExp": { "type": "string" }, "serviceCatalogTitleRegExp": { "type": "string" }, "incidentTitleRegExp": { "type": "string" }, "inclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileTypePatterns": { "type": "array", "items": { "type": "string" } }, "inclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } }, "exclusionFileNamePatterns": { "type": "array", "items": { "type": "string" } } }, "required": [] }, "type": { "type": "string", "pattern": "SERVICENOWV2" }, "enableIdentityCrawler": { "type": "boolean" }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL" ] }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type" ] }

Slack template schema

You include a JSON that contains the data source schema as part of TemplateConfiguration object. You provide the host URL as a part of the connection configuration or repository endpoint details. Also specify the type of data source as SLACK, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Slack JSON schema.

The following table describes the parameters of the Slack JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
teamId The Slack team ID you copied from your Slack main page URL.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
All A list of objects that map the attributes or field names of your Slack content to Amazon Kendra index field names.
additionalProperties Additional configuration options for your content in your data source.
inclusionPatterns A list of regular expression patterns to include specific content in your Slack data source. Content that matches the patterns are included in the index. Content that doesn't match the patterns are excluded from the index. If any content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
exclusionPatterns A list of regular expression patterns to exclude specific content in your Slack data source. Content that matches the patterns are excluded from the index. Content that doen't match the patterns are included in the index. If any content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
crawlBotMessages true to crawl bot messages.
excludeArchived true to exclude crawling of archived messages.
conversationType The type of conversation that you want to index whether PUBLIC_CHANNEL, PRIVATE_CHANNEL, GROUP_MESSAGE and DIRECT_MESSAGE.
channelFilter The type of channel that you want to index whether private_channel or public_channel.
sinceDate You can choose to configure a sinceDate parameter so that the Slack connector crawls content based on a specific sinceDate.
lookBack You can choose to configure a lookBack parameter so that the Slack connector crawls updated or deleted content upto a specified number of hours before your last connector sync.
syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

  • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

  • FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

  • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

type The type of data source. Specify SLACK as your data source type.
enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
secretArn

The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Slack. The secret must contain a JSON structure with the following keys:

{ "slackToken": "token" }
version The version of this template that's currently supported.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "teamId": { "type": "string" } }, "required": ["teamId"] } } }, "repositoryConfigurations": { "type": "object", "properties": { "All": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "DATE","LONG"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } }, "required": [ "fieldMappings" ] } }, "required": [ ] }, "additionalProperties": { "type": "object", "properties": { "exclusionPatterns": { "type": "array", "items": { "type": "string" } }, "inclusionPatterns": { "type": "array", "items": { "type": "string" } }, "crawlBotMessages": { "type": "boolean" }, "excludeArchived": { "type": "boolean" }, "conversationType": { "type": "array", "items": { "type": "string", "enum": [ "PUBLIC_CHANNEL", "PRIVATE_CHANNEL", "GROUP_MESSAGE", "DIRECT_MESSAGE" ] } }, "channelFilter": { "type": "object", "properties": { "private_channel": { "type": "array", "items": { "type": "string" } }, "public_channel": { "type": "array", "items": { "type": "string" } } } }, "channelIdFilter": { "type": "array", "items": { "type": "string" } }, "sinceDate": { "anyOf": [ { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$" }, { "type": "string", "pattern": "" } ] }, "lookBack": { "type": "string", "pattern": "^[0-9]*$" } }, "required": [ ] }, "syncMode": { "type": "string", "enum": [ "FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG" ] }, "type" : { "type" : "string", "pattern": "SLACK" }, "enableIdentityCrawler": { "type": "boolean" }, "secretArn": { "type": "string" } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "required": [ "connectionConfiguration", "repositoryConfigurations", "syncMode", "additionalProperties", "secretArn", "type", "enableIdentityCrawler" ] }

Zendesk template schema

You include a JSON that contains the data source schema as part of TemplateConfiguration object. You provide the host URL as a part of the connection configuration or repository endpoint details. Also specify the type of data source as ZENDESK, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Zendesk JSON schema.

The following table describes the parameters of the Zendesk JSON schema.

Configuration Description
connectionConfiguration Configuration information for the endpoint for the data source.
repositoryEndpointMetadata The endpoint information for the data source.
hostURL The Zendesk host URL. For example, https://yoursubdomain.zendesk.com.
repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.
  • ticket

  • ticketComment

  • ticketCommentAttachment

  • article

  • articleComment

  • articleAttachment

  • communityTopic

  • communityPostComment

A list of objects that map attributes or field names of Zendesk tickets to Amazon Kendra index field names. For more information, see Mapping data source fields.
secretARN The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Zendesk. The secret must contain a JSON structure with the following keys: host URL, client ID, client secret, user name, and password.
additionalProperties Additional configuration options for your content in your data source
organizationNameFilter You can choose to index tickets that exist within a specific Organization.
sinceDate You can choose to configure a sinceDate parameter so that the Zendesk connector crawls content based on a specific sinceDate.
inclusionPatterns A list of regular expression patterns to include certain files in your Zendesk data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
exclusionPatterns A list of regular expression patterns to exclude certain files in your Zendesk data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.
  • isCrawlTicket

  • isCrawlTicketComment

  • isCrawlTicketCommentAttachment

  • isCrawlArticle

  • isCrawlArticleComment

  • isCrawlArticleAttachment

  • isCrawlCommunityTopic

  • isCrawlCommunityPost

  • isCrawlCommunityPostComment

Input "true" to crawl these types of content.
type Specify ZENDESK as your data source type.
useChangeLog Input "true" to use the Zendesk change log to determine which documents require updating in the index. Depending on the change log's size, it might be faster to scan the documents in Zendesk. If you are syncing your Zendesk data source with your index for the first time, all documents are scanned.
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "connectionConfiguration": { "type": "object", "properties": { "repositoryEndpointMetadata": { "type": "object", "properties": { "hostUrl": { "type": "string", "pattern": "https:.*" } }, "required": [ "hostUrl" ] } }, "required": [ "repositoryEndpointMetadata" ] }, "repositoryConfigurations": { "type": "object", "properties": { "ticket": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "ticketComment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "ticketCommentAttachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "article": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "communityPostComment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "articleComment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "articleAttachment": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] }, "communityTopic": { "type": "object", "properties": { "fieldMappings": { "type": "array", "items": { "anyOf": [ { "type": "object", "properties": { "indexFieldName": { "type": "string" }, "indexFieldType": { "type": "string", "enum": ["STRING", "STRING_LIST", "LONG", "DATE"] }, "dataSourceFieldName": { "type": "string" }, "dateFieldFormat": { "type": "string", "pattern": "dd-MM-yyyy HH:mm:ss" } }, "required": [ "indexFieldName", "indexFieldType", "dataSourceFieldName" ] } ] } } }, "required": [ "fieldMappings" ] } } }, "secretArn": { "type": "string", "minLength": 20, "maxLength": 2048 }, "additionalProperties": { "type": "object", "properties": { "organizationNameFilter": { "type": "array" }, "sinceDate": { "type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$" }, "inclusionPatterns": { "type": "array" }, "exclusionPatterns": { "type": "array" }, "isCrawTicket": { "type": "string" }, "isCrawTicketComment": { "type": "string" }, "isCrawTicketCommentAttachment": { "type": "string" }, "isCrawlArticle": { "type": "string" }, "isCrawlArticleAttachment": { "type": "string" }, "isCrawlArticleComment": { "type": "string" }, "isCrawlCommunityTopic": { "type": "string" }, "isCrawlCommunityPost": { "type": "string" }, "isCrawlCommunityPostComment": { "type": "string" } } }, "type": { "type": "string", "pattern": "ZENDESK" }, "useChangeLog": { "type": "string", "enum": ["true", "false"] } }, "version": { "type": "string", "anyOf": [ { "pattern": "1.0.0" } ] }, "additionalProperties": false, "required": [ "connectionConfiguration", "repositoryConfigurations", "additionalProperties", "useChangeLog", "secretArn", "type" ] }