Adding documents from a data source - Amazon Kendra

Adding documents from a data source

When you create a data source, you give Amazon Kendra the location of documents that it should index. Unlike adding documents directly to an index, you can periodically scan the data source to update the index.

For example, say that you have a repository of tax instruction stored in an Amazon S3 bucket. Existing documents are changed and new documents are added to the repository from time to time. If you add the repository to Amazon Kendra as a data source, you can keep your index up to date by periodically updating your index.

You can update the index manually using console or the StartDataSourceSyncJob operation, or you can set up a schedule to update the index.

An index can have more than one data source. Each data source can have its own update schedule. For example, you might update the index of your working documents daily, or even hourly, while updating your archived documents manually whenever the archive changes.

Setting an update schedule

Configure your data source to periodically update with the console or by using the Schedule parameter when you create or update a data source. The content of the parameter is a string that holds either a cron-format schedule string or an empty string to indicate that the index should be updated on demand. For the format of a cron expression, see Schedule Expressions for Rules in the Amazon CloudWatch Events User Guide. Amazon Kendra supports only cron expressions. It does not support rate expressions.

Setting a language

You can index all your documents in a data source in a supported language. You specify the language code for all your documents in your data source when you call CreateDataSource. If a document does not have a language code specified in a metadata field, the document is indexed using the language code specified for all documents at the data source level. If you do not specify a language, Amazon Kendra indexes documents in a data source in English by default. For more information on supported languages, including their codes, see Adding documents in languages other than English.

You index all your documents in a data source in a supported language using the console. Go to Data sources or Add data sources if you are adding a new data source. On the Specify data source details page, choose a language from the dropdown Language. You select Update or continue to enter the configuration information to connect to your data source.