To use a custom data source, create an application environment that is responsible for updating your Amazon Q index. The application environment depends on a crawler that you create. The crawler reads the documents in your repository and determines which documents should be sent to Amazon Q. Your application environment should perform the following steps:
-
Crawl your repository and make a list of the documents in your repository that are added, updated, or deleted.
-
Call the StartDataSourceSyncJob API operation to signal that a sync job is starting. You provide a data source ID to identify the data source that is synchronizing. Amazon Q returns an execution ID to identify a particular sync job.
Note
After you end a sync job, you can start a new sync job. There can be a period of time before all of the submitted documents are added to the index. To see the status of the sync job, use the ListDataSourceSyncJobs operation. If the
Status
returned for the sync job isSYNCING_INDEXING
, some documents are still being indexed. You can start a new sync job when the status of the previous job isFAILED
orSUCCEEDED
. -
To remove documents from the index, use the BatchDeleteDocument operation. You provide the data source ID and execution ID to identify the data source that is synchronizing and the job that this update is associated with.
-
To signal the end of the sync job, use the StopDataSourceSyncJob operation. After you call the
StopDataSourceSyncJob
operation, the associated execution ID is no longer valid.Note
After you call the
StopDataSourceSyncJob
operation, you can't use a sync job identifier in a call to theBatchPutDocument
orBatchDeleteDocument
operations. If you do, all of the documents submitted are returned in theFailedDocuments
response message from the API. -
To list the sync jobs for the data source and to see metrics for the sync jobs, use the ListDataSourceSyncJobs operation with the index and data source identifiers.