Connecting Amazon Q custom connector to Amazon Q Business

Use a custom data source when you have a repository that Amazon Q Business doesn’t yet provide a data source connector for. When you create a custom data source, you have complete control over how the documents to index are selected. Amazon Q only provides metric information that you can use to monitor your data source sync jobs. You must create and run the crawler that determines the documents your data source indexes.

You can use a custom data source connector to:

  • See the same run history metrics that Amazon Q data sources provide even when you can't use Amazon Q data sources to sync your repositories.

  • Create a consistent sync monitoring experience between Amazon Q data sources and custom data sources.

  • See sync metrics for a data source connector that you created using the BatchPutDocument and BatchDeleteDocument API operations.

You can create an Amazon Q custom data source connector using either the AWS Management Console or the CreateDataSource API operation.

When you create a custom data source using the CreateDataSource API operation:

  • The action returns an ID to use when you synchronize the data source.

  • You must set the Configuration parameter as follows:

    "configuration": { "type": "CUSTOM", "version": "1.0.0" }
  • You must specify the main title of your documents using the Document object, and _source_uri in DocumentAttribute. The main title is required so that DocumentTitle and DocumentURI are included in the ChatSync or Chat response.
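
The request body for a custom data source can be assembled programmatically. The following is a minimal sketch in Python; the boto3 `qbusiness` client name, the `create_data_source` operation, and the parameter casing are assumptions to verify against the API reference for your SDK version:

```python
def build_custom_data_source_request(application_id, index_id, display_name):
    """Assemble a minimal CreateDataSource request for a custom data source.

    The configuration block must be exactly the one shown above:
    type CUSTOM, version 1.0.0."""
    return {
        "applicationId": application_id,
        "indexId": index_id,
        "displayName": display_name,
        "configuration": {"type": "CUSTOM", "version": "1.0.0"},
    }


def create_custom_data_source(application_id, index_id, display_name):
    """Create the data source and return its ID (hypothetical boto3 call)."""
    import boto3  # imported here so the payload helper works without the SDK

    client = boto3.client("qbusiness")
    response = client.create_data_source(
        **build_custom_data_source_request(application_id, index_id, display_name)
    )
    # Save this ID -- you pass it to StartDataSourceSyncJob later.
    return response["dataSourceId"]
```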

When you create a custom data source using the console:

  • Give your data source a name, and optionally a description and resource tags.

  • After the data source is created, the console displays a data source ID. Copy this ID to use when you synchronize the data source with the index.

Creating an Amazon Q custom connector

To use a custom data source, create an application that is responsible for updating your Amazon Q index. The application depends on a crawler that you create. The crawler reads the documents in your repository and determines which documents should be sent to Amazon Q. Your application should perform the following steps:

  1. Crawl your repository and make a list of the documents in your repository that are added, updated, or deleted.

  2. Call the StartDataSourceSyncJob API operation to signal that a sync job is starting. You provide a data source ID to identify the data source that is synchronizing. Amazon Q returns an execution ID to identify a particular sync job.

    Note

    After you end a sync job, you can start a new sync job. There can be a period of time before all of the submitted documents are added to the index. To see the status of the sync job, use the ListDataSourceSyncJobs operation. If the Status returned for the sync job is SYNCING_INDEXING, some documents are still being indexed. You can start a new sync job when the status of the previous job is FAILED or SUCCEEDED.

  3. To add or update documents in the index, use the BatchPutDocument operation. To remove documents from the index, use the BatchDeleteDocument operation. In both cases, you provide the data source ID and execution ID to identify the data source that is synchronizing and the job that this update is associated with.

  4. To signal the end of the sync job, use the StopDataSourceSyncJob operation. After you call the StopDataSourceSyncJob operation, the associated execution ID is no longer valid.

    Note

    After you call the StopDataSourceSyncJob operation, you can't use a sync job identifier in a call to the BatchPutDocument or BatchDeleteDocument operations. If you do, all of the documents submitted are returned in the FailedDocuments response message from the API.

  5. To list the sync jobs for the data source and to see metrics for the sync jobs, use the ListDataSourceSyncJobs operation with the index and data source identifiers.
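
The five steps above can be outlined in code. This is a sketch assuming Python with boto3; the `qbusiness` client name, operation names, and parameter casing are assumptions to check against the API reference. The crawler comparison in step 1 is shown as a pure function over document-ID-to-checksum maps:

```python
def diff_repository(previous, current):
    """Step 1: compare two snapshots of the repository.

    `previous` and `current` map document IDs to content checksums;
    returns the IDs that were added, updated, and deleted."""
    added = [doc_id for doc_id in current if doc_id not in previous]
    updated = [doc_id for doc_id in current
               if doc_id in previous and previous[doc_id] != current[doc_id]]
    deleted = [doc_id for doc_id in previous if doc_id not in current]
    return added, updated, deleted


def run_sync_job(application_id, index_id, data_source_id, previous, current):
    """Steps 2-5: drive one sync job end to end (hypothetical boto3 calls)."""
    import boto3  # local import so diff_repository stays usable offline

    client = boto3.client("qbusiness")

    # Step 2: signal that a sync job is starting and keep the execution ID.
    execution_id = client.start_data_source_sync_job(
        applicationId=application_id, indexId=index_id,
        dataSourceId=data_source_id)["executionId"]

    added, updated, deleted = diff_repository(previous, current)
    # Step 3: submit added/updated documents with BatchPutDocument and
    # deleted documents with BatchDeleteDocument, tagging each request
    # with data_source_id and execution_id (request shapes are shown
    # later on this page).
    ...

    # Step 4: end the job; execution_id is no longer valid afterwards.
    client.stop_data_source_sync_job(
        applicationId=application_id, indexId=index_id,
        dataSourceId=data_source_id)

    # Step 5: list the job history and its metrics.
    return client.list_data_source_sync_jobs(
        applicationId=application_id, indexId=index_id,
        dataSourceId=data_source_id)
```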

Required attributes

When you submit a document to Amazon Q using the BatchPutDocument API operation, you must provide the following two attributes for each document:

  • _data_source_id – The identifier of the data source. This is returned when you create the data source with either the console or the CreateDataSource API operation.

  • _data_source_sync_job_execution_id – The identifier of the sync run. This is returned when you start the index synchronization with the StartDataSourceSyncJob operation.

The following is the JSON required to index a document using a custom data source.

    {
      "Documents": [
        {
          "Attributes": [
            {
              "Key": "_data_source_id",
              "Value": { "StringValue": "data source identifier" }
            },
            {
              "Key": "_data_source_sync_job_execution_id",
              "Value": { "StringValue": "sync job identifier" }
            }
          ],
          "Blob": "document content",
          "ContentType": "content type",
          "Id": "document identifier",
          "Title": "document title"
        }
      ],
      "IndexId": "index identifier",
      "RoleArn": "IAM role ARN"
    }
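
A helper like the following can build that request body. This is a sketch in Python whose field names simply mirror the JSON above; check them against the current API reference before use:

```python
def build_put_documents_request(index_id, role_arn, data_source_id,
                                execution_id, documents):
    """Build a BatchPutDocument request body mirroring the JSON above.

    `documents` is a list of dicts with id, title, blob, and content_type.
    Every document gets the two required custom-data-source attributes."""
    required_attributes = [
        {"Key": "_data_source_id",
         "Value": {"StringValue": data_source_id}},
        {"Key": "_data_source_sync_job_execution_id",
         "Value": {"StringValue": execution_id}},
    ]
    return {
        "IndexId": index_id,
        "RoleArn": role_arn,
        "Documents": [
            {
                "Id": doc["id"],
                "Title": doc["title"],
                "Blob": doc["blob"],
                "ContentType": doc["content_type"],
                "Attributes": list(required_attributes),
            }
            for doc in documents
        ],
    }
```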

When you remove a document from the index using the BatchDeleteDocument API operation, you must specify the following two fields in the DataSourceSyncJobMetricTarget parameter:

  • DataSourceId – The identifier of the data source. This is returned when you create the data source with either the console or the CreateDataSource API operation.

  • DataSourceSyncJobId – The identifier of the sync run. This is returned when you start the index synchronization with the StartDataSourceSyncJob operation.

The following is the JSON required to delete a document from the index using the BatchDeleteDocument operation.

    {
      "DataSourceSyncJobMetricTarget": {
        "DataSourceId": "data source identifier",
        "DataSourceSyncJobId": "sync job identifier"
      },
      "DocumentIdList": [ "document identifier" ],
      "IndexId": "index identifier"
    }
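
As with the put request, the delete request can be built with a small helper; this sketch in Python mirrors the JSON above, so verify the field names against the current API reference:

```python
def build_delete_documents_request(index_id, data_source_id,
                                   execution_id, document_ids):
    """Build a BatchDeleteDocument request body mirroring the JSON above.

    The sync-job metric target ties the deletions to the running job so
    they are counted in that job's metrics."""
    return {
        "IndexId": index_id,
        "DataSourceSyncJobMetricTarget": {
            "DataSourceId": data_source_id,
            "DataSourceSyncJobId": execution_id,
        },
        "DocumentIdList": list(document_ids),
    }
```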

Viewing metrics

After a sync job is finished, you can use the ListDataSourceSyncJobs API operation to get the DataSourceSyncJobMetrics associated with the sync job. Use these metrics to monitor your custom data source syncs.

You can submit the same document multiple times: as part of the BatchPutDocument operation, as part of the BatchDeleteDocument operation, or for both addition and deletion. Regardless of how you submit the document, it is counted only once in the metrics.

  • DocumentsAdded – The number of documents submitted using the BatchPutDocument operation associated with this sync job that are added to the index for the first time. If a document is submitted for addition more than once in a sync, the document is only counted once in the metrics.

  • DocumentsDeleted – The number of documents submitted using the BatchDeleteDocument operation associated with this sync job that are deleted from the index. If a document is submitted for deletion more than once in a sync, the document is only counted once in the metrics.

  • DocumentsFailed – The number of documents associated with this sync job that failed indexing. These documents were accepted by Amazon Q for indexing but could not be indexed or deleted. If a document isn't accepted by Amazon Q, the identifier for the document is returned in the FailedDocuments response property of the BatchPutDocument and BatchDeleteDocument operations.

  • DocumentsModified – The number of documents submitted using the BatchPutDocument operation associated with this sync job that modified existing documents in the Amazon Q index.
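
As one way to consume these counters, the sketch below (plain Python; the metric names match the list above, but the exact shape of the response they arrive in is an assumption) rolls a job's metrics up into processed and failed totals:

```python
def summarize_sync_metrics(metrics):
    """Roll up the four per-job counters described above.

    `metrics` is a dict keyed by the metric names listed on this page;
    missing counters default to zero."""
    processed = (metrics.get("DocumentsAdded", 0)
                 + metrics.get("DocumentsModified", 0)
                 + metrics.get("DocumentsDeleted", 0))
    failed = metrics.get("DocumentsFailed", 0)
    return {"processed": processed, "failed": failed}
```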

Amazon Q also emits Amazon CloudWatch metrics while indexing documents. For more information, see Monitoring Amazon Q with Amazon CloudWatch.

Amazon Q doesn't return the DocumentsScanned metric for custom data sources.