Sample speaker search workflow - Amazon Chime SDK

Sample speaker search workflow

Important

The speaker search function involves the creation of a voice embedding, which can be used to compare the voice of a caller against previously stored voice data. The collection, use, storage, and retention of biometric identifiers and biometric information in the form of a digital voiceprint may require the caller's informed consent via a written release. Such consent is required under various state laws, including the biometrics laws in Illinois, Texas, and Washington, as well as other state privacy laws. Before using the speaker search feature, you must provide all notices and obtain all consents required by applicable law and by the AWS service terms governing your use of the feature.

The following diagram shows an example data flow through a speaker search analysis task. The numbered text below the image describes each step of the process.

Note

The diagram assumes you have already configured an Amazon Chime SDK Voice Connector with a call analytics configuration that has a VoiceAnalyticsProcessor. For more information, see Recording Voice Connector calls.

A diagram showing the data flow through a speaker search analysis.

In the diagram:

  1. You or a system administrator create a voice profile domain for storing voice embeddings and voice profiles. For more information about creating voice profile domains, see Creating voice profile domains, in the Amazon Chime SDK Administrator Guide. You can also use the CreateVoiceProfileDomain API.

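As a sketch of step 1, the voice profile domain can also be created programmatically with the AWS SDK for Python (boto3) via a chime-sdk-voice client. The domain name and KMS key ARN below are placeholder values, not values from this guide:

```python
def create_profile_domain(voice_client, name, kms_key_arn):
    """Create a voice profile domain and return its ID.

    voice_client is a boto3 "chime-sdk-voice" client, e.g.
    boto3.client("chime-sdk-voice").
    """
    response = voice_client.create_voice_profile_domain(
        Name=name,
        Description="Domain for speaker search voice profiles",
        # Server-side encryption with a customer managed KMS key is required.
        ServerSideEncryptionConfiguration={"KmsKeyArn": kms_key_arn},
    )
    return response["VoiceProfileDomain"]["VoiceProfileDomainId"]
```

The returned domain ID is the value you later pass to StartSpeakerSearchTask.
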
  2. A caller dials in using a phone number assigned to an Amazon Chime SDK Voice Connector. Or, an agent uses a Voice Connector number to make an outbound call.

  3. The Amazon Chime SDK Voice Connector service creates a transaction ID and associates it with the call.

  4. Assuming your application subscribes to EventBridge events, it calls the CreateMediaInsightsPipeline API with the media insights pipeline configuration and the Kinesis Video Stream ARNs for the Voice Connector call.

    For more information about using EventBridge, refer to Workflows for machine-learning based analytics.

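Step 4 can be sketched in Python with a boto3 chime-sdk-media-pipelines client. The configuration ARN and stream ARNs are assumed inputs taken from the EventBridge event; the runtime configuration values (PCM encoding at 8 kHz) reflect Voice Connector audio:

```python
def create_insights_pipeline(pipelines_client, configuration_arn, stream_arns):
    """Start a media insights pipeline over the call's Kinesis Video Streams.

    pipelines_client is a boto3 "chime-sdk-media-pipelines" client.
    """
    response = pipelines_client.create_media_insights_pipeline(
        MediaInsightsPipelineConfigurationArn=configuration_arn,
        KinesisVideoStreamSourceRuntimeConfiguration={
            "Streams": [{"StreamArn": arn} for arn in stream_arns],
            "MediaEncoding": "pcm",
            "MediaSampleRate": 8000,  # Voice Connector calls stream 8 kHz PCM audio
        },
    )
    return response["MediaInsightsPipeline"]["MediaPipelineId"]
```

The returned pipeline ID identifies the pipeline in later media pipeline API calls.
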
  5. Your application—such as an Interactive Voice Response system—or agent provides notice to the caller regarding call recording and the use of voice embeddings for voice analytics and seeks their consent to participate.

  6. Once the caller provides consent, your application or agent calls the StartSpeakerSearchTask API. If you have a Voice Connector and a transaction ID, call the API through the Voice SDK and pass it the Voice Connector ID, transaction ID, and voice profile domain ID. If you have a media insights pipeline ID instead of a transaction ID, call the StartSpeakerSearchTask API in the Media pipelines SDK. Either API returns a speaker search task ID that identifies the asynchronous task.

    Note

    Before invoking the StartSpeakerSearchTask API in either SDK, you must provide all necessary notices and obtain all necessary consents, as required by law and under the AWS service terms.

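A minimal Python sketch of the Voice SDK variant of this call, assuming a boto3 chime-sdk-voice client and that consent has already been captured:

```python
def start_speaker_search(voice_client, voice_connector_id, transaction_id, domain_id):
    """Begin an asynchronous speaker search task and return its task ID.

    voice_client is a boto3 "chime-sdk-voice" client.
    """
    response = voice_client.start_speaker_search_task(
        VoiceConnectorId=voice_connector_id,
        TransactionId=transaction_id,
        VoiceProfileDomainId=domain_id,
    )
    return response["SpeakerSearchTask"]["SpeakerSearchTaskId"]

# With a media insights pipeline ID instead of a transaction ID, call the
# equivalent API on a "chime-sdk-media-pipelines" client, roughly:
#   pipelines.start_speaker_search_task(
#       Identifier=pipeline_id, VoiceProfileDomainArn=domain_arn)
```

Keep the returned task ID; the notification events and the GetSpeakerSearchTask API reference it.
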
  7. The system accumulates 10 seconds of the caller's voice. The caller must speak for at least that amount of time. The system doesn't capture or analyze silence.

  8. The media insights pipeline compares the speech to the voice profiles in the domain and lists the top 10 high-confidence matches. If it doesn't find a match, the Voice Connector creates a voice profile.

  9. The media insights pipeline service sends a notification event to the configured notification targets.

  10. The caller continues speaking and provides an additional 10 seconds of non-silent speech.

  11. The media insights pipeline generates an enrollment voice embedding that you can use to create a voice profile or update an existing voice profile.

  12. The media insights pipeline sends a VoiceprintGenerationSuccessful notification to the configured notification targets.

  13. Your application calls the CreateVoiceProfile or UpdateVoiceProfile APIs to create or update the profile.

  14. Your application calls the GetSpeakerSearchTask API as needed to get the latest status of the speaker search task.
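Steps 13 and 14 above can be sketched in Python with a boto3 chime-sdk-voice client. Whether you create a new profile or update an existing one depends on the match results delivered to your notification targets; the function names and branching here are illustrative, not part of the service API:

```python
def create_or_update_profile(voice_client, task_id, profile_id=None):
    """Create a voice profile from a completed speaker search task,
    or refresh an existing profile's embedding if profile_id is given."""
    if profile_id is None:
        response = voice_client.create_voice_profile(SpeakerSearchTaskId=task_id)
    else:
        response = voice_client.update_voice_profile(
            VoiceProfileId=profile_id, SpeakerSearchTaskId=task_id
        )
    return response["VoiceProfile"]["VoiceProfileId"]


def get_task_status(voice_client, voice_connector_id, task_id):
    """Poll the speaker search task for its latest status."""
    response = voice_client.get_speaker_search_task(
        VoiceConnectorId=voice_connector_id, SpeakerSearchTaskId=task_id
    )
    return response["SpeakerSearchTask"]["SpeakerSearchTaskStatus"]
```

In practice you would call create_or_update_profile only after receiving the VoiceprintGenerationSuccessful notification, and use get_task_status for on-demand checks rather than tight polling.
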