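Using voice transcription confidence scores

When a user makes a voice utterance, Amazon Lex V2 uses automatic speech recognition (ASR) to transcribe the user's request before it is interpreted. By default, Amazon Lex V2 uses the most likely transcription of the audio for interpretation.

In some cases there might be more than one possible transcription of the audio. For example, a user might make an ambiguous utterance in which "My name is John" could also be heard as "My name is Juan." In this case, you can use disambiguation techniques, or combine your domain knowledge with the transcription confidence scores, to determine which transcription in the list is the correct one.

In the request to your Lambda code hook function, Amazon Lex V2 includes the top transcription and up to two alternative transcriptions of the user input. Each transcription contains a confidence score that indicates how likely it is to be the correct transcription. Each transcription also includes any slot values inferred from the user input.

You can compare the confidence scores of two transcriptions to determine whether there is ambiguity between them. For example, if one transcription has a confidence score of 0.95 and the other has a confidence score of 0.65, the first transcription is probably correct and the ambiguity between them is low. If two transcriptions have confidence scores of 0.75 and 0.72, the ambiguity between them is high. In that case, you might still be able to tell them apart by using your domain knowledge.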
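
As a rough illustration of the comparison described above, the following Python sketch flags a set of transcriptions as ambiguous when the two highest scores are close together. The `is_ambiguous` helper and the `min_gap` default of 0.1 are assumptions made for this example and are not part of Amazon Lex V2. Because the scores are comparative, a suitable gap would need to be tuned for each bot.

```python
def is_ambiguous(transcriptions, min_gap=0.1):
    """Return True when the two highest confidence scores differ by less than min_gap.

    `transcriptions` is the list from the Lambda input event. `min_gap` is a
    hypothetical tuning value chosen for this sketch.
    """
    if len(transcriptions) < 2:
        return False
    # Sort defensively instead of relying on the order of the list.
    scores = sorted((t["transcriptionConfidence"] for t in transcriptions), reverse=True)
    return (scores[0] - scores[1]) < min_gap
```

With the scores from the text, 0.95 and 0.65 are treated as unambiguous, and 0.75 and 0.72 are treated as ambiguous.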

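For example, if the inferred slot values in two transcriptions with confidence scores of 0.75 and 0.72 are "John" and "Juan", you can query the users in your database for these names and eliminate one of the transcriptions. If "John" isn't a user in your database and "Juan" is, you can use the dialog code hook to change the inferred slot value for the first name to "Juan."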
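
The database check described above could be sketched as follows. The `pick_known_name` helper is hypothetical, and the `known_names` set of lowercase names stands in for whatever user database lookup your application provides.

```python
def pick_known_name(transcriptions, known_names, slot_name="Name"):
    """Return the highest-confidence slot value that appears in known_names, or None.

    `known_names` is a set of lowercase names used here in place of a real
    database query (an assumption made for this sketch).
    """
    ranked = sorted(transcriptions, key=lambda t: t["transcriptionConfidence"], reverse=True)
    for transcription in ranked:
        slot = (transcription.get("resolvedSlots") or {}).get(slot_name)
        if not slot or not slot.get("value"):
            continue  # no value was inferred for this slot
        candidate = slot["value"]["originalValue"]
        if candidate.lower() in known_names:
            return candidate
    return None
```

If the function returns a value that differs from the slot value of the top transcription, the dialog code hook can overwrite the slot before it returns control to Amazon Lex V2.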

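The confidence scores that Amazon Lex V2 returns are comparative values. Don't rely on them as absolute scores. The values can change as Amazon Lex V2 is improved.

Audio transcription confidence scores are available only for the English (GB) (en_GB) and English (US) (en_US) languages. Confidence scores are supported only for 8 kHz audio input. Audio input from the test window in the Amazon Lex V2 console uses 16 kHz audio, so transcription confidence scores aren't provided for it.

Note

Before you can use audio transcription confidence scores with an existing bot, you must rebuild the bot. Existing versions of a bot don't support transcription confidence scores. To use them, you must create a new version of the bot.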

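You can use confidence scores in several conversation design patterns:

- If the highest confidence score falls below a threshold because of a noisy environment or poor signal quality, you can prompt the user with the same question to capture better quality audio.
- If multiple transcriptions have similar confidence scores for slot values, such as "John" and "Juan," you can compare the values with a pre-existing database to eliminate inputs, or you can prompt the user to select one of the two values. For example, "Say 1 for John or say 2 for Juan."
- If your business logic requires switching intents based on specific keywords in an alternative transcription whose confidence score is close to that of the top transcription, you can change the intent by using your dialog code hook Lambda function or by using session management operations. For more information, see "Session management."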
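
The first two patterns can be sketched in a single helper, shown below. The function names, the `min_confidence` and `min_gap` defaults, and the prompt wording are all assumptions made for this illustration; appropriate values depend on your bot and audio conditions.

```python
def slot_text(transcription, slot_name):
    """Return the original slot value for one transcription, or None if absent."""
    slot = (transcription.get("resolvedSlots") or {}).get(slot_name)
    if not slot or not slot.get("value"):
        return None
    return slot["value"].get("originalValue")


def plan_next_step(transcriptions, slot_name, min_confidence=0.5, min_gap=0.1):
    """Return (action, prompt), where action is 'reprompt', 'choose', or 'accept'."""
    ranked = sorted(transcriptions, key=lambda t: t["transcriptionConfidence"], reverse=True)
    if not ranked:
        return "accept", None
    top_score = ranked[0]["transcriptionConfidence"]

    # Pattern 1: even the best transcription is weak, so ask the question again.
    if top_score < min_confidence:
        return "reprompt", "Sorry, I didn't catch that. Could you say it again?"

    # Pattern 2: collect distinct slot values whose scores are close to the top score.
    candidates = []
    for transcription in ranked:
        value = slot_text(transcription, slot_name)
        close = top_score - transcription["transcriptionConfidence"] < min_gap
        if value and close and value not in candidates:
            candidates.append(value)
    if len(candidates) > 1:
        options = " or ".join(f"{i} for {v}" for i, v in enumerate(candidates, 1))
        return "choose", f"Say {options}."

    return "accept", None
```

A dialog code hook could map `reprompt` and `choose` to an ElicitSlot response with the returned prompt, and map `accept` to a Delegate response.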

Amazon Lex V2 sends the following JSON structure, with up to three transcriptions of the user's input, to your Lambda code hook function:



    "transcriptions": [
        {
            "transcription": "string",
            "rawTranscription": "string",
            "transcriptionConfidence": "number",
            "resolvedContext": {
                "intent": "string"
            },
            "resolvedSlots": {
                "string": {
                    "shape": "List",
                    "value": {
                        "originalValue": "string",
                        "resolvedValues": [
                            "string"
                        ]
                    },
                    "values": [
                        {
                            "shape": "Scalar",
                            "value": {
                                "originalValue": "string",
                                "resolvedValues": [
                                    "string"
                                ]
                            }
                        },
                        {
                            "shape": "Scalar",
                            "value": {
                                "originalValue": "string",
                                "resolvedValues": [
                                    "string"
                                ]
                            }
                        }
                    ]
                }
            }
        }
    ]

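The JSON structure contains the transcription text, the intent that was resolved for the utterance, and the values of any slots detected in the utterance. For text user input, the transcriptions field contains a single transcription with a confidence score of 1.0.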
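
To make the structure easier to work with, a Lambda function might flatten it first. The following sketch assumes the field names shown in the JSON structure above; the `summarize_transcriptions` helper and the shape of its return value are hypothetical.

```python
def summarize_transcriptions(event):
    """Flatten the transcriptions field of a Lambda input event into simple dicts."""
    summary = []
    for transcription in event.get("transcriptions", []):
        slots = {}
        for name, slot in (transcription.get("resolvedSlots") or {}).items():
            # A slot entry can be empty when no value was inferred for it.
            if slot and slot.get("value"):
                slots[name] = slot["value"]["originalValue"]
        summary.append({
            "text": transcription.get("transcription"),
            "confidence": transcription.get("transcriptionConfidence"),
            "intent": (transcription.get("resolvedContext") or {}).get("intent"),
            "slots": slots,
        })
    return summary
```

The helper uses `.get()` throughout so that a transcription without an intent or slot values yields `None` and an empty dictionary instead of raising a `KeyError`.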

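The contents of the transcriptions depend on the turn of the conversation and the recognized intent.

For the first turn, intent elicitation, Amazon Lex V2 determines the top three transcriptions. For the top transcription, it returns the intent and any inferred slot values in the transcription.

For subsequent turns, slot elicitation, the results depend on the inferred intent of each transcription, as follows.

If the inferred intent of the top transcription is the same as in the previous turn and all other transcriptions have the same intent:
- All transcriptions contain inferred slot values.
If the inferred intent of the top transcription is different from the previous turn and all other transcriptions have the previous intent:
- The top transcription contains the inferred slot values for the new intent.
- The other transcriptions contain the previous intent and the inferred slot values for that intent.
If the inferred intent of the top transcription is different from the previous turn, one transcription has the previous intent, and one transcription has a different intent:
- The top transcription contains the new inferred intent and any inferred slot values in the utterance.
- The transcription that has the previous inferred intent contains the inferred slot values for that intent.
- The transcription with the different intent has no inferred intent name and no inferred slot values.
If the inferred intent of the top transcription is different from the previous turn and all other transcriptions have different intents:
- The top transcription contains the new inferred intent and any inferred slot values in the utterance.
- The other transcriptions contain no inferred intents and no inferred slot values.
If the top two transcriptions share the same inferred intent, that intent is different from the previous turn, and the third transcription has a different intent:
- The top two transcriptions contain the new inferred intent and any inferred slot values in the utterance.
- The third transcription has no intent name and no resolved slot values.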
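
Because some transcriptions arrive without an intent or slot values, a code hook may want to sort them into groups before it uses them. The following sketch shows one way to do that. The `split_by_intent` helper is hypothetical, and it assumes that a transcription without an inferred intent either omits `resolvedContext` or leaves its `intent` empty.

```python
def split_by_intent(transcriptions, previous_intent):
    """Group transcriptions as (same_intent, switched_intent, unresolved)."""
    same, switched, unresolved = [], [], []
    for transcription in transcriptions:
        intent = (transcription.get("resolvedContext") or {}).get("intent")
        if not intent:
            unresolved.append(transcription)  # no inferred intent or slot values
        elif intent == previous_intent:
            same.append(transcription)
        else:
            switched.append(transcription)
    return same, switched, unresolved
```

Only the transcriptions in the first two groups carry slot values that can be compared or substituted.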

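Session management

To change the intent that Amazon Lex V2 uses in a conversation with a user, use the response from your dialog code hook Lambda function. Alternatively, you can use the session management APIs in your custom application.

Using a Lambda function

When you use a Lambda function, Amazon Lex V2 calls it with a JSON structure that contains the input to the function. The JSON structure contains a field called transcriptions, which holds the possible transcriptions that Amazon Lex V2 has determined for the utterance. The transcriptions field contains one to three possible transcriptions, each with a confidence score.

To use the intent from an alternative transcription, specify it in the ConfirmIntent or ElicitSlot dialog action of your Lambda function. To use a slot value from an alternative transcription, set the value in the intent field of your Lambda function response. For more information, see "Enabling custom logic with AWS Lambda functions."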
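
As a minimal sketch of the approach described above, the following function builds an ElicitSlot response that adopts the intent and slot values of an alternative transcription. The helper names are hypothetical. The `interpretedValue` key does not appear in the transcriptions structure on this page; it is included on the assumption that the Lambda response format expects it, so check it against the Lambda response reference before you rely on it.

```python
def to_slot(resolved_slot):
    """Convert one resolvedSlots entry into a slot for the Lambda response."""
    value = resolved_slot["value"]
    resolved = value.get("resolvedValues") or []
    return {
        "value": {
            "originalValue": value["originalValue"],
            # Assumption: use the first resolved value, else the original text.
            "interpretedValue": resolved[0] if resolved else value["originalValue"],
            "resolvedValues": resolved,
        }
    }


def elicit_slot_from_alternative(event, alternative, slot_to_elicit, prompt):
    """Build an ElicitSlot response that uses the alternative's intent and slots."""
    slots = {
        name: to_slot(slot)
        for name, slot in (alternative.get("resolvedSlots") or {}).items()
        if slot and slot.get("value")
    }
    return {
        "sessionState": {
            "sessionAttributes": event["sessionState"].get("sessionAttributes") or {},
            "dialogAction": {"type": "ElicitSlot", "slotToElicit": slot_to_elicit},
            "intent": {
                "name": alternative["resolvedContext"]["intent"],
                "slots": slots,
                "state": "InProgress",
            },
        },
        "messages": [{"contentType": "PlainText", "content": prompt}],
    }
```

The function expects an alternative that has an inferred intent, so filter out transcriptions without `resolvedContext` before you call it.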

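Example code

The following code example is a Python Lambda function that uses audio transcriptions to improve the conversation experience for the user.

To use the example code, you must have:

- A bot with one language, either English (GB) (en_GB) or English (US) (en_US).
- One intent, OrderBirthStone. In the Code hooks section of the intent definition, make sure that Use a Lambda function for initialization and validation is selected.
- Two slots in the intent, "BirthMonth" and "Name", both of type AMAZON.AlphaNumeric.
- An alias with the Lambda function defined. For more information, see "Creating a Lambda function and attaching it to a bot alias."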


import time
import os
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)


# --- Helpers that build all of the responses ---

def elicit_slot(session_attributes, intent_request, slots, slot_to_elicit, message):
    return {
        'sessionState': {
            'dialogAction': {
                'type': 'ElicitSlot',
                'slotToElicit': slot_to_elicit
            },
            'intent': {
                'name': intent_request['sessionState']['intent']['name'],
                'slots': slots,
                'state': 'InProgress'
            },
            'sessionAttributes': session_attributes,
            'originatingRequestId': 'e3ab4d42-fb5f-4cc3-bb78-caaf6fc7cccd'
        },
        'sessionId': intent_request['sessionId'],
        'messages': [message],
        'requestAttributes': intent_request['requestAttributes'] if 'requestAttributes' in intent_request else None
    }


def close(intent_request, session_attributes, fulfillment_state, message):
    intent_request['sessionState']['intent']['state'] = fulfillment_state
    return {
        'sessionState': {
            'sessionAttributes': session_attributes,
            'dialogAction': {
                'type': 'Close'
            },
            'intent': intent_request['sessionState']['intent'],
            'originatingRequestId': '3ab4d42-fb5f-4cc3-bb78-caaf6fc7cccd'
        },
        'messages': [message],
        'sessionId': intent_request['sessionId'],
        'requestAttributes': intent_request['requestAttributes'] if 'requestAttributes' in intent_request else None
    }


def delegate(intent_request, session_attributes):
    return {
        'sessionState': {
            'dialogAction': {
                'type': 'Delegate'
            },
            'intent': intent_request['sessionState']['intent'],
            'sessionAttributes': session_attributes,
            'originatingRequestId': 'abc'
        },
        'sessionId': intent_request['sessionId'],
        'requestAttributes': intent_request['requestAttributes'] if 'requestAttributes' in intent_request else None
    }


def get_session_attributes(intent_request):
    sessionState = intent_request['sessionState']
    if 'sessionAttributes' in sessionState:
        return sessionState['sessionAttributes']

    return {}


def get_slots(intent_request):
    return intent_request['sessionState']['intent']['slots']


# --- Functions that control the behavior of the bot ---


def order_birth_stone(intent_request):
    """
    Performs dialog management and fulfillment for ordering a birth stone.
    Beyond fulfillment, the implementation for this intent demonstrates the following:
    1) Use of N best transcriptions to re prompt user when confidence for top transcript is below a threshold
    2) Overrides resolved slot for birth month from a known fixed list if the top transcript
    is not accurate.
    """

    transcriptions = intent_request['transcriptions']

    if intent_request['invocationSource'] == 'DialogCodeHook':
        # Disambiguate if there are multiple transcriptions and the top transcription
        # confidence is below a threshold (0.8 here)
        if len(transcriptions) > 1 and transcriptions[0]['transcriptionConfidence'] < 0.8:
            # resolvedSlots can be missing or empty when no slot value was inferred
            top_slots = transcriptions[0].get('resolvedSlots') or {}
            if top_slots.get('Name') is not None:
                return prompt_for_name(intent_request)
            elif top_slots.get('BirthMonth') is not None:
                return validate_month(intent_request)

    return continue_conversation(intent_request)


def prompt_for_name(intent_request):
    """
    If the confidence for the name is not high enough, re prompt the user with the recognized names
    so it can be confirmed.
    """
    resolved_names = []
    for transcription in intent_request['transcriptions']:
        name_slot = (transcription.get('resolvedSlots') or {}).get('Name')
        if name_slot is not None:
            name = name_slot['value']['originalValue']
            # Skip duplicates so the prompt never offers the same name twice
            if name not in resolved_names:
                resolved_names.append(name)
    if len(resolved_names) > 1:
        session_attributes = get_session_attributes(intent_request)
        slots = get_slots(intent_request)
        return elicit_slot(session_attributes, intent_request, slots, 'Name',
                           {'contentType': 'PlainText',
                            'content': 'Sorry, did you say your name is {} ?'.format(" or ".join(resolved_names))})
    else:
        return continue_conversation(intent_request)


def validate_month(intent_request):
    """
    Validate month from an expected list, if not valid looks for other transcriptions and to see if the month
    recognized there has an expected value. If there is, replace with that and if not continue conversation.
    """

    expected_months = ['january', 'february', 'march']
    resolved_months = []
    for transcription in intent_request['transcriptions']:
        month_slot = (transcription.get('resolvedSlots') or {}).get('BirthMonth')
        if month_slot is not None:
            resolved_months.append(month_slot['value']['originalValue'])

    for resolved_month in resolved_months:
        if resolved_month in expected_months:
            # resolvedValues belongs inside the slot's 'value' object
            intent_request['sessionState']['intent']['slots']['BirthMonth']['value']['resolvedValues'] = [resolved_month]
            break

    return continue_conversation(intent_request)


def continue_conversation(event):
    session_attributes = get_session_attributes(event)

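    if event["invocationSource"] == "DialogCodeHook":
        return delegate(event, session_attributes)

    # Any other invocation source (for example, FulfillmentCodeHook) would otherwise
    # fall through and return None. Closing the intent here is an assumption, and the
    # message text is a placeholder.
    return close(event, session_attributes, 'Fulfilled',
                 {'contentType': 'PlainText', 'content': 'Thanks, your order has been placed.'})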


# --- Intents ---


def dispatch(intent_request):
    """
    Called when the user specifies an intent for this bot.
    """

    logger.debug('dispatch sessionId={}, intentName={}'.format(intent_request['sessionId'],
                                                               intent_request['sessionState']['intent']['name']))

    intent_name = intent_request['sessionState']['intent']['name']

    # Dispatch to your bot's intent handlers
    if intent_name == 'OrderBirthStone':
        return order_birth_stone(intent_request)

    raise Exception('Intent with name ' + intent_name + ' not supported')


# --- Main handler ---


def lambda_handler(event, context):
    """
    Route the incoming request based on intent.
    The JSON body of the request is provided in the event slot.

    """
    # By default, treat the user request as coming from the America/New_York time zone.
    os.environ['TZ'] = 'America/New_York'
    time.tzset()
    logger.debug('event={}'.format(event))

    return dispatch(event)

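Using session management APIs

To use an intent other than the current one, use the PutSession operation. For example, if you decide that the first alternative is preferable to the intent that Amazon Lex V2 chose, you can use the PutSession operation to change the intent. The next intent that the user interacts with is then the one that you selected.

You can also use the PutSession operation to change a slot value in the intent structure so that it uses a value from an alternative transcription.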
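
As a hedged sketch of the PutSession approach, the following function passes a new intent and slot values to the `put_session` call of an AWS SDK for Python (Boto3) `lexv2-runtime` client. The `switch_intent` helper and its parameters are hypothetical, and using a Delegate dialog action so that Amazon Lex V2 chooses the next step is an assumption; another dialog action might suit your application better.

```python
def switch_intent(client, bot_id, bot_alias_id, locale_id, session_id,
                  intent_name, slots=None):
    """Point an existing session at a different intent by calling PutSession.

    `client` is assumed to be created by the caller, for example with
    boto3.client("lexv2-runtime"). It is passed in so that the function can
    also be exercised with a stub client.
    """
    return client.put_session(
        botId=bot_id,
        botAliasId=bot_alias_id,
        localeId=locale_id,
        sessionId=session_id,
        sessionState={
            # Assumption: Delegate lets Amazon Lex V2 decide the next step.
            "dialogAction": {"type": "Delegate"},
            "intent": {
                "name": intent_name,
                "slots": slots or {},
                "state": "InProgress",
            },
        },
    )
```

To override a slot with the value from an alternative transcription, pass it in `slots` with the same structure that the Lambda response uses for slots.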

For more information, see "Managing sessions with the Amazon Lex V2 API."
