
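Demo template: Labeling intents with `crowd-classifier`

If you choose a custom template, you reach the Custom labeling task panel. There you can choose from several starter templates that represent common tasks. Each template gives you a starting point for building your own customized labeling task template.

In this demonstration, you work with the Intent Detection template, which uses the crowd-classifier element, and with the AWS Lambda functions needed to process your data before and after the task.

Starter Intent Detection custom template

This is the intent detection template that is provided as a starting point.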


<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<crowd-form>
  <crowd-classifier
    name="intent"
    categories="{{ task.input.labels | to_json | escape }}"
    header="Pick the most relevant intention expressed by the below text"
  >
    <classification-target>
      {{ task.input.utterance }}
    </classification-target>
    
    <full-instructions header="Intent Detection Instructions">
        <p>Select the most relevant intention expressed by the text.</p>
        <div>
           <p><strong>Example: </strong>I would like to return a pair of shoes</p>
           <p><strong>Intent: </strong>Return</p>
        </div>
    </full-instructions>

    <short-instructions>
      Pick the most relevant intention expressed by the text
    </short-instructions>
  </crowd-classifier>
</crowd-form>

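Custom templates use the Liquid template language, and each item between double curly braces is a variable. The pre-annotation AWS Lambda function should provide an object named taskInput, and that object's properties can be accessed as {{ task.input.<property name> }} in your template.

Your Intent Detection custom template

The starter template has two variables: the task.input.labels property in the opening tag of the crowd-classifier element, and task.input.utterance in the content of the classification-target region.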
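For reference, a pre-annotation Lambda that supplies both variables of the starter template might look like the following minimal sketch. The label list and the assumption that each manifest line stores its text under source are illustrative only; neither is defined by the starter template.

```python
# Hypothetical label set; replace it with the intents for your own job.
LABELS = ["buy", "eat", "watch", "browse", "leave"]


def lambda_handler(event, context):
    # Assumes each manifest line looks like {"source": "<text>"}.
    data_object = event["dataObject"]
    return {
        "taskInput": {
            # Read by {{ task.input.labels | to_json | escape }}
            "labels": LABELS,
            # Read by {{ task.input.utterance }}
            "utterance": data_object["source"],
        }
    }
```

Every key placed under taskInput becomes a task.input property in the template, so the key names here have to match the variable names used in the HTML.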

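Unless you need to offer different sets of labels with different utterances, avoiding a variable and using plain text saves processing time and leaves less room for error. The template used in this demonstration removes that variable. Variables and filters such as to_json are explained in more detail in the crowd-bounding-box demonstration article.

Styling your elements

Two parts of these custom elements that are sometimes overlooked are the <full-instructions> and <short-instructions> regions. Good instructions generate good results.

In the elements that include these regions, the <short-instructions> appear automatically in the Instructions pane on the left of the worker's screen. The <full-instructions> are linked from the View full instructions link near the top of that pane. Clicking the link opens a modal pane with more detailed instructions.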

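Not only can you use HTML, CSS, and JavaScript in these sections, you are encouraged to do so if you believe you can provide a strong set of instructions and examples that will help workers complete your tasks faster and more accurately.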

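Example : Try out a sample with JSFiddle

Try out an example <crowd-classifier> task. The example is rendered by JSFiddle, so all the template variables are replaced with hard-coded values. Click the View full instructions link to see a set of examples with extended CSS styling. You can fork the project to experiment with your own changes to the CSS, add sample images, or add extended JavaScript functionality.

Example : Final customized Intent Detection template

This uses the example <crowd-classifier> task, but with a variable for the <classification-target>. If you want to keep a consistent CSS design across a series of different labeling jobs, you can include an external stylesheet with a <link rel...> element, the same way you would in any other HTML document.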


<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<crowd-form>
  <crowd-classifier
    name="intent"
    categories="['buy', 'eat', 'watch', 'browse', 'leave']"
    header="Pick the most relevant intent expressed by the text below"
  >
    <classification-target>
      {{ task.input.source }}
    </classification-target>
    
    <full-instructions header="Intent Detection Instructions">
      <p>In the statements and questions provided in this exercise, what category of action is the speaker interested in doing?</p>
          <table>
            <tr>
              <th>Example Utterance</th>
              <th>Good Choice</th>
            </tr>
            <tr>
              <td>When is the Seahawks game on?</td>
              <td>
                eat<br>
                <greenbg>watch</greenbg>
                <botchoice>browse</botchoice>
              </td>
            </tr>
            <tr>
              <th>Example Utterance</th>
              <th>Bad Choice</th>
            </tr>
            <tr>
              <td>When is the Seahawks game on?</td>
              <td>
                buy<br>
                <greenbg>eat</greenbg>
                <botchoice>watch</botchoice>
              </td>
            </tr>
          </table>
    </full-instructions>

    <short-instructions>
      What is the speaker expressing they would like to do next?
    </short-instructions>  
  </crowd-classifier>
</crowd-form>
<style>
  greenbg {
    background: #feee23;
    display: block;
  }

  table {
    *border-collapse: collapse; /* IE7 and lower */
    border-spacing: 0; 
  }

  th, tfoot, .fakehead {
    background-color: #8888ee;
    color: #f3f3f3;
    font-weight: 700;
  }

  th, td, tfoot {
      border: 1px solid blue;
  }

  th:first-child {
    border-radius: 6px 0 0 0;
  }

  th:last-child {
    border-radius: 0 6px 0 0;
  }

  th:only-child{
    border-radius: 6px 6px 0 0;
  }

  tfoot:first-child {
    border-radius: 0 0 6px 0;
  }

  tfoot:last-child {
    border-radius: 0 0 0 6px;
  }

  tfoot:only-child{
    border-radius: 6px 6px;
  }

  td {
    padding-left: 15px ;
    padding-right: 15px ;
  }

  botchoice {
    display: block;
    height: 17px;
    width: 490px;
    overflow: hidden;
    position: relative;
    background: #fff;
    padding-bottom: 20px;
  }

  botchoice:after {
    position: absolute;
    bottom: 0;
    left: 0;  
    height: 100%;
    width: 100%;
    content: "";
    background: linear-gradient(to top,
       rgba(255,255,255, 1) 55%, 
       rgba(255,255,255, 0) 100%
    );
    pointer-events: none; /* so the text is still selectable */
  }
</style>

Example : Your manifest file

If you are preparing your manifest file manually for a text-classification task like this one, format your data as follows.


{"source": "Roses are red"}
{"source": "Violets are Blue"}
{"source": "Ground Truth is the best"}
{"source": "And so are you"}

This differs from the manifest file used for the "Demo Template: Annotation of Images with crowd-bounding-box" demonstration, where source-ref was used as the property name instead of source. Using source-ref designates S3 URIs for images or other files that must be converted to HTTP. Otherwise, use source, as with the text strings above.
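If you generate the manifest from a list of strings rather than typing it by hand, the following minimal sketch shows one way to do it. The function name build_manifest is hypothetical; the only format assumption is the one shown above, a single JSON object per line with a source key.

```python
import json


def build_manifest(texts):
    """Return manifest content: one JSON object per line, keyed by "source"."""
    return "\n".join(json.dumps({"source": text}) for text in texts) + "\n"


utterances = ["Roses are red", "Violets are Blue"]
print(build_manifest(utterances), end="")
```

Using json.dumps rather than string formatting keeps quotes and other special characters in the utterances properly escaped.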

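Your pre-annotation Lambda function

As part of the job setup, provide the ARN of an AWS Lambda function that can be called to process your manifest entries and pass them to the template engine.

The name of this Lambda function must contain one of the following four strings: SageMaker, Sagemaker, sagemaker, or LabelingFunction.

This applies to both your pre-annotation and post-annotation Lambdas.

When you use the console, if your account owns Lambdas, a drop-down list of the functions that meet the naming requirement is provided for you to choose from.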
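The naming rule can be checked before you deploy. The following is a minimal sketch with a hypothetical helper name; it assumes the match is case-sensitive, which the three listed spellings of "sagemaker" suggest but the text does not state outright.

```python
REQUIRED_SUBSTRINGS = ("SageMaker", "Sagemaker", "sagemaker", "LabelingFunction")


def has_valid_name(function_name):
    """Return True if the name contains one of the four required strings."""
    # Assumption: the comparison is case-sensitive.
    return any(token in function_name for token in REQUIRED_SUBSTRINGS)


print(has_valid_name("intent-sagemaker-pre-annotation"))
print(has_valid_name("intent-preprocess"))
```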

In this very basic sample, where you have only one variable, the function is primarily a pass-through. Here is a sample pre-labeling Lambda using Python 3.7.


import json

def lambda_handler(event, context):
    return {
        "taskInput":  event['dataObject']
    }

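The dataObject property of the event contains the properties from a data object in your manifest.

In this demonstration, which is a simple pass-through, you pass that object straight through as the taskInput value. If you add properties with values to the event['dataObject'] object, they become available to your HTML template as Liquid variables in the format {{ task.input.<property name> }}.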
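As an illustration of adding a property, the following minimal sketch extends the pass-through with one extra value. The wordCount property and the sample event are hypothetical, and a real event carries more fields than dataObject.

```python
def lambda_handler(event, context):
    # Copy the manifest entry so the incoming event is not mutated.
    task_input = dict(event["dataObject"])
    # Hypothetical extra property, readable as {{ task.input.wordCount }}.
    task_input["wordCount"] = len(task_input["source"].split())
    return {"taskInput": task_input}


sample_event = {"dataObject": {"source": "And so are you"}}
print(lambda_handler(sample_event, None))
```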

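Your post-annotation Lambda function

As part of the job setup, provide the ARN of a Lambda function that can be called to process the form data when a worker completes a task. This function can be as simple or as complex as you want. If you want to do answer consolidation and scoring as data comes in, you can apply the scoring or consolidation algorithms of your choice. If you would rather store the raw data for offline processing, that is also an option.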
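One simple consolidation choice is a majority vote over the labels that several workers gave the same item. The following is a minimal sketch of that idea, not the method used by the sample in this demonstration; the function name is hypothetical.

```python
from collections import Counter


def majority_label(worker_labels):
    """Return the most frequent label and the share of workers who chose it."""
    counts = Counter(worker_labels)
    # On a tie, most_common keeps the label that was seen first.
    label, votes = counts.most_common(1)[0]
    return label, votes / len(worker_labels)


label, agreement = majority_label(["watch", "watch", "eat"])
print(label, round(agreement, 2))
```

The agreement share can double as a rough confidence score, for example to flag items for review when it falls below a threshold you choose.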

Set permissions for your post-annotation Lambda function

The annotation data is in a file designated by the s3Uri string in the payload object. To process the annotations as they come in, even with a simple pass-through function, you need to assign S3ReadOnly access to your Lambda so that it can read the annotation files.

On the console page for creating your Lambda, scroll to the Execution role panel. Select Create a new role from one or more templates. Give the role a name. From the Policy templates drop-down, choose Amazon S3 object read-only permissions. When you save the Lambda, the role is saved and selected.

The following sample is for Python 3.7.


import json
import boto3
from urllib.parse import urlparse

def lambda_handler(event, context):
    consolidated_labels = []

    # The payload's s3Uri points at the file that holds this batch of annotations.
    parsed_url = urlparse(event['payload']['s3Uri'])
    s3 = boto3.client('s3')
    text_file = s3.get_object(Bucket=parsed_url.netloc, Key=parsed_url.path[1:])
    file_content = text_file['Body'].read()
    annotations = json.loads(file_content)

    for dataset in annotations:
        for annotation in dataset['annotations']:
            # Each worker's answer is stored as a JSON string.
            new_annotation = json.loads(annotation['annotationData']['content'])
            label = {
                'datasetObjectId': dataset['datasetObjectId'],
                'consolidatedAnnotation': {
                    'content': {
                        event['labelAttributeName']: {
                            'workerId': annotation['workerId'],
                            'result': new_annotation,
                            'labeledContent': dataset['dataObject']
                        }
                    }
                }
            }
            consolidated_labels.append(label)

    return consolidated_labels
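
The URI handling in the handler above can be exercised without AWS access. The following is a minimal sketch and not part of the sample; split_s3_uri and the example bucket name are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    # The path starts with "/", which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://example-bucket/annotations/batch-1.json")
print(bucket, key)
```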

Your labeling job output

The post-annotation Lambda often receives batches of task results in the event object. That batch is the payload object that the Lambda should iterate through.
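To test the iteration logic locally, the loop can be separated from the S3 read. The following is a minimal sketch under that assumption; consolidate is a hypothetical helper, and the sample batch only imitates the fields that the handler above reads.

```python
import json


def consolidate(annotations, label_attribute_name):
    """Build one consolidated record per worker annotation, without S3."""
    consolidated = []
    for dataset in annotations:
        for annotation in dataset["annotations"]:
            result = json.loads(annotation["annotationData"]["content"])
            consolidated.append({
                "datasetObjectId": dataset["datasetObjectId"],
                "consolidatedAnnotation": {
                    "content": {
                        label_attribute_name: {
                            "workerId": annotation["workerId"],
                            "result": result,
                            "labeledContent": dataset["dataObject"],
                        }
                    }
                },
            })
    return consolidated


# Hypothetical batch shaped like the file that the handler reads from s3Uri.
sample_batch = [{
    "datasetObjectId": "0",
    "dataObject": {"content": "When is the Seahawks game on?"},
    "annotations": [{
        "workerId": "private.us-east-1.EXAMPLE",
        "annotationData": {"content": json.dumps({"intent": {"label": "watch"}})},
    }],
}]

records = consolidate(sample_batch, "intent-demo")
print(records[0]["consolidatedAnnotation"]["content"]["intent-demo"]["result"])
```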

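You can find the output of the job in a folder named after your labeling job in the target S3 bucket you specified. It is in a subfolder named manifests.

For an intent detection task, the output in the output manifest looks somewhat like the demo below. The example has been cleaned up and spaced out to be easier for humans to read. The actual output is more compressed for machine reading.

Example : JSON in your output manifest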


[
  {
    "datasetObjectId":"<Number representing item's place in the manifest>",
     "consolidatedAnnotation":
     {
       "content":
       {
         "<name of labeling job>":
         {     
           "workerId":"private.us-east-1.XXXXXXXXXXXXXXXXXXXXXX",
           "result":
           {
             "intent":
             {
                 "label":"<label chosen by worker>"
             }
           },
           "labeledContent":
           {
             "content":"<text content that was labeled>"
           }
         }
       }
     }
   },
  {
    "datasetObjectId":"<Number representing item's place in the manifest>",
     "consolidatedAnnotation":
     {
       "content":
       {
         "<name of labeling job>":
         {     
           "workerId":"private.us-east-1.6UDLPKQZHYWJQSCA4MBJBB7FWE",
           "result":
           {
             "intent":
             {
                 "label": "<label chosen by worker>"
             }
           },
           "labeledContent":
           {
             "content": "<text content that was labeled>"
           }
         }
       }
     }
   },
     ...
     ...
     ...
]
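
Once the output is parsed, the chosen labels can be pulled out with a few lines. The following is a minimal sketch that assumes the structure shown in the example above; extract_labels, the job name, and the sample record are hypothetical.

```python
def extract_labels(records, job_name):
    """Return (datasetObjectId, label) pairs from consolidated records."""
    pairs = []
    for record in records:
        entry = record["consolidatedAnnotation"]["content"][job_name]
        pairs.append((record["datasetObjectId"], entry["result"]["intent"]["label"]))
    return pairs


sample_output = [{
    "datasetObjectId": "0",
    "consolidatedAnnotation": {"content": {"intent-demo": {
        "workerId": "private.us-east-1.EXAMPLE",
        "result": {"intent": {"label": "watch"}},
        "labeledContent": {"content": "When is the Seahawks game on?"},
    }}},
}]

print(extract_labels(sample_output, "intent-demo"))
```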

This should help you create and use your own custom template.
