
Adding documents directly to an index with batch upload

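You can use the BatchPutDocument API to add documents directly to an index. You can't add documents directly with the console; when you use the console, you connect to a data source to add documents to your index. You can add documents from an S3 bucket or supply them as binary data. For a list of the document types that Amazon Kendra supports, see "Types of documents".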

Adding documents to an index with BatchPutDocument is an asynchronous operation. After you call the BatchPutDocument API, use the BatchGetDocumentStatus API to monitor the progress of indexing your documents. When you call BatchGetDocumentStatus with a list of document IDs, it returns the status of each document. When a document's status is INDEXED or FAILED, processing of that document is complete. When the status is FAILED, BatchGetDocumentStatus also returns the reason the document couldn't be indexed.
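The polling loop described above can be sketched in Python as follows. This is a minimal sketch, not an official helper: `wait_for_indexing`, `poll_interval`, and `max_polls` are hypothetical names, and the client is passed in so that a boto3 Kendra client (or any object with a compatible `batch_get_document_status` method) can be used. It reads the `DocumentStatusList` entries (`DocumentId`, `DocumentStatus`, `FailureReason`) from each response and, following the paragraph above, treats only INDEXED and FAILED as final.

```python
import time

# Statuses that the paragraph above treats as "processing is complete".
TERMINAL_STATUSES = {"INDEXED", "FAILED"}


def wait_for_indexing(kendra, index_id, document_ids, poll_interval=5.0, max_polls=60):
    """Hypothetical helper: poll BatchGetDocumentStatus until every ID is final.

    `kendra` is assumed to be a boto3 Kendra client or a compatible object.
    Returns a dict mapping each document ID to its final status entry.
    """
    pending = set(document_ids)
    final = {}
    polls = 0
    while pending:
        if polls == max_polls:
            raise TimeoutError(f"documents still processing: {sorted(pending)}")
        if polls:
            time.sleep(poll_interval)  # wait between status checks
        response = kendra.batch_get_document_status(
            IndexId=index_id,
            DocumentInfoList=[{"DocumentId": doc_id} for doc_id in sorted(pending)],
        )
        polls += 1
        for entry in response.get("DocumentStatusList", []):
            if entry.get("DocumentStatus") in TERMINAL_STATUSES:
                final[entry["DocumentId"]] = entry
                pending.discard(entry["DocumentId"])
    return final
```

For example, `wait_for_indexing(boto3.client("kendra"), "index-id", ["doc-id-1"])` returns the final status entry for `doc-id-1`; for a FAILED entry, `FailureReason` explains why indexing failed. The service may report other statuses (for example UPDATED or UPDATE_FAILED) and may list unknown IDs under `Errors`, which this sketch would simply keep polling until `max_polls`, so check the BatchGetDocumentStatus reference before relying on this set.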

To modify content or document metadata fields or attributes during the document ingestion process, see "Amazon Kendra Custom Document Enrichment". If you use a custom data source, each document that you submit with the BatchPutDocument API must include a data source ID and an execution ID as attributes or fields. For more information, see "Required attributes for custom data sources".
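As a sketch of the custom data source requirement, the function below adds the two attributes to a document dictionary before it is passed to `batch_put_document`. The attribute keys `_data_source_id` and `_data_source_sync_job_execution_id` are assumptions based on Amazon Kendra's reserved attribute names; confirm them on the "Required attributes for custom data sources" page. The function name is hypothetical.

```python
# Assumed reserved attribute keys; verify against the Amazon Kendra docs.
DATA_SOURCE_ID_KEY = "_data_source_id"
EXECUTION_ID_KEY = "_data_source_sync_job_execution_id"


def with_custom_data_source(document, data_source_id, execution_id):
    """Hypothetical helper: return a copy of a BatchPutDocument document dict
    with the data source ID and execution ID attributes set.

    Any existing values for those two keys are replaced; other attributes
    are kept in their original order. The input dict is not modified.
    """
    attributes = [
        attr for attr in document.get("Attributes", [])
        if attr["Key"] not in (DATA_SOURCE_ID_KEY, EXECUTION_ID_KEY)
    ]
    attributes.append({"Key": DATA_SOURCE_ID_KEY, "Value": {"StringValue": data_source_id}})
    attributes.append({"Key": EXECUTION_ID_KEY, "Value": {"StringValue": execution_id}})
    return {**document, "Attributes": attributes}
```

The execution ID would typically be the `ExecutionId` returned by `start_data_source_sync_job` for the custom data source.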

Note

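Each document ID must be unique per index. You can't create a data source that indexes your documents with their unique IDs and then use the BatchPutDocument API to index the same documents, or vice versa. You can, however, delete a data source and then use the BatchPutDocument API to index the same documents, or vice versa. Using the BatchPutDocument and BatchDeleteDocument APIs together with an Amazon Kendra data source connector for the same set of documents can cause inconsistencies in your data. Instead, we recommend using the Amazon Kendra custom data source connector.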
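Amazon Kendra enforces ID uniqueness across the whole index, which a client cannot fully check on its own. A cheap local safeguard is to refuse a batch that repeats an ID before sending it. The function below is a hypothetical sketch of that check; it only sees the documents in the list you pass and cannot detect IDs that are already indexed through a data source.

```python
from collections import Counter


def find_duplicate_ids(documents):
    """Hypothetical pre-flight check: return IDs that appear more than once
    in a single batch of BatchPutDocument document dicts, sorted."""
    counts = Counter(doc["Id"] for doc in documents)
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)
```

A caller could raise an error, or drop the repeats, whenever the returned list is non-empty.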

The following sections show how to add documents directly to an index.

Adding documents with the BatchPutDocument API

The following example calls BatchPutDocument to add a blob of text to an index. For a list of the document types that Amazon Kendra supports, see "Types of documents".

For information about creating an index, see https://docs.aws.amazon.com/kendra/latest/dg/create-index.html. To set up the AWS CLI and SDKs, see "Setting up Amazon Kendra".

Note

Files that you add to the index must be in a UTF-8 encoded byte stream.
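In the Python SDK (boto3), the `Blob` field accepts bytes, so one way to honor the UTF-8 requirement is to encode text explicitly before building the document. The helper below is a sketch with a hypothetical name: it encodes `str` values as UTF-8 and rejects byte strings that are not valid UTF-8, so a mis-encoded file is caught before the request is sent. It applies only to inline `Blob` content, not to documents referenced through `S3Path`.

```python
def to_utf8_blob(content):
    """Hypothetical helper: return content as UTF-8 bytes for the Blob field.

    str values are encoded as UTF-8; bytes are validated and returned as-is.
    """
    if isinstance(content, str):
        return content.encode("utf-8")
    if isinstance(content, (bytes, bytearray)):
        bytes(content).decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
        return bytes(content)
    raise TypeError(f"expected str or bytes, got {type(content).__name__}")
```

With it, the Python example's document could use `"Blob": to_utf8_blob(text)` instead of the raw string.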

The following example adds UTF-8 encoded text to the index.

CLI

In the AWS Command Line Interface, use the following command. The command is formatted for Linux and macOS. If you are using Windows, replace the Unix line-continuation character (\) with a caret (^).


aws kendra batch-put-document \
   --index-id index-id \
   --documents '{"Id":"doc-id-1", "Blob":"Amazon.com is an online retailer.", "ContentType":"PLAIN_TEXT", "Title":"Information about Amazon.com"}'

Python


import boto3

kendra = boto3.client("kendra")

# Provide the index ID
index_id = "index-id"

# Provide the title and text
title = "Information about Amazon.com"
text = "Amazon.com is an online retailer."

document = {
    "Id": "1",
    "Blob": text,
    "ContentType": "PLAIN_TEXT",
    "Title": title
}

documents = [
    document
]

result = kendra.batch_put_document(
    IndexId=index_id,
    Documents=documents
)

print(result)

Java


package com.amazonaws.kendra;


import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kendra.KendraClient;
import software.amazon.awssdk.services.kendra.model.BatchPutDocumentRequest;
import software.amazon.awssdk.services.kendra.model.BatchPutDocumentResponse;
import software.amazon.awssdk.services.kendra.model.ContentType;
import software.amazon.awssdk.services.kendra.model.Document;

public class AddDocumentsViaAPIExample {
    public static void main(String[] args) {
        KendraClient kendra = KendraClient.builder().build();

        String indexId = "yourIndexId";

        Document testDoc = Document
            .builder()
            .title("The title of your document")
            .id("a_doc_id")
            .blob(SdkBytes.fromUtf8String("your text content"))
            .contentType(ContentType.PLAIN_TEXT)
            .build();

        BatchPutDocumentRequest batchPutDocumentRequest = BatchPutDocumentRequest
            .builder()
            .indexId(indexId)
            .documents(testDoc)
            .build();

        BatchPutDocumentResponse result = kendra.batchPutDocument(batchPutDocumentRequest);

        System.out.println(String.format("BatchPutDocument Result: %s", result));
    }
}
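The examples above print the whole response. BatchPutDocument reports documents that failed validation, and were therefore not added, in a `FailedDocuments` list whose entries carry the document `Id`, an `ErrorCode`, and an `ErrorMessage`. The hypothetical helper below splits a submitted batch into accepted IDs and rejected IDs. Accepted documents are still processed asynchronously, so their final state should be checked with BatchGetDocumentStatus.

```python
def split_batch_result(documents, response):
    """Hypothetical helper: split a submitted batch into accepted and rejected IDs.

    `documents` is the list passed as Documents; `response` is the dict that
    batch_put_document returned. Rejected IDs map to their ErrorMessage.
    """
    failed = {
        failure["Id"]: failure.get("ErrorMessage", "")
        for failure in response.get("FailedDocuments", [])
    }
    accepted = [doc["Id"] for doc in documents if doc["Id"] not in failed]
    return accepted, failed
```

For example, `accepted, failed = split_batch_result(documents, result)` right after the call in the Python example separates the IDs to poll from the IDs to fix and resubmit.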

Adding documents from an S3 bucket

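You can use the BatchPutDocument API to add documents directly to your index from an Amazon S3 bucket. You can add up to 10 documents in the same call. When you use an S3 bucket, you must give your IAM role permission to access the bucket that contains your documents. Specify the role in the RoleArn parameter.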
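Because the paragraph above caps a single call at 10 documents, a larger set has to be split across several BatchPutDocument calls. The generator below is a minimal sketch of that split; `batches` and `MAX_DOCUMENTS_PER_CALL` are hypothetical names, and the limit of 10 is taken from the text rather than checked against current service quotas.

```python
from typing import Iterator, List, Sequence

# Per-call limit as stated in the text; confirm against current quotas.
MAX_DOCUMENTS_PER_CALL = 10


def batches(documents: Sequence[dict], size: int = MAX_DOCUMENTS_PER_CALL) -> Iterator[List[dict]]:
    """Hypothetical helper: yield consecutive slices of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(documents), size):
        yield list(documents[start:start + size])


# Hypothetical usage with the boto3 client from the examples:
# for batch in batches(documents):
#     kendra.batch_put_document(IndexId=index_id, RoleArn=role_arn, Documents=batch)
```

Slicing keeps the original order, so document IDs are submitted in the same sequence they were listed.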

Adding documents from an Amazon S3 bucket with the BatchPutDocument API is a one-time operation. To keep an index synchronized with the contents of a bucket, create an Amazon S3 data source. For more information, see "Amazon S3 data source".

For information about creating an index, see https://docs.aws.amazon.com/kendra/latest/dg/create-index.html. To set up the AWS CLI and SDKs, see "Setting up Amazon Kendra". For information about creating an S3 bucket, see the Amazon Simple Storage Service documentation.

The following example uses the BatchPutDocument API to add two Microsoft Word documents to an index.

Python


import boto3

kendra = boto3.client("kendra")

# Provide the index ID
index_id = "index-id"
# Provide the ARN of the IAM role that can read the documents in the S3 bucket
role_arn = "arn:aws:iam::account-id:role/role-name"

doc1_s3_file_data = {
    "Bucket": "bucket-name",
    "Key": "document1.docx"
}

doc1_document = {
    "S3Path": doc1_s3_file_data,
    "Title": "Document 1 title",
    "Id": "doc_1"
}

doc2_s3_file_data = {
    "Bucket": "bucket-name",
    "Key": "document2.docx"
}

doc2_document = {
    "S3Path": doc2_s3_file_data,
    "Title": "Document 2 title",
    "Id": "doc_2"
}

documents = [
    doc1_document,
    doc2_document
]

result = kendra.batch_put_document(
    Documents=documents,
    IndexId=index_id,
    RoleArn=role_arn
)

print(result)

Java


package com.amazonaws.kendra;

import software.amazon.awssdk.services.kendra.KendraClient;
import software.amazon.awssdk.services.kendra.model.BatchPutDocumentRequest;
import software.amazon.awssdk.services.kendra.model.BatchPutDocumentResponse;
import software.amazon.awssdk.services.kendra.model.Document;
import software.amazon.awssdk.services.kendra.model.S3Path;

public class AddFilesFromS3Example {
    public static void main(String[] args) {
        KendraClient kendra = KendraClient.builder().build();

        String indexId = "yourIndexId";
        String roleArn = "yourIndexRoleArn";

        Document pollyDoc = Document
            .builder()
            .s3Path(
                S3Path.builder()
                .bucket("amzn-s3-demo-bucket")
                .key("What is Amazon Polly.docx")
                .build())
            .title("What is Amazon Polly")
            .id("polly_doc_1")
            .build();

        Document rekognitionDoc = Document
            .builder()
            .s3Path(
                S3Path.builder()
                .bucket("amzn-s3-demo-bucket")
                .key("What is Amazon Rekognition.docx")
                .build())
            .title("What is Amazon Rekognition")
            .id("rekognition_doc_1")
            .build();

        BatchPutDocumentRequest batchPutDocumentRequest = BatchPutDocumentRequest
            .builder()
            .indexId(indexId)
            .roleArn(roleArn)
            .documents(pollyDoc, rekognitionDoc)
            .build();

        BatchPutDocumentResponse result = kendra.batchPutDocument(batchPutDocumentRequest);

        System.out.println(String.format("BatchPutDocument result: %s", result));
    }
}
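The role passed in RoleArn must be one that Amazon Kendra can assume and that can read the documents' objects. The sketch below builds a minimal pair of IAM policy documents as Python dictionaries, under the assumption that `s3:GetObject` on the bucket's objects is the only permission needed; buckets encrypted with AWS KMS, or other setups, may need more, so check the Amazon Kendra IAM roles documentation. All function names, the account ID, and the role name are hypothetical. It also shows the role ARN format (`:role/`), which differs from a policy ARN (`:policy/`).

```python
def role_arn(account_id: str, role_name: str) -> str:
    """Hypothetical helper: format an IAM role ARN (':role/', not ':policy/')."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def kendra_s3_read_policy(bucket: str) -> dict:
    """Permissions policy sketch: let the role read objects in one bucket.

    Assumes s3:GetObject is sufficient; encrypted buckets may need more.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


def kendra_trust_policy() -> dict:
    """Trust policy sketch: allow the Amazon Kendra service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "kendra.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
```

For example, `role_arn("111122223333", "KendraBatchPutRole")` yields the string to pass as RoleArn, and the two dictionaries can be serialized with `json.dumps` when you create that role.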
