Adding a thesaurus to an index - Amazon Kendra

Adding a thesaurus to an index

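The following procedures show how to add a thesaurus file containing synonyms to an index. It can take up to 30 minutes to see the effects of an updated thesaurus file. For more information about thesaurus files, see Creating a thesaurus file.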

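Console
To add a thesaurus
  1. In the left navigation pane, under the index where you want to add a list of synonyms, choose Synonyms.

  2. On the Synonyms page, choose Add Thesaurus.

  3. In Define thesaurus, give the thesaurus a name and an optional description.

  4. In Thesaurus settings, provide the Amazon S3 path to the thesaurus file. The file must be smaller than 5 MB.

  5. For IAM role, select an existing role, or select Create a new role and specify a role name. Amazon Kendra uses this role to access the Amazon S3 resource on your behalf. The IAM role has the prefix "AmazonKendra-".

  6. Choose Save to save the configuration and add the thesaurus. Once the thesaurus has been ingested, it becomes active and synonyms are highlighted in the results. It can take up to 30 minutes to see the effects of the thesaurus file.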
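
The inputs collected in the console steps above correspond to the parameters of the CreateThesaurus request. The following is a minimal sketch, not part of the original guide, that checks those inputs and assembles the request parameters. The helper name `build_create_thesaurus_params` and the reading of "5 MB" as 5 × 1024 × 1024 bytes are assumptions made for illustration.

```python
# Assumption: "5 MB" is read here as 5 * 1024 * 1024 bytes.
MAX_THESAURUS_BYTES = 5 * 1024 * 1024


def build_create_thesaurus_params(index_id, name, role_arn, bucket, key,
                                  file_size_bytes, description=None):
    """Validate console-style inputs and return kwargs for create_thesaurus.

    Hypothetical helper: it only builds a dict and makes no AWS calls.
    """
    if not name:
        raise ValueError("A thesaurus name is required.")
    if file_size_bytes >= MAX_THESAURUS_BYTES:
        raise ValueError("The thesaurus file must be smaller than 5 MB.")
    params = {
        "IndexId": index_id,
        "Name": name,
        "RoleArn": role_arn,
        "SourceS3Path": {"Bucket": bucket, "Key": key},
    }
    if description:  # the description is optional
        params["Description"] = description
    return params


params = build_create_thesaurus_params(
    index_id="index-id",
    name="thesaurus-name",
    role_arn="role-arn",
    bucket="bucket-name",
    key="thesaurus/synonyms.txt",
    file_size_bytes=12_000,
)
print(params)
```

The resulting dictionary could then be passed to a boto3 Kendra client as `kendra.create_thesaurus(**params)`. One way to obtain the file size is the `ContentLength` field returned by the Amazon S3 `head_object` call.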

CLI

To add a thesaurus to an index with the AWS CLI, call create-thesaurus:

aws kendra create-thesaurus \
  --index-id index-id \
  --name "thesaurus-name" \
  --description "thesaurus-description" \
  --source-s3-path "Bucket=bucket-name,Key=thesaurus/synonyms.txt" \
  --role-arn role-arn

To see a list of thesauri, call list-thesauri:

aws kendra list-thesauri \
  --index-id index-id
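
When an index has several thesauri, the list output is typically used to look up the ID needed by later calls. The following sketch shows one way to do that in Python. It assumes the response carries a `ThesaurusSummaryItems` list whose entries have `Id` and `Name` fields, as in the ListThesauri API. The helper name `find_thesaurus_ids` and the sample response are hypothetical.

```python
def find_thesaurus_ids(list_response, name):
    """Return the IDs of all thesauri in a ListThesauri response with this name."""
    items = list_response.get("ThesaurusSummaryItems", [])
    return [item["Id"] for item in items if item.get("Name") == name]


# Hypothetical response, shaped like the output of list-thesauri.
sample_response = {
    "ThesaurusSummaryItems": [
        {"Id": "thesaurus-id-1", "Name": "thesaurus-name", "Status": "ACTIVE"},
        {"Id": "thesaurus-id-2", "Name": "other-thesaurus", "Status": "CREATING"},
    ]
}

print(find_thesaurus_ids(sample_response, "thesaurus-name"))
```

The function returns a list rather than a single ID because this sketch does not assume that thesaurus names are unique within an index.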

To see the details of a thesaurus, call describe-thesaurus:

aws kendra describe-thesaurus \
  --index-id index-id \
  --id thesaurus-id

It can take up to 30 minutes to see the effects of the thesaurus file.

Python
import boto3
from botocore.exceptions import ClientError
import pprint
import time

kendra = boto3.client("kendra")

print("Create a thesaurus")

# Replace these placeholder values with your own.
thesaurus_name = "thesaurus-name"
thesaurus_description = "thesaurus-description"
thesaurus_role_arn = "role-arn"
index_id = "index-id"

s3_bucket_name = "bucket-name"
s3_key = "thesaurus-file"
source_s3_path = {
    "Bucket": s3_bucket_name,
    "Key": s3_key
}

try:
    thesaurus_response = kendra.create_thesaurus(
        Description=thesaurus_description,
        Name=thesaurus_name,
        RoleArn=thesaurus_role_arn,
        IndexId=index_id,
        SourceS3Path=source_s3_path
    )

    pprint.pprint(thesaurus_response)

    thesaurus_id = thesaurus_response["Id"]

    print("Wait for Kendra to create the thesaurus.")

    while True:
        # Get the thesaurus details (kept separate from the description string above)
        thesaurus_details = kendra.describe_thesaurus(
            Id=thesaurus_id,
            IndexId=index_id
        )
        # Stop polling once the status is no longer CREATING
        status = thesaurus_details["Status"]
        print("Creating thesaurus. Status: " + status)
        if status != "CREATING":
            break
        time.sleep(60)

except ClientError as e:
    print("%s" % e)

print("Program ends.")
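
The polling loop in the example above runs for as long as the status stays CREATING. A bounded variant is sketched below, assuming you want to give up after a fixed time. The helper name `wait_for_thesaurus`, its parameters, and the 600-second default are illustrative choices, not values from the Amazon Kendra documentation. The describe call is passed in as a callable so the sketch runs without AWS access.

```python
import time


def wait_for_thesaurus(describe, timeout_seconds=600, poll_seconds=60,
                       sleep=time.sleep):
    """Poll until the status leaves CREATING or the timeout elapses.

    describe: zero-argument callable returning a dict with a "Status" key,
    for example a lambda wrapping kendra.describe_thesaurus(...).
    The timeout default is arbitrary (an assumption for this sketch).
    """
    waited = 0
    while True:
        status = describe()["Status"]
        if status != "CREATING":
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                "Thesaurus is still CREATING after %d seconds" % waited)
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for the real API call so the sketch runs without AWS access.
fake_statuses = iter(["CREATING", "CREATING", "ACTIVE"])
final_status = wait_for_thesaurus(
    lambda: {"Status": next(fake_statuses)},
    sleep=lambda seconds: None,  # skip real sleeping in this demo
)
print(final_status)
```

With a real client, the call might look like `wait_for_thesaurus(lambda: kendra.describe_thesaurus(Id=thesaurus_id, IndexId=index_id))`. The caller should still inspect the returned status, because any status other than CREATING ends the wait, including a failure status.
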
Java
package com.amazonaws.kendra;

import java.util.concurrent.TimeUnit;

import software.amazon.awssdk.services.kendra.KendraClient;
import software.amazon.awssdk.services.kendra.model.CreateThesaurusRequest;
import software.amazon.awssdk.services.kendra.model.CreateThesaurusResponse;
import software.amazon.awssdk.services.kendra.model.DescribeThesaurusRequest;
import software.amazon.awssdk.services.kendra.model.DescribeThesaurusResponse;
import software.amazon.awssdk.services.kendra.model.S3Path;
import software.amazon.awssdk.services.kendra.model.ThesaurusStatus;

public class CreateThesaurusExample {

    public static void main(String[] args) throws InterruptedException {

        KendraClient kendra = KendraClient.builder().build();

        // Replace these placeholder values with your own.
        String thesaurusName = "thesaurus-name";
        String thesaurusDescription = "thesaurus-description";
        String thesaurusRoleArn = "role-arn";

        String s3BucketName = "bucket-name";
        String s3Key = "thesaurus-file";
        String indexId = "index-id";

        System.out.println(String.format("Creating a thesaurus named %s", thesaurusName));
        CreateThesaurusRequest createThesaurusRequest = CreateThesaurusRequest
            .builder()
            .name(thesaurusName)
            .indexId(indexId)
            .description(thesaurusDescription)
            .roleArn(thesaurusRoleArn)
            .sourceS3Path(S3Path.builder()
                .bucket(s3BucketName)
                .key(s3Key)
                .build())
            .build();
        CreateThesaurusResponse createThesaurusResponse = kendra.createThesaurus(createThesaurusRequest);
        System.out.println(String.format("Thesaurus response %s", createThesaurusResponse));

        String thesaurusId = createThesaurusResponse.id();

        System.out.println(String.format("Waiting until the thesaurus with ID %s is created.", thesaurusId));

        // Poll once a minute until the status is no longer CREATING.
        while (true) {
            DescribeThesaurusRequest describeThesaurusRequest = DescribeThesaurusRequest.builder()
                .id(thesaurusId)
                .indexId(indexId)
                .build();
            DescribeThesaurusResponse describeThesaurusResponse = kendra.describeThesaurus(describeThesaurusRequest);
            ThesaurusStatus status = describeThesaurusResponse.status();
            if (status != ThesaurusStatus.CREATING) {
                break;
            }

            TimeUnit.SECONDS.sleep(60);
        }

        System.out.println("Thesaurus creation is complete.");
    }
}