
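Storing Amazon Rekognition data with Amazon RDS and DynamoDB

When you use the Amazon Rekognition APIs, keep in mind that the API operations don't save any of the labels they generate. You can keep these labels by writing them to a database together with an identifier for the corresponding image.

This tutorial demonstrates how to detect labels and save them to a database. The sample application developed in this tutorial reads images from an Amazon S3 bucket, calls the DetectLabels operation on those images, and stores the resulting labels in a database. The application stores the data in either an Amazon RDS database instance or a DynamoDB database, depending on which database type you want to use.

You'll use the AWS SDK for Python for this tutorial. You can also see the AWS Documentation SDK examples GitHub repository for more Python tutorials.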

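Prerequisites

Before you begin this tutorial, you need to install Python and complete the steps required to set up the AWS SDK for Python. Beyond this, make sure that you have:

Created an AWS account and an IAM role

Installed the Python SDK (Boto3)

Properly configured your AWS access credentials

Created an Amazon S3 bucket and filled it with images

Created an RDS database instance, if you are using RDS to store data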

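Getting labels for images in an Amazon S3 bucket

Start by writing a function that takes the name of an image in your Amazon S3 bucket and retrieves that image. The function displays the image so that you can confirm the correct image is being passed to the DetectLabels call, which the function also contains.

Find the Amazon S3 bucket that you want to use and note its name. You'll make calls to this bucket and read the images inside it. Make sure the bucket contains some images to pass to the DetectLabels operation.

Write code to connect to your Amazon S3 bucket. You can use Boto3 to connect to the Amazon S3 resource and retrieve an image from the bucket. Once connected to the Amazon S3 resource, you can access your bucket by passing the bucket name to the Bucket method. After connecting to the bucket, you retrieve images from it by using the Object method. By using Matplotlib, you can use this connection to visualize your images as they are processed. Boto3 is also used to connect to the Rekognition client.

In the following code, provide your Region in the region_name parameter. You pass the Amazon S3 bucket name and the image name to DetectLabels, which returns the labels for the corresponding image. After the labels are selected from the response, the function returns both the image name and the labels.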


```
import boto3
from io import BytesIO
from matplotlib import pyplot as plt
from matplotlib import image as mp_img

# Use a named session instead of rebinding the boto3 module name
session = boto3.Session()

def read_image_from_s3(bucket_name, image_name):

    # Connect to the S3 resource with Boto3,
    # get the bucket, and find the object matching the image name
    s3 = session.resource('s3')
    bucket = s3.Bucket(name=bucket_name)
    s3_object = bucket.Object(image_name)
    file_name = s3_object.key

    # Downloading the image for display purposes, not necessary for detection of labels
    # You can comment this code out if you don't want to visualize the images
    file_stream = BytesIO()
    s3_object.download_fileobj(file_stream)
    file_stream.seek(0)  # rewind so the image is read from the start
    img = mp_img.imread(file_stream, format="jpeg")
    plt.imshow(img)
    plt.show()

    # Get the labels for the image by calling DetectLabels from Rekognition
    client = session.client('rekognition', region_name="region-name")
    response = client.detect_labels(Image={'S3Object': {'Bucket': bucket_name, 'Name': image_name}},
                                    MaxLabels=10)

    print('Detected labels for ' + image_name)

    # Keep only the list of labels from the response
    full_labels = response['Labels']

    return file_name, full_labels
```

Save this code in a file called get_images.py.
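Each entry in the `Labels` list returned by DetectLabels also carries a `Confidence` score, so it can be useful to filter labels before storing them. The following is a minimal sketch and not part of the original tutorial: `summarize_labels` and its `min_confidence` parameter are hypothetical names, and the sample data is hand-written to mirror the shape of `response['Labels']`.

```python
def summarize_labels(full_labels, min_confidence=0.0):
    """Return (name, confidence) pairs at or above min_confidence, highest first."""
    pairs = [
        (label["Name"], label["Confidence"])
        for label in full_labels
        if label["Confidence"] >= min_confidence
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hand-written sample in the shape of response['Labels'] (illustrative values only).
sample_labels = [
    {"Name": "Dog", "Confidence": 97.5, "Parents": [{"Name": "Animal"}]},
    {"Name": "Animal", "Confidence": 99.1, "Parents": []},
    {"Name": "Grass", "Confidence": 62.0, "Parents": []},
]

print(summarize_labels(sample_labels, min_confidence=80))
```

With real data, you could pass the second value returned by `read_image_from_s3` as `full_labels`.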

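Creating an Amazon DynamoDB table

The following code uses Boto3 to connect to DynamoDB and uses the DynamoDB CreateTable method to create a table named Images. The table has a composite primary key consisting of a partition key (Image) and a sort key (Labels). The Image key contains the name of the image, while the Labels key stores the labels assigned to that image.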


```
import boto3

def create_new_table(dynamodb=None):
    # Create a DynamoDB resource only if the caller did not supply one
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb')
    # Table definition
    table = dynamodb.create_table(
        TableName='Images',
        KeySchema=[
            {
                'AttributeName': 'Image',
                'KeyType': 'HASH'  # Partition key
            },
            {
                'AttributeName': 'Labels',
                'KeyType': 'RANGE'  # Sort key
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'Image',
                'AttributeType': 'S'
            },
            {
                'AttributeName': 'Labels',
                'AttributeType': 'S'
            },
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 10,
            'WriteCapacityUnits': 10
        }
    )
    return table

if __name__ == '__main__':
    images_table = create_new_table()
    print("Status:", images_table.table_status)
```

Save this code in an editor and run it once to create a DynamoDB table.

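Uploading data to DynamoDB

Now that the DynamoDB database has been created and you have a function to get labels for images, you can store the labels in DynamoDB. The following code retrieves all the images in an S3 bucket, gets labels for them, and then stores the data in DynamoDB.

You need to write the code for uploading the data to DynamoDB. A function called get_image_names connects to your Amazon S3 bucket and returns the names of all images in the bucket as a list. You pass this list to the read_image_from_s3 function, which is imported from the get_images.py file you created.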


```
import boto3
import json
from get_images import read_image_from_s3

# Use a named session instead of rebinding the boto3 module name
session = boto3.Session()

def get_image_names(name_of_bucket):

    # List every object key in the bucket
    s3_resource = session.resource('s3')
    my_bucket = s3_resource.Bucket(name_of_bucket)
    file_list = []
    for file in my_bucket.objects.all():
        file_list.append(file.key)
    return file_list
```
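The bucket listing returns every object key, including any files that are not images. DetectLabels accepts JPEG and PNG images, so one option is to filter the keys before calling it. This is a minimal sketch under the assumption that file extensions reflect the actual format; `filter_image_keys` and `IMAGE_EXTENSIONS` are hypothetical names.

```python
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def filter_image_keys(keys):
    """Keep only keys whose extension suggests a JPEG or PNG image."""
    return [key for key in keys if key.lower().endswith(IMAGE_EXTENSIONS)]


# Hand-written sample keys, including a non-image file and a folder placeholder.
sample_keys = ["cat.JPG", "notes.txt", "photos/dog.png", "photos/"]
print(filter_image_keys(sample_keys))
```

In the tutorial's flow, this would wrap the list returned by `get_image_names`.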

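The read_image_from_s3 function that you created earlier returns the name of the image being processed and the list of label dictionaries associated with that image. A function called find_values is used to get just the label names from the response. The name of the image and its labels are then ready to be uploaded to your DynamoDB table.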


```
def find_values(key, json_repr):
    results = []

    # Called by json.loads for every decoded JSON object, nested ones included
    def _decode_dict(a_dict):
        try:
            results.append(a_dict[key])
        except KeyError:
            pass
        return a_dict

    json.loads(json_repr, object_hook=_decode_dict) # Return value ignored.
    return results
```
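Because `object_hook` runs on every nested JSON object, `find_values("Name", ...)` also collects names from nested entries such as `Parents`, not only the top-level labels. If only the top-level label names are wanted, a list comprehension is enough. The comparison below is a sketch with hand-written sample data; `top_level_names` is a hypothetical helper, and `find_values` is repeated in compact form only so that the block runs on its own.

```python
import json


def find_values(key, json_repr):
    """Collect the value of `key` from every JSON object, nested ones included."""
    results = []

    def _decode_dict(a_dict):
        if key in a_dict:
            results.append(a_dict[key])
        return a_dict

    json.loads(json_repr, object_hook=_decode_dict)
    return results


def top_level_names(full_labels):
    """Return only the names of the top-level labels."""
    return [label["Name"] for label in full_labels]


sample_labels = [
    {"Name": "Dog", "Confidence": 97.5, "Parents": [{"Name": "Animal"}, {"Name": "Pet"}]},
    {"Name": "Grass", "Confidence": 62.0, "Parents": [{"Name": "Plant"}]},
]

print(sorted(set(find_values("Name", json.dumps(sample_labels)))))
print(top_level_names(sample_labels))
```

Which behavior you want depends on whether parent categories should be stored alongside the detected labels.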

You use a third function, called load_data, to actually load the images and labels into the DynamoDB table you created.


```
def load_data(image_labels, dynamodb=None):

    # Create a DynamoDB resource only if the caller did not supply one
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Images')

    print("Adding image details:", image_labels)
    table.put_item(Item=image_labels)
    print("Success!!")
```
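The `load_data` function issues one PutItem call per image. For larger buckets, a Boto3 Table resource also offers a `batch_writer()` context manager that buffers items and sends them in batches. The following is a sketch rather than a replacement for the tutorial code: `load_many` is a hypothetical helper, and the two `_Fake` classes are offline stand-ins so that the example runs without AWS access.

```python
def load_many(items, table):
    """Write all items through the table's batch writer; return how many were queued."""
    count = 0
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
            count += 1
    return count


class _FakeBatch:
    """Offline stand-in for the object returned by Table.batch_writer()."""

    def __init__(self, sink):
        self.sink = sink

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False

    def put_item(self, Item):
        self.sink.append(Item)


class _FakeTable:
    """Offline stand-in for a Boto3 Table resource."""

    def __init__(self):
        self.items = []

    def batch_writer(self):
        return _FakeBatch(self.items)


fake_table = _FakeTable()
written = load_many(
    [{"Image": "cat.jpg", "Labels": "Cat"}, {"Image": "dog.jpg", "Labels": "Dog"}],
    fake_table,
)
print(written, fake_table.items)
```

With a real table, you would pass `boto3.resource('dynamodb').Table('Images')` as the `table` argument.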

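This is where the three functions defined earlier are called and the operations are carried out. Add the three functions defined above, along with the following code, to a Python file. Run the code.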


```
bucket = "bucket_name"
file_list = get_image_names(bucket)

for file_name in file_list:
    print("Getting labels for " + file_name)
    image_name, image_labels = read_image_from_s3(bucket, file_name)
    image_json_string = json.dumps(image_labels, indent=4)
    # A set removes duplicate label names
    labels = set(find_values("Name", image_json_string))
    print("Labels found: " + str(labels))
    labels_dict = {}
    print("Saving label data to database")
    labels_dict["Image"] = str(image_name)
    labels_dict["Labels"] = str(labels)
    print(labels_dict)
    load_data(labels_dict)
    print("Success!")
```
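One thing to watch in this loop: `str(labels)` is the text form of a Python set, whose element order is not guaranteed and can differ between interpreter runs. Because Labels is the sort key of the Images table, a differently ordered string for the same image would be stored as a second item instead of overwriting the first. Sorting the names before joining them gives a stable value. This is a sketch of one possible approach; `build_item` is a hypothetical helper.

```python
def build_item(image_name, labels):
    """Build a DynamoDB item whose Labels value is stable across runs."""
    return {
        "Image": str(image_name),
        "Labels": ", ".join(sorted(set(labels))),
    }


item = build_item("dog.jpg", ["Pet", "Dog", "Animal", "Dog"])
print(item)
```

The resulting dictionary has the same two keys as `labels_dict`, so it could be passed to `load_data` in the same way.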

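You've just used DetectLabels to generate labels for your images and stored those labels in a DynamoDB instance. Be sure to delete all the resources you created while going through this tutorial. This prevents you from being charged for resources that you aren't using.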
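For the DynamoDB part of the cleanup, a Boto3 Table resource can be deleted with its `delete` method, and `wait_until_not_exists` blocks until the deletion has finished. The sketch below is illustrative only: `delete_table` is a hypothetical helper, `_FakeTable` is an offline stand-in, and deleting a real table permanently removes its data.

```python
def delete_table(table):
    """Delete a DynamoDB table and wait until it is gone."""
    table.delete()
    table.wait_until_not_exists()
    return True


class _FakeTable:
    """Offline stand-in that records which methods were called."""

    def __init__(self):
        self.calls = []

    def delete(self):
        self.calls.append("delete")

    def wait_until_not_exists(self):
        self.calls.append("wait_until_not_exists")


fake_table = _FakeTable()
deleted = delete_table(fake_table)
print(deleted, fake_table.calls)
```

With a real table, you would pass `boto3.resource('dynamodb').Table('Images')`.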

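Creating a MySQL database in Amazon RDS

Before going further, make sure that you have completed the setup procedure for Amazon RDS and created a MySQL DB instance using Amazon RDS.

The following code makes use of the PyMySQL library and your Amazon RDS database instance. It creates a table to hold the names of your images and the labels associated with those images. Amazon RDS receives commands to create the table and to insert data into it. To use Amazon RDS, you must connect to the Amazon RDS host using your host name, user name, and password. You connect to Amazon RDS by providing these arguments to PyMySQL's connect function and creating a cursor instance.

In the following code, replace the value of host with your Amazon RDS host endpoint, and replace the value of user with the master user name associated with your Amazon RDS instance. You also need to replace password with the master password for your master user.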
```
import pymysql

host = "host-endpoint"
user = "username"
password = "master-password"
```
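Hard-coding the master password in a source file is easy to leak, so one common alternative is to read the connection settings from environment variables. The following is a minimal sketch and not part of the original tutorial: `load_db_settings` and the variable names `RDS_HOST`, `RDS_USER`, and `RDS_PASSWORD` are hypothetical.

```python
import os


def load_db_settings(env=os.environ):
    """Read connection settings from environment variables (hypothetical names)."""
    required = ("RDS_HOST", "RDS_USER", "RDS_PASSWORD")
    missing = [name for name in required if name not in env]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "host": env["RDS_HOST"],
        "user": env["RDS_USER"],
        "password": env["RDS_PASSWORD"],
    }


# Demonstration with an in-memory mapping instead of the real environment.
demo_env = {
    "RDS_HOST": "host-endpoint",
    "RDS_USER": "username",
    "RDS_PASSWORD": "master-password",
}
settings = load_db_settings(demo_env)
print(settings["host"], settings["user"])
```

The returned dictionary uses keyword names that PyMySQL's connect function accepts, so `pymysql.connect(**load_db_settings())` is one way to use it.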

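Create a database and a table to insert your image and label data into. You do this by running and committing a creation query. The following code creates a database. Run this code only once.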


```
conn = pymysql.connect(host=host, user=user, password=password)
print(conn)
cursor = conn.cursor()
print("Connection successful")

# Run once
create_query = "create database rekogDB1"
cursor.execute(create_query)
conn.commit()
# Report success only after the query has actually run
print("Creation successful!")
```

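After the database is created, you must create a table to insert your image names and labels into. To create a table, first pass the use SQL command, along with the name of your database, to the execute function. After the connection is made, a query to create the table is run. The following code connects to the database and then creates a table with a primary key called image_id and a text attribute that stores the labels. Use the imports and variables you defined earlier, and run this code to create a table in your database.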
```
# connect to existing DB
cursor.execute("use rekogDB1")
cursor.execute("CREATE TABLE IF NOT EXISTS test_table(image_id VARCHAR (255) PRIMARY KEY, image_labels TEXT)")
conn.commit()
print("Table creation - Successful creation!")
```

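Uploading data to an Amazon RDS MySQL table

After creating the Amazon RDS database and a table in the database, you can get labels for your images and store those labels in the Amazon RDS database.

Connect to your Amazon S3 bucket and retrieve the names of all the images in the bucket. These image names are passed to the read_image_from_s3 function you created earlier to get the labels for all your images. The following code connects to your Amazon S3 bucket and returns a list of all the images in the bucket.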


```
import pymysql
from get_images import read_image_from_s3
import json
import boto3

host = "host-endpoint"
user = "username"
password = "master-password"

conn = pymysql.connect(host=host, user=user, password=password)
print(conn)
cursor = conn.cursor()
print("Connection successful")

def get_image_names(name_of_bucket):

    # List every object key in the bucket
    s3_resource = boto3.resource('s3')
    my_bucket = s3_resource.Bucket(name_of_bucket)
    file_list = []
    for file in my_bucket.objects.all():
        file_list.append(file.key)
    return file_list
```

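The response from the DetectLabels API contains more than just the labels, so write a function that extracts only the label values. The following function returns a list containing just the labels.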


```
def find_values(key, json_repr):
    results = []

    # Called by json.loads for every decoded JSON object, nested ones included
    def _decode_dict(a_dict):
        try:
            results.append(a_dict[key])
        except KeyError:
            pass
        return a_dict

    json.loads(json_repr, object_hook=_decode_dict) # Return value ignored.
    return results
```

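You need a function to insert the image names and labels into your table. The following function runs an insertion query and inserts any given pair of image name and labels.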


```
def upload_data(image_id, image_labels):

    # Insert into the database; the values are passed as query parameters
    cursor.execute("use rekogDB1")
    query = "INSERT IGNORE INTO test_table(image_id, image_labels) VALUES (%s, %s)"
    values = (image_id, image_labels)
    cursor.execute(query, values)
    conn.commit()
    print("Insert successful!")
```
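To see how the primary key and the ignore-on-duplicate insert interact without touching your RDS instance, the same table shape can be tried with Python's built-in sqlite3 module. This is a stand-in and not MySQL: SQLite spells the statement `INSERT OR IGNORE` and uses `?` placeholders where PyMySQL uses `%s`. The function name `upload_data_sqlite` is hypothetical.

```python
import sqlite3

# In-memory database used purely as a local stand-in for the RDS table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE IF NOT EXISTS test_table("
    "image_id VARCHAR(255) PRIMARY KEY, image_labels TEXT)"
)


def upload_data_sqlite(image_id, image_labels):
    """Insert a row, silently skipping it if the primary key already exists."""
    cur.execute(
        "INSERT OR IGNORE INTO test_table(image_id, image_labels) VALUES (?, ?)",
        (image_id, image_labels),
    )
    conn.commit()


upload_data_sqlite("dog.jpg", "Dog, Pet")
upload_data_sqlite("dog.jpg", "Cat")  # same primary key, so this row is skipped
rows = cur.execute("SELECT image_id, image_labels FROM test_table").fetchall()
print(rows)
```

The second insert leaves the first row unchanged, which suggests that rerunning the tutorial for an already stored image keeps the original labels rather than updating them.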

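Finally, you must run the functions defined above. In the following code, the names of all the images in your bucket are collected and provided to the function that calls DetectLabels. Afterward, the labels and the name of the image they apply to are uploaded to your Amazon RDS database. Copy the three functions defined above, along with the following code, into a Python file. Run the Python file.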


```
bucket = "bucket-name"
file_list = get_image_names(bucket)

for file_name in file_list:
    print("Getting labels for " + file_name)
    image_name, image_labels = read_image_from_s3(bucket, file_name)
    image_json = json.dumps(image_labels, indent=4)
    # A set removes duplicate label names
    unique_labels = set(find_values("Name", image_json))
    print("Labels found: " + str(unique_labels))
    image_name_string = str(image_name)
    labels_string = str(unique_labels)
    upload_data(image_name_string, labels_string)
    print("Success!")
```
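The loop stores `str(unique_labels)`, which is the Python text form of a set and is awkward to parse back into a list later. Storing JSON text in the image_labels column is one alternative. The following is a sketch under that assumption; `labels_to_text` and `text_to_labels` are hypothetical helper names.

```python
import json


def labels_to_text(labels):
    """Serialize label names as a sorted JSON array string."""
    return json.dumps(sorted(set(labels)))


def text_to_labels(text):
    """Parse the stored JSON array string back into a list of names."""
    return json.loads(text)


stored = labels_to_text({"Dog", "Pet", "Animal"})
print(stored)
print(text_to_labels(stored))
```

The string returned by `labels_to_text` could be passed to `upload_data` in place of `labels_string`.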

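You have successfully used DetectLabels to generate labels for your images and stored those labels in a MySQL database using Amazon RDS. Be sure to delete all the resources you created while going through this tutorial. This prevents you from being charged for resources that you aren't using.

For more AWS multiservice examples, see the AWS Documentation SDK examples GitHub repository.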
