Transforming a COCO dataset

Use the following Python example to transform bounding box information from a COCO format dataset into an Amazon Rekognition Custom Labels manifest file. The code uploads the created manifest file to your Amazon S3 bucket. It also prints an AWS CLI command that you can use to upload your images.
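The mapping from COCO fields to a manifest line can be sketched on a tiny in-memory dataset. This is a minimal illustration, not the documented script: the function name coco_to_manifest_lines, the sample image and category, the bucket name amzn-s3-demo-bucket, and the job name are all assumptions made for the example. It assumes the COCO bbox order is x, y, width, height, and that date_captured uses the YYYY-MM-DD HH:MM:SS form that the full example parses.

```python
import json
from datetime import datetime

# Hypothetical COCO dataset holding only the fields the conversion reads.
coco = {
    "images": [
        {"id": 1, "file_name": "dog.jpg", "width": 640, "height": 480,
         "date_captured": "2013-11-18 02:53:27"}
    ],
    "categories": [{"id": 18, "name": "dog"}],
    "annotations": [
        {"image_id": 1, "category_id": 18, "bbox": [10.0, 20.0, 100.0, 50.0]}
    ],
}


def coco_to_manifest_lines(coco, s3_prefix, label_attribute="bounding-box",
                           job_name="example-job"):
    """Return one manifest dict per COCO image (illustrative helper)."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    lines = {}
    for img in coco["images"]:
        captured = datetime.strptime(img["date_captured"], "%Y-%m-%d %H:%M:%S")
        lines[img["id"]] = {
            "source-ref": s3_prefix + img["file_name"],
            label_attribute: {
                "annotations": [],
                "image_size": [
                    {"width": img["width"], "height": img["height"], "depth": 3}
                ],
            },
            label_attribute + "-metadata": {
                "job-name": job_name,
                "class-map": {},
                "human-annotated": "yes",
                "objects": [],
                "creation-date": captured.strftime("%Y-%m-%dT%H:%M:%S"),
                "type": "groundtruth/object-detection",
            },
        }
    for ann in coco["annotations"]:
        line = lines[ann["image_id"]]
        left, top, width, height = ann["bbox"]  # COCO order: x, y, width, height
        line[label_attribute]["annotations"].append(
            {"class_id": ann["category_id"], "left": left, "top": top,
             "width": width, "height": height})
        metadata = line[label_attribute + "-metadata"]
        # JSON object keys are strings, so the class id is stored as text.
        metadata["class-map"][str(ann["category_id"])] = names[ann["category_id"]]
        metadata["objects"].append({"confidence": 1})
    return list(lines.values())


# Assumed S3 prefix; replace with your own bucket and image path.
manifest_lines = coco_to_manifest_lines(coco, "s3://amzn-s3-demo-bucket/images/")
print(json.dumps(manifest_lines[0]))
```

Each dictionary corresponds to one JSON line in the manifest file: one line per image, with every bounding box for that image collected under the label attribute.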

To transform a COCO dataset (SDK)

If you haven't already:
1. Make sure you have AmazonS3FullAccess permissions. For more information, see Set up SDK permissions.
2. Install and configure the AWS CLI and the AWS SDKs. For more information, see Step 4: Set up the AWS CLI and AWS SDKs.

Use the following Python code to transform a COCO dataset. Set the following values.

s3_bucket: The name of the S3 bucket in which you want to store the images and the Amazon Rekognition Custom Labels manifest file.
s3_key_path_images: The path to where you want to place the images within the S3 bucket (s3_bucket).
s3_key_path_manifest_file: The path to where you want to place the Custom Labels manifest file within the S3 bucket (s3_bucket).
local_path: The local path to where the example opens the input COCO dataset and also saves the new Custom Labels manifest file.
local_images_path: The local path to the images that you want to use for training.
coco_manifest: The file name of the input COCO dataset.
cl_manifest_file: A name for the manifest file created by the example. The file is saved at the location specified by local_path. By convention, the file has the extension .manifest, but this isn't required.
job_name: A name for the Custom Labels job.


import json
import datetime
import boto3

#S3 location for images
s3_bucket = 'bucket'
s3_key_path_manifest_file = 'path to custom labels manifest file/'
s3_key_path_images = 'path to images/'
s3_path='s3://' + s3_bucket  + '/' + s3_key_path_images
s3 = boto3.resource('s3')

#Local file information
local_path='path to input COCO dataset and output Custom Labels manifest/'
local_images_path='path to COCO images/'
coco_manifest = 'COCO dataset JSON file name'
coco_json_file = local_path + coco_manifest
job_name='Custom Labels job name'
cl_manifest_file = 'custom_labels.manifest'

label_attribute ='bounding-box'

open(local_path + cl_manifest_file, 'w').close()

# class representing a Custom Label JSON line for an image
class cl_json_line:  
    def __init__(self,job, img):  

        #Get image info. Annotations are dealt with separately
        sizes=[]
        image_size={}
        image_size["width"] = img["width"]
        image_size["depth"] = 3
        image_size["height"] = img["height"]
        sizes.append(image_size)

        bounding_box={}
        bounding_box["annotations"] = []
        bounding_box["image_size"] = sizes

        self.__dict__["source-ref"] = s3_path + img['file_name']
        self.__dict__[job] = bounding_box

        #get metadata
        metadata = {}
        metadata['job-name'] = job_name
        metadata['class-map'] = {}
        metadata['human-annotated']='yes'
        metadata['objects'] = [] 
        date_time_obj = datetime.datetime.strptime(img['date_captured'], '%Y-%m-%d %H:%M:%S')
        metadata['creation-date']= date_time_obj.strftime('%Y-%m-%dT%H:%M:%S') 
        metadata['type']='groundtruth/object-detection'
        
        self.__dict__[job + '-metadata'] = metadata


print("Getting image, annotations, and categories from COCO file...")

with open(coco_json_file) as f:

    #Get custom label compatible info    
    js = json.load(f)
    images = js['images']
    categories = js['categories']
    annotations = js['annotations']

    print('Images: ' + str(len(images)))
    print('annotations: ' + str(len(annotations)))
    print('categories: ' + str(len (categories)))


print("Creating CL JSON lines...")
    
images_dict = {image['id']: cl_json_line(label_attribute, image) for image in images}

print('Parsing annotations...')
for annotation in annotations:

    image=images_dict[annotation['image_id']]

    cl_annotation = {}
    cl_class_map={}

    # get bounding box information
    cl_bounding_box={}
    cl_bounding_box['left'] = annotation['bbox'][0]
    cl_bounding_box['top'] = annotation['bbox'][1]
 
    cl_bounding_box['width'] = annotation['bbox'][2]
    cl_bounding_box['height'] = annotation['bbox'][3]
    cl_bounding_box['class_id'] = annotation['category_id']

    getattr(image, label_attribute)['annotations'].append(cl_bounding_box)


    # Record the category name for this annotation in the image's class map
    for category in categories:
        if annotation['category_id'] == category['id']:
            getattr(image, label_attribute + '-metadata')['class-map'][category['id']] = category['name']
        
    
    cl_object={}
    cl_object['confidence'] = int(1)  #not currently used by Custom Labels
    getattr(image, label_attribute + '-metadata')['objects'].append(cl_object)

print('Done parsing annotations')

# Create manifest file.
print('Writing Custom Labels manifest...')

# Open the manifest once and write one JSON line per image.
with open(local_path + cl_manifest_file, 'w') as outfile:
    for im in images_dict.values():
        json.dump(im.__dict__, outfile)
        outfile.write('\n')

# Upload manifest file to S3 bucket (reuses the s3 resource created above).
print('Uploading Custom Labels manifest file to S3 bucket')
print('Uploading ' + local_path + cl_manifest_file + ' to ' + s3_key_path_manifest_file)
print(s3_bucket)
s3.Bucket(s3_bucket).upload_file(local_path + cl_manifest_file, s3_key_path_manifest_file + cl_manifest_file)

# Print S3 URL to manifest file.
print ('S3 URL Path to manifest file. ')
print('\033[1m s3://' + s3_bucket + '/' + s3_key_path_manifest_file + cl_manifest_file + '\033[0m') 

# Display aws s3 sync command.
print ('\nAWS CLI s3 sync command to upload your images to S3 bucket. ')
print ('\033[1m aws s3 sync ' + local_images_path + ' ' + s3_path + '\033[0m')

Run the code.
In the program output, note the s3 sync command. You need it in the next step.
At the command prompt, run the s3 sync command. Your images are uploaded to the S3 bucket. If the command fails during upload, run it again until your local images are synchronized with the S3 bucket.
In the program output, note the S3 URL path to the manifest file. You need it in the next step.
Follow the instructions at Creating a dataset with a SageMaker AI Ground Truth manifest file (Console) to create a dataset with the uploaded manifest file. For step 8, in .manifest file location, enter the Amazon S3 URL that you noted in the previous step. If you are using the AWS SDK, follow Creating a dataset with a SageMaker AI Ground Truth manifest file (SDK).
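Before creating the dataset, it can help to sanity-check the generated manifest locally. The following is a minimal sketch under stated assumptions: check_manifest is a hypothetical helper, and its checks only mirror what the example script writes (an s3:// source-ref, a class-map entry for every class_id, and one objects entry per annotation). They are not the service's own validation rules.

```python
import json
import os
import tempfile


def check_manifest(path, label_attribute="bounding-box"):
    """Return a list of problems found in a manifest file (empty if none)."""
    problems = []
    with open(path) as manifest:
        for number, raw in enumerate(manifest, start=1):
            if not raw.strip():
                continue  # ignore blank lines
            try:
                line = json.loads(raw)
            except json.JSONDecodeError as err:
                problems.append(f"line {number}: invalid JSON ({err.msg})")
                continue
            if not str(line.get("source-ref", "")).startswith("s3://"):
                problems.append(f"line {number}: missing or non-S3 source-ref")
            annotations = line.get(label_attribute, {}).get("annotations", [])
            metadata = line.get(label_attribute + "-metadata", {})
            class_map = metadata.get("class-map", {})
            for ann in annotations:
                # class-map keys are strings once the manifest is written as JSON
                if str(ann.get("class_id")) not in class_map:
                    problems.append(
                        f"line {number}: class_id {ann.get('class_id')} not in class-map")
            if len(metadata.get("objects", [])) != len(annotations):
                problems.append(f"line {number}: objects/annotations count mismatch")
    return problems


# Demonstration on two hand-written lines (assumed sample data).
box = {"left": 1, "top": 2, "width": 3, "height": 4}
good = {
    "source-ref": "s3://amzn-s3-demo-bucket/images/a.jpg",
    "bounding-box": {"annotations": [dict(box, class_id=1)]},
    "bounding-box-metadata": {"class-map": {"1": "dog"},
                              "objects": [{"confidence": 1}]},
}
bad = {
    "source-ref": "s3://amzn-s3-demo-bucket/images/b.jpg",
    "bounding-box": {"annotations": [dict(box, class_id=2)]},
    "bounding-box-metadata": {"class-map": {"1": "dog"},
                              "objects": [{"confidence": 1}]},
}
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "custom_labels.manifest")
    with open(sample_path, "w") as out:
        out.write(json.dumps(good) + "\n" + json.dumps(bad) + "\n")
    problems = check_manifest(sample_path)
print(problems)
```

To check your own file, call check_manifest with the local path to the manifest that the example wrote (local_path followed by cl_manifest_file).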
