Trasformazione di un set di dati COCO

Usare il seguente esempio di Python per trasformare le informazioni del riquadro di delimitazione da un set di dati in formato COCO in un file manifest di Amazon Rekognition Custom Labels. Il codice carica il file manifest creato nel bucket Amazon S3. Il codice fornisce anche un comando AWS CLI che è possibile usare per caricare le immagini.

Trasformare un set di dati COCO (SDK)

Se non lo hai già fatto:
1. Accertarsi di avere le seguenti autorizzazioni AmazonS3FullAccess. Per ulteriori informazioni, consulta Impostare le autorizzazioni dell’SDK.
2. Installa e configura il AWS CLI e il. AWS SDKs Per ulteriori informazioni, consulta Passaggio 4: configura e AWS CLIAWS SDKs.

Usare il seguente codice Python per trasformare un set di dati COCO. Impostare i seguenti valori:

s3_bucket: il nome del bucket S3 in cui archiviare le immagini e il file manifest Amazon Rekognition Custom Labels.
s3_key_path_images: il percorso in cui si desiderano posizionare le immagini all'interno del bucket S3 (s3_bucket).
s3_key_path_manifest_file: il percorso in cui si desidera inserire il file manifest Custom Labels all'interno del bucket S3 (s3_bucket).
local_path: il percorso locale in cui l'esempio apre il set di dati COCO di input e salva anche il nuovo file manifest Custom Labels.
local_images_path: il percorso locale delle immagini che si desidera utilizzare per l'addestramento.
coco_manifest: il nome del file set di dati COCO di input.
cl_manifest_file: un nome per il file manifest creato dall'esempio. Il file viene salvato nella posizione specificata in local_path. Per convenzione, il file ha l'estensione.manifest, ma questa non è obbligatoria.
job_name: un nome per il lavoro Custom Labels.


import json
import os
import random
import shutil
import datetime
import botocore
import boto3
import PIL.Image as Image
import io

#S3 location for images
s3_bucket = 'bucket'
s3_key_path_manifest_file = 'path to custom labels manifest file/'
s3_key_path_images = 'path to images/'
s3_path='s3://' + s3_bucket  + '/' + s3_key_path_images
s3 = boto3.resource('s3')

#Local file information
local_path='path to input COCO dataset and output Custom Labels manifest/'
local_images_path='path to COCO images/'
coco_manifest = 'COCO dataset JSON file name'
coco_json_file = local_path + coco_manifest
job_name='Custom Labels job name'
cl_manifest_file = 'custom_labels.manifest'

label_attribute ='bounding-box'

open(local_path + cl_manifest_file, 'w').close()

# class representing a Custom Label JSON line for an image
class cl_json_line:  
    def __init__(self,job, img):  

        #Get image info. Annotations are dealt with seperately
        sizes=[]
        image_size={}
        image_size["width"] = img["width"]
        image_size["depth"] = 3
        image_size["height"] = img["height"]
        sizes.append(image_size)

        bounding_box={}
        bounding_box["annotations"] = []
        bounding_box["image_size"] = sizes

        self.__dict__["source-ref"] = s3_path + img['file_name']
        self.__dict__[job] = bounding_box

        #get metadata
        metadata = {}
        metadata['job-name'] = job_name
        metadata['class-map'] = {}
        metadata['human-annotated']='yes'
        metadata['objects'] = [] 
        date_time_obj = datetime.datetime.strptime(img['date_captured'], '%Y-%m-%d %H:%M:%S')
        metadata['creation-date']= date_time_obj.strftime('%Y-%m-%dT%H:%M:%S') 
        metadata['type']='groundtruth/object-detection'
        
        self.__dict__[job + '-metadata'] = metadata


print("Getting image, annotations, and categories from COCO file...")

with open(coco_json_file) as f:

    #Get custom label compatible info    
    js = json.load(f)
    images = js['images']
    categories = js['categories']
    annotations = js['annotations']

    print('Images: ' + str(len(images)))
    print('annotations: ' + str(len(annotations)))
    print('categories: ' + str(len (categories)))


print("Creating CL JSON lines...")
    
images_dict = {image['id']: cl_json_line(label_attribute, image) for image in images}

print('Parsing annotations...')
for annotation in annotations:

    image=images_dict[annotation['image_id']]

    cl_annotation = {}
    cl_class_map={}

    # get bounding box information
    cl_bounding_box={}
    cl_bounding_box['left'] = annotation['bbox'][0]
    cl_bounding_box['top'] = annotation['bbox'][1]
 
    cl_bounding_box['width'] = annotation['bbox'][2]
    cl_bounding_box['height'] = annotation['bbox'][3]
    cl_bounding_box['class_id'] = annotation['category_id']

    getattr(image, label_attribute)['annotations'].append(cl_bounding_box)


    for category in categories:
         if annotation['category_id'] == category['id']:
            getattr(image, label_attribute + '-metadata')['class-map'][category['id']]=category['name']
        
    
    cl_object={}
    cl_object['confidence'] = int(1)  #not currently used by Custom Labels
    getattr(image, label_attribute + '-metadata')['objects'].append(cl_object)

print('Done parsing annotations')

# Create manifest file.
print('Writing Custom Labels manifest...')

for im in images_dict.values():

    with open(local_path+cl_manifest_file, 'a+') as outfile:
            json.dump(im.__dict__,outfile)
            outfile.write('\n')
            outfile.close()

# Upload manifest file to S3 bucket.
print ('Uploading Custom Labels manifest file to S3 bucket')
print('Uploading'  + local_path + cl_manifest_file + ' to ' + s3_key_path_manifest_file)
print(s3_bucket)
s3 = boto3.resource('s3')
s3.Bucket(s3_bucket).upload_file(local_path + cl_manifest_file, s3_key_path_manifest_file + cl_manifest_file)

# Print S3 URL to manifest file,
print ('S3 URL Path to manifest file. ')
print('\033[1m s3://' + s3_bucket + '/' + s3_key_path_manifest_file + cl_manifest_file + '\033[0m') 

# Display aws s3 sync command.
print ('\nAWS CLI s3 sync command to upload your images to S3 bucket. ')
print ('\033[1m aws s3 sync ' + local_images_path + ' ' + s3_path + '\033[0m')

Eseguire il codice.
Di seguito è riportato un elenco dei comandi s3 sync. Questo valore servirà nella fase successiva.
Al prompt dei comandi, eseguire il comando s3 sync. Le immagini sono caricate nel bucket S3. Se il comando fallisce durante il caricamento, eseguirlo nuovamente finché le immagini locali non verranno sincronizzate con il bucket S3.
Nell'output del programma, osservare il percorso URL S3 del file manifest. Questo valore servirà nella fase successiva.
Seguire le istruzioni fornite in Creazione di un set di dati con un file manifest SageMaker AI Ground Truth (Console) per creare un set di dati con il file manifest caricato. Per il passaggio 8, nella posizione del file.manifest, inserire l'URL di Amazon S3 annotato nel passaggio precedente. Se si usa l' AWS SDK, fare Creazione di un set di dati con un file manifest SageMaker AI Ground Truth (SDK).

Avvertimento JavaScript è disabilitato o non è disponibile nel tuo browser.

Per usare la documentazione AWS, JavaScript deve essere abilitato. Consulta le pagine della guida del browser per le istruzioni.

Convenzioni dei documenti

Il formato del set di dati COCO

Trasformazione dei file manifest Ground Truth multietichetta