Rilevamento del testo in un video archiviato

Il rilevamento del testo di Video Amazon Rekognition nei video archiviati è un'operazione asincrona. Per iniziare a rilevare il testo, chiama. StartTextDetection Amazon Rekognition per video pubblica lo stato di completamento dell'analisi video in un argomento Amazon SNS. Se l'analisi video ha esito positivo, chiama GetTextDetection per ottenere i risultati dell'analisi. Per ulteriori informazioni su come avviare analisi video e ottenere i risultati, consultare Chiamata delle operazioni Video Amazon Rekognition.

Questa procedura espande il codice in Analisi di un video archiviato in un bucket Amazon S3 con Java o Python (SDK). Utilizza una coda Amazon SQS per ottenere lo stato di completamento di una richiesta di analisi video.

Per rilevare testo in un video archiviato in un bucket Amazon S3 (SDK)

Segui le fasi in Analisi di un video archiviato in un bucket Amazon S3 con Java o Python (SDK).

Aggiungere il codice seguente alla classe VideoDetect nel passaggio 1.

Java


//Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
//PDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)


private static void StartTextDetection(String bucket, String video) throws Exception{
           
    NotificationChannel channel= new NotificationChannel()
            .withSNSTopicArn(snsTopicArn)
            .withRoleArn(roleArn);
    
    StartTextDetectionRequest req = new StartTextDetectionRequest()
            .withVideo(new Video()
                    .withS3Object(new S3Object()
                        .withBucket(bucket)
                        .withName(video)))
            .withNotificationChannel(channel);
    
    
    StartTextDetectionResult startTextDetectionResult = rek.startTextDetection(req);
    startJobId=startTextDetectionResult.getJobId();
    
} 

private static void GetTextDetectionResults() throws Exception{
    
    int maxResults=10;
    String paginationToken=null;
    GetTextDetectionResult textDetectionResult=null;
    
    do{
        if (textDetectionResult !=null){
            paginationToken = textDetectionResult.getNextToken();

        }
        
    
        textDetectionResult = rek.getTextDetection(new GetTextDetectionRequest()
             .withJobId(startJobId)
             .withNextToken(paginationToken)
             .withMaxResults(maxResults));
    
        VideoMetadata videoMetaData=textDetectionResult.getVideoMetadata();
            
        System.out.println("Format: " + videoMetaData.getFormat());
        System.out.println("Codec: " + videoMetaData.getCodec());
        System.out.println("Duration: " + videoMetaData.getDurationMillis());
        System.out.println("FrameRate: " + videoMetaData.getFrameRate());
            
            
        //Show text, confidence values
        List<TextDetectionResult> textDetections = textDetectionResult.getTextDetections();


        for (TextDetectionResult text: textDetections) {
            long seconds=text.getTimestamp()/1000;
            System.out.println("Sec: " + Long.toString(seconds) + " ");
            TextDetection detectedText=text.getTextDetection();
            
            System.out.println("Text Detected: " + detectedText.getDetectedText());
                System.out.println("Confidence: " + detectedText.getConfidence().toString());
                System.out.println("Id : " + detectedText.getId());
                System.out.println("Parent Id: " + detectedText.getParentId());
                System.out.println("Bounding Box" + detectedText.getGeometry().getBoundingBox().toString());
                System.out.println("Type: " + detectedText.getType());
                System.out.println();
        }
    } while (textDetectionResult !=null && textDetectionResult.getNextToken() != null);
      
        
}

Nella funzione main, sostituisci le righe:


        StartLabelDetection(amzn-s3-demo-bucket, video);

        if (GetSQSMessageSuccess()==true)
        	GetLabelDetectionResults();

con:


        StartTextDetection(amzn-s3-demo-bucket, video);

        if (GetSQSMessageSuccess()==true)
        	GetTextDetectionResults();

Java V2

Questo codice è tratto dal repository degli esempi GitHub di AWS Documentation SDK. Guarda l'esempio completo qui.


//snippet-start:[rekognition.java2.recognize_video_text.import]
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.rekognition.RekognitionClient;
import software.amazon.awssdk.services.rekognition.model.S3Object;
import software.amazon.awssdk.services.rekognition.model.NotificationChannel;
import software.amazon.awssdk.services.rekognition.model.Video;
import software.amazon.awssdk.services.rekognition.model.StartTextDetectionRequest;
import software.amazon.awssdk.services.rekognition.model.StartTextDetectionResponse;
import software.amazon.awssdk.services.rekognition.model.RekognitionException;
import software.amazon.awssdk.services.rekognition.model.GetTextDetectionResponse;
import software.amazon.awssdk.services.rekognition.model.GetTextDetectionRequest;
import software.amazon.awssdk.services.rekognition.model.VideoMetadata;
import software.amazon.awssdk.services.rekognition.model.TextDetectionResult;
import java.util.List;
//snippet-end:[rekognition.java2.recognize_video_text.import]

/**
* Before running this Java V2 code example, set up your development environment, including your credentials.
*
* For more information, see the following documentation topic:
*
* https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html
*/
public class DetectTextVideo {

 private static String startJobId ="";
 public static void main(String[] args) {

     final String usage = "\n" +
         "Usage: " +
         "   <bucket> <video> <topicArn> <roleArn>\n\n" +
         "Where:\n" +
         "   bucket - The name of the bucket in which the video is located (for example, (for example, amzn-s3-demo-bucket). \n\n"+
         "   video - The name of video (for example, people.mp4). \n\n" +
         "   topicArn - The ARN of the Amazon Simple Notification Service (Amazon SNS) topic. \n\n" +
         "   roleArn - The ARN of the AWS Identity and Access Management (IAM) role to use. \n\n" ;

     if (args.length != 4) {
         System.out.println(usage);
         System.exit(1);
     }

     String bucket = args[0];
     String video = args[1];
     String topicArn = args[2];
     String roleArn = args[3];

     Region region = Region.US_EAST_1;
     RekognitionClient rekClient = RekognitionClient.builder()
         .region(region)
         .credentialsProvider(ProfileCredentialsProvider.create("profile-name"))
         .build();

     NotificationChannel channel = NotificationChannel.builder()
         .snsTopicArn(topicArn)
         .roleArn(roleArn)
         .build();

     startTextLabels(rekClient, channel, bucket, video);
     GetTextResults(rekClient);
     System.out.println("This example is done!");
     rekClient.close();
 }

 // snippet-start:[rekognition.java2.recognize_video_text.main]
 public static void startTextLabels(RekognitionClient rekClient,
                                NotificationChannel channel,
                                String bucket,
                                String video) {
     try {
         S3Object s3Obj = S3Object.builder()
             .bucket(bucket)
             .name(video)
             .build();

         Video vidOb = Video.builder()
             .s3Object(s3Obj)
             .build();

         StartTextDetectionRequest labelDetectionRequest = StartTextDetectionRequest.builder()
             .jobTag("DetectingLabels")
             .notificationChannel(channel)
             .video(vidOb)
             .build();

         StartTextDetectionResponse labelDetectionResponse = rekClient.startTextDetection(labelDetectionRequest);
         startJobId = labelDetectionResponse.jobId();

     } catch (RekognitionException e) {
         System.out.println(e.getMessage());
         System.exit(1);
     }
 }

 public static void GetTextResults(RekognitionClient rekClient) {

     try {
         String paginationToken=null;
         GetTextDetectionResponse textDetectionResponse=null;
         boolean finished = false;
         String status;
         int yy=0 ;

         do{
             if (textDetectionResponse !=null)
                 paginationToken = textDetectionResponse.nextToken();

             GetTextDetectionRequest recognitionRequest = GetTextDetectionRequest.builder()
                 .jobId(startJobId)
                 .nextToken(paginationToken)
                 .maxResults(10)
                 .build();

             // Wait until the job succeeds.
             while (!finished) {
                 textDetectionResponse = rekClient.getTextDetection(recognitionRequest);
                 status = textDetectionResponse.jobStatusAsString();

                 if (status.compareTo("SUCCEEDED") == 0)
                     finished = true;
                 else {
                     System.out.println(yy + " status is: " + status);
                     Thread.sleep(1000);
                 }
                 yy++;
             }

             finished = false;

             // Proceed when the job is done - otherwise VideoMetadata is null.
             VideoMetadata videoMetaData=textDetectionResponse.videoMetadata();
             System.out.println("Format: " + videoMetaData.format());
             System.out.println("Codec: " + videoMetaData.codec());
             System.out.println("Duration: " + videoMetaData.durationMillis());
             System.out.println("FrameRate: " + videoMetaData.frameRate());
             System.out.println("Job");

             List<TextDetectionResult> labels= textDetectionResponse.textDetections();
             for (TextDetectionResult detectedText: labels) {
                 System.out.println("Confidence: " + detectedText.textDetection().confidence().toString());
                 System.out.println("Id : " + detectedText.textDetection().id());
                 System.out.println("Parent Id: " + detectedText.textDetection().parentId());
                 System.out.println("Type: " + detectedText.textDetection().type());
                 System.out.println("Text: " + detectedText.textDetection().detectedText());
                 System.out.println();
             }

         } while (textDetectionResponse !=null && textDetectionResponse.nextToken() != null);

     } catch(RekognitionException | InterruptedException e) {
         System.out.println(e.getMessage());
         System.exit(1);
     }
 }
 // snippet-end:[rekognition.java2.recognize_video_text.main]
}

Python


#Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#PDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)

    def StartTextDetection(self):
        response=self.rek.start_text_detection(Video={'S3Object': {'Bucket': self.bucket, 'Name': self.video}},
            NotificationChannel={'RoleArn': self.roleArn, 'SNSTopicArn': self.snsTopicArn})

        self.startJobId=response['JobId']
        print('Start Job Id: ' + self.startJobId)
  
    def GetTextDetectionResults(self):
        maxResults = 10
        paginationToken = ''
        finished = False

        while finished == False:
            response = self.rek.get_text_detection(JobId=self.startJobId,
                                            MaxResults=maxResults,
                                            NextToken=paginationToken)

            print('Codec: ' + response['VideoMetadata']['Codec'])
            
            print('Duration: ' + str(response['VideoMetadata']['DurationMillis']))
            print('Format: ' + response['VideoMetadata']['Format'])
            print('Frame rate: ' + str(response['VideoMetadata']['FrameRate']))
            print()

            for textDetection in response['TextDetections']:
                text=textDetection['TextDetection']

                print("Timestamp: " + str(textDetection['Timestamp']))
                print("   Text Detected: " + text['DetectedText'])
                print("   Confidence: " +  str(text['Confidence']))
                print ("      Bounding box")
                print ("        Top: " + str(text['Geometry']['BoundingBox']['Top']))
                print ("        Left: " + str(text['Geometry']['BoundingBox']['Left']))
                print ("        Width: " +  str(text['Geometry']['BoundingBox']['Width']))
                print ("        Height: " +  str(text['Geometry']['BoundingBox']['Height']))
                print ("   Type: " + str(text['Type']) )
                print()

            if 'NextToken' in response:
                paginationToken = response['NextToken']
            else:
                finished = True

Nella funzione main, sostituisci le righe:


    analyzer.StartLabelDetection()
    if analyzer.GetSQSMessageSuccess()==True:
        analyzer.GetLabelDetectionResults()

con:


    analyzer.StartTextDetection()
    if analyzer.GetSQSMessageSuccess()==True:
        analyzer.GetTextDetectionResults()

CLI

Esegui il AWS CLI comando seguente per iniziare a rilevare il testo in un video.


 aws rekognition start-text-detection --video "{"S3Object":{"Bucket":"amzn-s3-demo-bucket","Name":"video-name"}}"\
 --notification-channel "{"SNSTopicArn":"topic-arn","RoleArn":"role-arn"}" \
 --region region-name --profile profile-name

Aggiorna i seguenti valori:

Modifica amzn-s3-demo-bucket e video-name con il nome del bucket Amazon S3 e il nome del file specificati nella fase 2.
Cambia region-name con la regione AWS che stai utilizzando.
Sostituisci il valore di profile-name con il nome del tuo profilo di sviluppatore.
Cambia topic-ARN con l'ARN dell'argomento Amazon SNS creato nella fase 3 di Configurazione di Video Amazon Rekognition.
Modifica role-ARN con l'ARN del ruolo di servizio IAM creato nella fase 7 di Configurazione di Video Amazon Rekognition.

Se accedi alla CLI da un dispositivo Windows, usa le virgolette doppie anziché le virgolette singole ed evita le virgolette doppie interne tramite barra rovesciata (ovvero, \) per risolvere eventuali errori del parser che potresti riscontrare. Un esempio è fornito di seguito:


aws rekognition start-text-detection --video \
 "{\"S3Object\":{\"Bucket\":\"amzn-s3-demo-bucket\",\"Name\":\"video-name\"}}" \
 --notification-channel "{\"SNSTopicArn\":\"topic-arn\",\"RoleArn\":\"role-arn\"}" \
 --region region-name --profile profile-name

Dopo aver eseguito l'esempio di codice precedente, copia il jobID restituito e inseriscilo nel comando GetTextDetection di seguito per ottenere i risultati, sostituendo job-id-number con il jobID ricevuto in precedenza:


aws rekognition get-text-detection --job-id job-id-number --profile profile-name

Nota

Se hai già eseguito un video di esempio diverso da Analisi di un video archiviato in un bucket Amazon S3 con Java o Python (SDK), il codice da sostituire potrebbe essere diverso.

Eseguire il codice. Il testo rilevato nel video viene visualizzato in un elenco.

Filtri

I filtri sono parametri di richiesta facoltativi che possono essere utilizzati quando si chiama StartTextDetection. La possibilità di filtrare per regione del testo, dimensione e punteggio di affidabilità offre la flessibilità necessaria a controllare l'output del rilevamento di testo. Usando le regioni di interesse è possibile limitare facilmente il rilevamento del testo alle regioni desiderate, per esempio, la terza regione dal basso per le grafiche o la regione nell'angolo in alto a sinistra per leggere i tabelloni dei punteggi in una partita di calcio. Il filtro relativo alla dimensione del riquadro di delimitazione del testo può essere utilizzato per evitare testo troppo piccolo sullo sfondo che può risultare un disturbo o irrilevante. Infine, il filtro di affidabilità delle parole ti permette di rimuovere i risultati inaffidabili perché sfocati o sbavati.

Per informazioni sui valori dei filtri, consulta DetectTextFilters.

Puoi utilizzare i filtri seguenti:

MinConfidence—Imposta il livello di confidenza del rilevamento delle parole. Le parole con sicurezza di rilevamento al di sotto di questo livello sono escluse dal risultato. I valori dovrebbero essere compresi tra 0 e 100.
MinBoundingBoxWidth— Imposta la larghezza minima del riquadro di delimitazione delle parole. Le parole con caselle di delimitazione inferiori a questo valore sono escluse dal risultato. Il valore è relativo alla larghezza del fotogramma video.
MinBoundingBoxHeight— Imposta l'altezza minima del riquadro di delimitazione delle parole. Le parole con altezze dei box di delimitazione inferiori a questo valore sono escluse dal risultato. Il valore è relativo all'altezza del fotogramma video.
RegionsOfInterest— Limita il rilevamento a una regione specifica del riquadro. I valori sono relativi alle dimensioni del fotogramma. Per gli oggetti solo parzialmente all'interno delle regioni, la risposta non è definita.

GetTextDetection risposta

GetTextDetection restituisce una matrice (TextDetectionResults) che contiene informazioni sul testo rilevato nel video. Esiste un elemento della matrice, TextDetection, per ogni rilevamento di parola o riga nel video. Gli elementi della matrice sono ordinati in base al tempo, espresso in millisecondi, dall'inizio del video.

Di seguito è riportato un esempio di risposta JSON parziale di GetTextDetection. Nella risposta, tenere presente quanto segue:

Informazioni di testo: l'elemento dell'TextDetectionResultarray contiene informazioni sul testo rilevato (TextDetection) e sull'ora in cui il testo è stato rilevato nel video (Timestamp).
Informazioni di paginazione: L'esempio illustra una pagina di informazioni di rilevamento del testo. Puoi specificare il numero di elementi della persona da restituire nel parametro di input MaxResults per GetTextDetection. Se esistono più risultati rispetto a MaxResults o ci sono più risultati rispetto al massimo predefinito, GetTextDetection restituisce un token (NextToken) utilizzato per ottenere la pagina successiva dei risultati. Per ulteriori informazioni, consulta Ottenere i risultati dell'analisi di Video Amazon Rekognition.
Informazioni video - La risposta include informazioni sul formato video (VideoMetadata) in ogni pagina di informazioni restituita da GetTextDetection.



{
    "JobStatus": "SUCCEEDED",
    "VideoMetadata": {
        "Codec": "h264",
        "DurationMillis": 174441,
        "Format": "QuickTime / MOV",
        "FrameRate": 29.970029830932617,
        "FrameHeight": 480,
        "FrameWidth": 854
    },
    "TextDetections": [
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Twinkle Twinkle Little Star",
                "Type": "LINE",
                "Id": 0,
                "Confidence": 99.91780090332031,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.8337579369544983,
                        "Height": 0.08365312218666077,
                        "Left": 0.08313830941915512,
                        "Top": 0.4663468301296234
                    },
                    "Polygon": [
                        {
                            "X": 0.08313830941915512,
                            "Y": 0.4663468301296234
                        },
                        {
                            "X": 0.9168962240219116,
                            "Y": 0.4674469828605652
                        },
                        {
                            "X": 0.916861355304718,
                            "Y": 0.5511001348495483
                        },
                        {
                            "X": 0.08310343325138092,
                            "Y": 0.5499999523162842
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Twinkle",
                "Type": "WORD",
                "Id": 1,
                "ParentId": 0,
                "Confidence": 99.98338317871094,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.2423887550830841,
                        "Height": 0.0833333358168602,
                        "Left": 0.08313817530870438,
                        "Top": 0.46666666865348816
                    },
                    "Polygon": [
                        {
                            "X": 0.08313817530870438,
                            "Y": 0.46666666865348816
                        },
                        {
                            "X": 0.3255269229412079,
                            "Y": 0.46666666865348816
                        },
                        {
                            "X": 0.3255269229412079,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.08313817530870438,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Twinkle",
                "Type": "WORD",
                "Id": 2,
                "ParentId": 0,
                "Confidence": 99.982666015625,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.2423887550830841,
                        "Height": 0.08124999701976776,
                        "Left": 0.3454332649707794,
                        "Top": 0.46875
                    },
                    "Polygon": [
                        {
                            "X": 0.3454332649707794,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.5878220200538635,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.5878220200538635,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.3454332649707794,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Little",
                "Type": "WORD",
                "Id": 3,
                "ParentId": 0,
                "Confidence": 99.8787612915039,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.16627635061740875,
                        "Height": 0.08124999701976776,
                        "Left": 0.6053864359855652,
                        "Top": 0.46875
                    },
                    "Polygon": [
                        {
                            "X": 0.6053864359855652,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.7716627717018127,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.7716627717018127,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.6053864359855652,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Star",
                "Type": "WORD",
                "Id": 4,
                "ParentId": 0,
                "Confidence": 99.82640075683594,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.12997658550739288,
                        "Height": 0.08124999701976776,
                        "Left": 0.7868852615356445,
                        "Top": 0.46875
                    },
                    "Polygon": [
                        {
                            "X": 0.7868852615356445,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.9168618321418762,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.9168618321418762,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.7868852615356445,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        }
    ],
    "NextToken": "NiHpGbZFnkM/S8kLcukMni15wb05iKtquu/Mwc+Qg1LVlMjjKNOD0Z0GusSPg7TONLe+OZ3P",
    "TextModelVersion": "3.0"
}

Avvertimento JavaScript è disabilitato o non è disponibile nel tuo browser.

Per usare la documentazione AWS, JavaScript deve essere abilitato. Consulta le pagine della guida del browser per le istruzioni.

Convenzioni dei documenti

Rilevamento di testo in un'immagine

Rilevamento di segmenti video