Detecting text in a stored video

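Amazon Rekognition Video text detection in stored videos is an asynchronous operation. To start detecting text, call StartTextDetection. Amazon Rekognition Video publishes the completion status of the video analysis to an Amazon SNS topic. If the video analysis is successful, call GetTextDetection to get the analysis results. For more information about starting video analysis and getting the results, see Calling Amazon Rekognition Video operations.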

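This procedure expands on the code in Analyzing a video stored in an Amazon S3 bucket with Java or Python (SDK). It uses an Amazon SQS queue to get the completion status of a video analysis request.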
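As a rough illustration of the completion check, the sketch below parses a message body as it might arrive in the Amazon SQS queue and extracts the job status. It assumes the queue is subscribed to the Amazon SNS topic without raw message delivery, so the Rekognition payload is a JSON string inside the notification's Message field. The function name parse_completion_message and the sample values are hypothetical.

```python
import json


def parse_completion_message(sqs_body, expected_job_id):
    """Return the job status if the message belongs to expected_job_id, else None."""
    notification = json.loads(sqs_body)                # SNS envelope delivered to SQS
    rek_message = json.loads(notification["Message"])  # Rekognition payload (a JSON string)
    if rek_message.get("JobId") != expected_job_id:
        return None                                    # message is for a different job
    return rek_message.get("Status")


# Hypothetical message body, shaped like an SNS notification delivered to SQS.
sample_body = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "JobId": "job-123",
        "Status": "SUCCEEDED",
        "API": "StartTextDetection",
    }),
})

print(parse_completion_message(sample_body, "job-123"))
```

Only when the status for your own job ID is SUCCEEDED would you go on to call GetTextDetection.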

To detect text in a video stored in an Amazon S3 bucket (SDK)

Perform the steps in Analyzing a video stored in an Amazon S3 bucket with Java or Python (SDK).

Add the following code to the class VideoDetect in step 1.

Java


//Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
//SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)


private static void StartTextDetection(String bucket, String video) throws Exception{
           
    NotificationChannel channel= new NotificationChannel()
            .withSNSTopicArn(snsTopicArn)
            .withRoleArn(roleArn);
    
    StartTextDetectionRequest req = new StartTextDetectionRequest()
            .withVideo(new Video()
                    .withS3Object(new S3Object()
                        .withBucket(bucket)
                        .withName(video)))
            .withNotificationChannel(channel);
    
    
    StartTextDetectionResult startTextDetectionResult = rek.startTextDetection(req);
    startJobId=startTextDetectionResult.getJobId();
    
} 

private static void GetTextDetectionResults() throws Exception{
    
    int maxResults=10;
    String paginationToken=null;
    GetTextDetectionResult textDetectionResult=null;
    
    do{
        if (textDetectionResult !=null){
            paginationToken = textDetectionResult.getNextToken();

        }
        
    
        textDetectionResult = rek.getTextDetection(new GetTextDetectionRequest()
             .withJobId(startJobId)
             .withNextToken(paginationToken)
             .withMaxResults(maxResults));
    
        VideoMetadata videoMetaData=textDetectionResult.getVideoMetadata();
            
        System.out.println("Format: " + videoMetaData.getFormat());
        System.out.println("Codec: " + videoMetaData.getCodec());
        System.out.println("Duration: " + videoMetaData.getDurationMillis());
        System.out.println("FrameRate: " + videoMetaData.getFrameRate());
            
            
        //Show text, confidence values
        List<TextDetectionResult> textDetections = textDetectionResult.getTextDetections();


        for (TextDetectionResult text: textDetections) {
            long seconds=text.getTimestamp()/1000;
            System.out.println("Sec: " + Long.toString(seconds) + " ");
            TextDetection detectedText=text.getTextDetection();
            
            System.out.println("Text Detected: " + detectedText.getDetectedText());
            System.out.println("Confidence: " + detectedText.getConfidence().toString());
            System.out.println("Id : " + detectedText.getId());
            System.out.println("Parent Id: " + detectedText.getParentId());
            System.out.println("Bounding Box: " + detectedText.getGeometry().getBoundingBox().toString());
            System.out.println("Type: " + detectedText.getType());
            System.out.println();
        }
    } while (textDetectionResult !=null && textDetectionResult.getNextToken() != null);
      
        
}

In the function main, replace the lines:


        StartLabelDetection(bucket, video);

        if (GetSQSMessageSuccess()==true)
        	GetLabelDetectionResults();

with:


        StartTextDetection(bucket, video);

        if (GetSQSMessageSuccess()==true)
        	GetTextDetectionResults();

Java V2

This code is taken from the AWS Documentation SDK examples GitHub repository. The complete example is available in that repository.


//snippet-start:[rekognition.java2.recognize_video_text.import]
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.rekognition.RekognitionClient;
import software.amazon.awssdk.services.rekognition.model.S3Object;
import software.amazon.awssdk.services.rekognition.model.NotificationChannel;
import software.amazon.awssdk.services.rekognition.model.Video;
import software.amazon.awssdk.services.rekognition.model.StartTextDetectionRequest;
import software.amazon.awssdk.services.rekognition.model.StartTextDetectionResponse;
import software.amazon.awssdk.services.rekognition.model.RekognitionException;
import software.amazon.awssdk.services.rekognition.model.GetTextDetectionResponse;
import software.amazon.awssdk.services.rekognition.model.GetTextDetectionRequest;
import software.amazon.awssdk.services.rekognition.model.VideoMetadata;
import software.amazon.awssdk.services.rekognition.model.TextDetectionResult;
import java.util.List;
//snippet-end:[rekognition.java2.recognize_video_text.import]

/**
* Before running this Java V2 code example, set up your development environment, including your credentials.
*
* For more information, see the following documentation topic:
*
* https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html
*/
public class DetectTextVideo {

 private static String startJobId ="";
 public static void main(String[] args) {

     final String usage = "\n" +
         "Usage: " +
         "   <bucket> <video> <topicArn> <roleArn>\n\n" +
         "Where:\n" +
         "   bucket - The name of the bucket in which the video is located (for example, amzn-s3-demo-bucket). \n\n"+
         "   video - The name of the video (for example, people.mp4). \n\n" +
         "   topicArn - The ARN of the Amazon Simple Notification Service (Amazon SNS) topic. \n\n" +
         "   roleArn - The ARN of the AWS Identity and Access Management (IAM) role to use. \n\n" ;

     if (args.length != 4) {
         System.out.println(usage);
         System.exit(1);
     }

     String bucket = args[0];
     String video = args[1];
     String topicArn = args[2];
     String roleArn = args[3];

     Region region = Region.US_EAST_1;
     RekognitionClient rekClient = RekognitionClient.builder()
         .region(region)
         .credentialsProvider(ProfileCredentialsProvider.create("profile-name"))
         .build();

     NotificationChannel channel = NotificationChannel.builder()
         .snsTopicArn(topicArn)
         .roleArn(roleArn)
         .build();

     startTextLabels(rekClient, channel, bucket, video);
     GetTextResults(rekClient);
     System.out.println("This example is done!");
     rekClient.close();
 }

 // snippet-start:[rekognition.java2.recognize_video_text.main]
 public static void startTextLabels(RekognitionClient rekClient,
                                NotificationChannel channel,
                                String bucket,
                                String video) {
     try {
         S3Object s3Obj = S3Object.builder()
             .bucket(bucket)
             .name(video)
             .build();

         Video vidOb = Video.builder()
             .s3Object(s3Obj)
             .build();

         StartTextDetectionRequest labelDetectionRequest = StartTextDetectionRequest.builder()
             .jobTag("DetectingLabels")
             .notificationChannel(channel)
             .video(vidOb)
             .build();

         StartTextDetectionResponse labelDetectionResponse = rekClient.startTextDetection(labelDetectionRequest);
         startJobId = labelDetectionResponse.jobId();

     } catch (RekognitionException e) {
         System.out.println(e.getMessage());
         System.exit(1);
     }
 }

 public static void GetTextResults(RekognitionClient rekClient) {

     try {
         String paginationToken=null;
         GetTextDetectionResponse textDetectionResponse=null;
         boolean finished = false;
         String status;
         int yy=0 ;

         do{
             if (textDetectionResponse !=null)
                 paginationToken = textDetectionResponse.nextToken();

             GetTextDetectionRequest recognitionRequest = GetTextDetectionRequest.builder()
                 .jobId(startJobId)
                 .nextToken(paginationToken)
                 .maxResults(10)
                 .build();

             // Wait until the job succeeds.
             while (!finished) {
                 textDetectionResponse = rekClient.getTextDetection(recognitionRequest);
                 status = textDetectionResponse.jobStatusAsString();

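                 if (status.compareTo("SUCCEEDED") == 0)
                     finished = true;
                 else if (status.compareTo("FAILED") == 0) {
                     // Stop polling on failure; otherwise this loop would never exit.
                     System.out.println("Job failed: " + textDetectionResponse.statusMessage());
                     return;
                 } else {
                     System.out.println(yy + " status is: " + status);
                     Thread.sleep(1000);
                 }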
                 yy++;
             }

             finished = false;

             // Proceed when the job is done - otherwise VideoMetadata is null.
             VideoMetadata videoMetaData=textDetectionResponse.videoMetadata();
             System.out.println("Format: " + videoMetaData.format());
             System.out.println("Codec: " + videoMetaData.codec());
             System.out.println("Duration: " + videoMetaData.durationMillis());
             System.out.println("FrameRate: " + videoMetaData.frameRate());
             System.out.println("Job");

             List<TextDetectionResult> labels= textDetectionResponse.textDetections();
             for (TextDetectionResult detectedText: labels) {
                 System.out.println("Confidence: " + detectedText.textDetection().confidence().toString());
                 System.out.println("Id : " + detectedText.textDetection().id());
                 System.out.println("Parent Id: " + detectedText.textDetection().parentId());
                 System.out.println("Type: " + detectedText.textDetection().type());
                 System.out.println("Text: " + detectedText.textDetection().detectedText());
                 System.out.println();
             }

         } while (textDetectionResponse !=null && textDetectionResponse.nextToken() != null);

     } catch(RekognitionException | InterruptedException e) {
         System.out.println(e.getMessage());
         System.exit(1);
     }
 }
 // snippet-end:[rekognition.java2.recognize_video_text.main]
}

Python


#Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)

    def StartTextDetection(self):
        response=self.rek.start_text_detection(Video={'S3Object': {'Bucket': self.bucket, 'Name': self.video}},
            NotificationChannel={'RoleArn': self.roleArn, 'SNSTopicArn': self.snsTopicArn})

        self.startJobId=response['JobId']
        print('Start Job Id: ' + self.startJobId)
  
    def GetTextDetectionResults(self):
        maxResults = 10
        paginationToken = ''
        finished = False

        while finished == False:
            response = self.rek.get_text_detection(JobId=self.startJobId,
                                            MaxResults=maxResults,
                                            NextToken=paginationToken)

            print('Codec: ' + response['VideoMetadata']['Codec'])
            
            print('Duration: ' + str(response['VideoMetadata']['DurationMillis']))
            print('Format: ' + response['VideoMetadata']['Format'])
            print('Frame rate: ' + str(response['VideoMetadata']['FrameRate']))
            print()

            for textDetection in response['TextDetections']:
                text=textDetection['TextDetection']

                print("Timestamp: " + str(textDetection['Timestamp']))
                print("   Text Detected: " + text['DetectedText'])
                print("   Confidence: " +  str(text['Confidence']))
                print ("      Bounding box")
                print ("        Top: " + str(text['Geometry']['BoundingBox']['Top']))
                print ("        Left: " + str(text['Geometry']['BoundingBox']['Left']))
                print ("        Width: " +  str(text['Geometry']['BoundingBox']['Width']))
                print ("        Height: " +  str(text['Geometry']['BoundingBox']['Height']))
                print ("   Type: " + str(text['Type']) )
                print()

            if 'NextToken' in response:
                paginationToken = response['NextToken']
            else:
                finished = True

In the function main, replace the lines:


    analyzer.StartLabelDetection()
    if analyzer.GetSQSMessageSuccess()==True:
        analyzer.GetLabelDetectionResults()

with:


    analyzer.StartTextDetection()
    if analyzer.GetSQSMessageSuccess()==True:
        analyzer.GetTextDetectionResults()

CLI

Run the following AWS CLI command to start detecting text in a video.


 aws rekognition start-text-detection --video '{"S3Object":{"Bucket":"amzn-s3-demo-bucket","Name":"video-name"}}' \
 --notification-channel '{"SNSTopicArn":"topic-arn","RoleArn":"role-arn"}' \
 --region region-name --profile profile-name

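Update the following values:

Change amzn-s3-demo-bucket and video-name to the Amazon S3 bucket name and file name that you specified in step 2.
Change region-name to the AWS Region that you're using.
Replace the value of profile-name with the name of your developer profile.
Change topic-arn to the ARN of the Amazon SNS topic that you created in step 3 of Configuring Amazon Rekognition Video.
Change role-arn to the ARN of the IAM service role that you created in step 7 of Configuring Amazon Rekognition Video.

If you are accessing the CLI on a Windows device, use double quotes instead of single quotes and escape the inner double quotes with a backslash (\) to avoid parser errors. For example: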


aws rekognition start-text-detection --video \
 "{\"S3Object\":{\"Bucket\":\"amzn-s3-demo-bucket\",\"Name\":\"video-name\"}}" \
 --notification-channel "{\"SNSTopicArn\":\"topic-arn\",\"RoleArn\":\"role-arn\"}" \
 --region region-name --profile profile-name

After running the preceding command, copy the returned JobId and provide it to the following GetTextDetection command, replacing job-id-number with the JobId that you received:


aws rekognition get-text-detection --job-id job-id-number --profile profile-name

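Note

If you've already run a video example other than Analyzing a video stored in an Amazon S3 bucket with Java or Python (SDK), the code to replace might be different.

Run the code. Text that was detected in the video is shown in a list.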

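Filters

Filters are optional request parameters that you can use when you call StartTextDetection. Filtering by text region, size, and confidence score gives you more control over the text detection output. With regions of interest, you can limit text detection to the areas that matter to you, for example the bottom third of a graphic or the top-left corner of the frame when reading a scoreboard in a soccer game. The text bounding box size filter can be used to exclude small background text that is noisy or irrelevant. Finally, the text confidence filter lets you remove unreliable results caused by blurry or smudged text.

For information about the filter values, see DetectTextFilters.

You can use the following filters:

MinConfidence: Sets the confidence level of text detection. Text with a confidence lower than this level is excluded from the results. Values should be between 0 and 100.
MinBoundingBoxWidth: Sets the minimum width of the text bounding box. Text with a bounding box narrower than this value is excluded from the results. The value is relative to the video frame width.
MinBoundingBoxHeight: Sets the minimum height of the text bounding box. Text with a bounding box shorter than this value is excluded from the results. The value is relative to the video frame height.
RegionsOfInterest: Limits detection to specific regions of the frame. The values are relative to the frame's dimensions. For objects that are only partially within a region, the response is undefined.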
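The filters above could be assembled in Python along the following lines. This is a minimal sketch, not part of the original procedure: the helper build_text_filters is hypothetical, and it assumes that the word-level filters are grouped under a WordFilter key, that regions are given as bounding boxes, and that relative sizes are fractions between 0 and 1.

```python
def build_text_filters(min_confidence=None, min_box_width=None,
                       min_box_height=None, regions=None):
    """Build a Filters dict for StartTextDetection; unset options are omitted."""
    word_filter = {}
    if min_confidence is not None:
        if not 0 <= min_confidence <= 100:
            raise ValueError("min_confidence must be between 0 and 100")
        word_filter["MinConfidence"] = float(min_confidence)
    for key, value in (("MinBoundingBoxWidth", min_box_width),
                       ("MinBoundingBoxHeight", min_box_height)):
        if value is not None:
            # Assumption: sizes are fractions of the frame dimension.
            if not 0 <= value <= 1:
                raise ValueError(key + " must be between 0 and 1")
            word_filter[key] = float(value)

    filters = {}
    if word_filter:
        filters["WordFilter"] = word_filter
    if regions:
        # Each region is a (left, top, width, height) tuple of frame fractions.
        filters["RegionsOfInterest"] = [
            {"BoundingBox": {"Left": left, "Top": top,
                             "Width": width, "Height": height}}
            for (left, top, width, height) in regions
        ]
    return filters


# Keep only confident text in the bottom third of the frame.
filters = build_text_filters(min_confidence=80,
                             regions=[(0.0, 2 / 3, 1.0, 1 / 3)])
print(filters)
```

With boto3, the resulting dictionary would be passed as the Filters argument of start_text_detection, alongside the Video and NotificationChannel arguments used in the Python example earlier.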

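GetTextDetection response

GetTextDetection returns an array (TextDetections) that contains information about the text detected in the video. The array contains one element (TextDetectionResult) for each word or line of text detected in the video. The array elements are sorted by time (in milliseconds) since the start of the video.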
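Because lines and their words arrive as separate array elements, it can help to regroup them. The sketch below pairs each LINE with its WORD children through ParentId. It assumes that Id values are unique only within a single timestamp, which is why the timestamp is part of the lookup key; the function name lines_with_words and the sample data are hypothetical.

```python
from collections import defaultdict


def lines_with_words(text_detections):
    """Return (timestamp, line text, [child words]) tuples sorted by time, then line Id."""
    line_text = {}             # (timestamp, line Id) -> detected line text
    words = defaultdict(list)  # (timestamp, line Id) -> child words in response order
    for item in text_detections:
        detection = item["TextDetection"]
        timestamp = item["Timestamp"]
        if detection["Type"] == "LINE":
            line_text[(timestamp, detection["Id"])] = detection["DetectedText"]
        elif detection["Type"] == "WORD" and "ParentId" in detection:
            words[(timestamp, detection["ParentId"])].append(detection["DetectedText"])
    return [(ts, text, words[(ts, line_id)])
            for (ts, line_id), text in sorted(line_text.items())]


# Hypothetical, trimmed-down TextDetections array.
sample = [
    {"Timestamp": 967, "TextDetection": {"DetectedText": "Little Star", "Type": "LINE", "Id": 0}},
    {"Timestamp": 967, "TextDetection": {"DetectedText": "Little", "Type": "WORD", "Id": 1, "ParentId": 0}},
    {"Timestamp": 967, "TextDetection": {"DetectedText": "Star", "Type": "WORD", "Id": 2, "ParentId": 0}},
]

for timestamp, line, line_words in lines_with_words(sample):
    print(timestamp, line, line_words)
```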

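The following is a partial JSON response from GetTextDetection. In the response, note the following:

Text information: The TextDetectionResult array element contains information about the text that was detected (TextDetection) and the time at which the text was detected in the video (Timestamp).
Paging information: The example shows one page of text detection information. You can specify how many text elements to return in the MaxResults input parameter of GetTextDetection. If more results than MaxResults exist, or there are more results than the default maximum, GetTextDetection returns a token (NextToken) that is used to get the next page of results. For more information, see Getting Amazon Rekognition Video analysis results.
Video information: The response includes information about the video format (VideoMetadata) in each page of information returned by GetTextDetection.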



{
    "JobStatus": "SUCCEEDED",
    "VideoMetadata": {
        "Codec": "h264",
        "DurationMillis": 174441,
        "Format": "QuickTime / MOV",
        "FrameRate": 29.970029830932617,
        "FrameHeight": 480,
        "FrameWidth": 854
    },
    "TextDetections": [
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Twinkle Twinkle Little Star",
                "Type": "LINE",
                "Id": 0,
                "Confidence": 99.91780090332031,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.8337579369544983,
                        "Height": 0.08365312218666077,
                        "Left": 0.08313830941915512,
                        "Top": 0.4663468301296234
                    },
                    "Polygon": [
                        {
                            "X": 0.08313830941915512,
                            "Y": 0.4663468301296234
                        },
                        {
                            "X": 0.9168962240219116,
                            "Y": 0.4674469828605652
                        },
                        {
                            "X": 0.916861355304718,
                            "Y": 0.5511001348495483
                        },
                        {
                            "X": 0.08310343325138092,
                            "Y": 0.5499999523162842
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Twinkle",
                "Type": "WORD",
                "Id": 1,
                "ParentId": 0,
                "Confidence": 99.98338317871094,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.2423887550830841,
                        "Height": 0.0833333358168602,
                        "Left": 0.08313817530870438,
                        "Top": 0.46666666865348816
                    },
                    "Polygon": [
                        {
                            "X": 0.08313817530870438,
                            "Y": 0.46666666865348816
                        },
                        {
                            "X": 0.3255269229412079,
                            "Y": 0.46666666865348816
                        },
                        {
                            "X": 0.3255269229412079,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.08313817530870438,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Twinkle",
                "Type": "WORD",
                "Id": 2,
                "ParentId": 0,
                "Confidence": 99.982666015625,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.2423887550830841,
                        "Height": 0.08124999701976776,
                        "Left": 0.3454332649707794,
                        "Top": 0.46875
                    },
                    "Polygon": [
                        {
                            "X": 0.3454332649707794,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.5878220200538635,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.5878220200538635,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.3454332649707794,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Little",
                "Type": "WORD",
                "Id": 3,
                "ParentId": 0,
                "Confidence": 99.8787612915039,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.16627635061740875,
                        "Height": 0.08124999701976776,
                        "Left": 0.6053864359855652,
                        "Top": 0.46875
                    },
                    "Polygon": [
                        {
                            "X": 0.6053864359855652,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.7716627717018127,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.7716627717018127,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.6053864359855652,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Star",
                "Type": "WORD",
                "Id": 4,
                "ParentId": 0,
                "Confidence": 99.82640075683594,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.12997658550739288,
                        "Height": 0.08124999701976776,
                        "Left": 0.7868852615356445,
                        "Top": 0.46875
                    },
                    "Polygon": [
                        {
                            "X": 0.7868852615356445,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.9168618321418762,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.9168618321418762,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.7868852615356445,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        }
    ],
    "NextToken": "NiHpGbZFnkM/S8kLcukMni15wb05iKtquu/Mwc+Qg1LVlMjjKNOD0Z0GusSPg7TONLe+OZ3P",
    "TextModelVersion": "3.0"
}
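The paging behavior described above can be sketched as a small loop that follows NextToken until no token is returned. This is an illustrative sketch only: collect_text_detections is a hypothetical helper, and fake_get_text_detection is a stand-in for the boto3 get_text_detection call so that the example runs without AWS access.

```python
def collect_text_detections(get_page, job_id, max_results=10):
    """Call get_page repeatedly, following NextToken, and return all TextDetections."""
    detections = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id, "MaxResults": max_results}
        if next_token:
            kwargs["NextToken"] = next_token  # only send a token once we have one
        page = get_page(**kwargs)
        detections.extend(page.get("TextDetections", []))
        next_token = page.get("NextToken")
        if not next_token:
            return detections


# Stand-in for rek.get_text_detection; returns two hypothetical pages.
def fake_get_text_detection(JobId, MaxResults, NextToken=None):
    pages = {
        None: {"TextDetections": [{"Timestamp": 0}], "NextToken": "page-2"},
        "page-2": {"TextDetections": [{"Timestamp": 500}]},
    }
    return pages[NextToken]


all_detections = collect_text_detections(fake_get_text_detection, "job-123")
print(len(all_detections))
```

In real use you would pass the client's get_text_detection method as get_page, after the job has reported SUCCEEDED.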
