偵測已儲存影片中的文字 - Amazon Rekognition

本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。

偵測已儲存影片中的文字

Amazon Rekognition Video 已儲存影片中的文字偵測是一項非同步操作。要開始檢測文本,請致電StartTextDetection。Amazon Rekognition Video 會將影片分析的完成状态发布至 Amazon SNS 主题。如果視頻分析成功,請致電GetTextDetection以獲取分析結果。如需開始影片分析並取得結果的詳細資訊,請參閱 呼叫 Amazon Rekognition Video 操作

此程序會在 使用 Java 或 Python (SDK) 分析儲存於 Amazon S3 儲存貯體中的影片 中展開程式碼。此程序使用 Amazon SQS 佇列來獲取影片分析要求的完成狀態。

偵測存放於 Amazon S3 儲存貯體 (SDK) 之影片中的文字
  1. 請執行 使用 Java 或 Python (SDK) 分析儲存於 Amazon S3 儲存貯體中的影片 中的步驟。

  2. 將下列程式碼加入至步驟 1 中的類別 VideoDetect

    Java
    //Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. //PDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.) private static void StartTextDetection(String bucket, String video) throws Exception{ NotificationChannel channel= new NotificationChannel() .withSNSTopicArn(snsTopicArn) .withRoleArn(roleArn); StartTextDetectionRequest req = new StartTextDetectionRequest() .withVideo(new Video() .withS3Object(new S3Object() .withBucket(bucket) .withName(video))) .withNotificationChannel(channel); StartTextDetectionResult startTextDetectionResult = rek.startTextDetection(req); startJobId=startTextDetectionResult.getJobId(); } private static void GetTextDetectionResults() throws Exception{ int maxResults=10; String paginationToken=null; GetTextDetectionResult textDetectionResult=null; do{ if (textDetectionResult !=null){ paginationToken = textDetectionResult.getNextToken(); } textDetectionResult = rek.getTextDetection(new GetTextDetectionRequest() .withJobId(startJobId) .withNextToken(paginationToken) .withMaxResults(maxResults)); VideoMetadata videoMetaData=textDetectionResult.getVideoMetadata(); System.out.println("Format: " + videoMetaData.getFormat()); System.out.println("Codec: " + videoMetaData.getCodec()); System.out.println("Duration: " + videoMetaData.getDurationMillis()); System.out.println("FrameRate: " + videoMetaData.getFrameRate()); //Show text, confidence values List<TextDetectionResult> textDetections = textDetectionResult.getTextDetections(); for (TextDetectionResult text: textDetections) { long seconds=text.getTimestamp()/1000; System.out.println("Sec: " + Long.toString(seconds) + " "); TextDetection detectedText=text.getTextDetection(); System.out.println("Text Detected: " + detectedText.getDetectedText()); System.out.println("Confidence: " + detectedText.getConfidence().toString()); System.out.println("Id : " + detectedText.getId()); System.out.println("Parent Id: " + detectedText.getParentId()); System.out.println("Bounding Box" + detectedText.getGeometry().getBoundingBox().toString()); System.out.println("Type: " + detectedText.getType()); System.out.println(); } } while (textDetectionResult !=null && textDetectionResult.getNextToken() != null); }

    在函數 main 中,將下行:

    StartLabelDetection(bucket, video); if (GetSQSMessageSuccess()==true) GetLabelDetectionResults();

    取代為:

    StartTextDetection(bucket, video); if (GetSQSMessageSuccess()==true) GetTextDetectionResults();
    Java V2

    此代碼取自 AWS 文檔 SDK 示例 GitHub 存儲庫。請參閱此處的完整範例。

    //snippet-start:[rekognition.java2.recognize_video_text.import] import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider; import software.amazon.awssdk.regions.Region; import software.amazon.awssdk.services.rekognition.RekognitionClient; import software.amazon.awssdk.services.rekognition.model.S3Object; import software.amazon.awssdk.services.rekognition.model.NotificationChannel; import software.amazon.awssdk.services.rekognition.model.Video; import software.amazon.awssdk.services.rekognition.model.StartTextDetectionRequest; import software.amazon.awssdk.services.rekognition.model.StartTextDetectionResponse; import software.amazon.awssdk.services.rekognition.model.RekognitionException; import software.amazon.awssdk.services.rekognition.model.GetTextDetectionResponse; import software.amazon.awssdk.services.rekognition.model.GetTextDetectionRequest; import software.amazon.awssdk.services.rekognition.model.VideoMetadata; import software.amazon.awssdk.services.rekognition.model.TextDetectionResult; import java.util.List; //snippet-end:[rekognition.java2.recognize_video_text.import] /** * Before running this Java V2 code example, set up your development environment, including your credentials. * * For more information, see the following documentation topic: * * https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html */ public class DetectTextVideo { private static String startJobId =""; public static void main(String[] args) { final String usage = "\n" + "Usage: " + " <bucket> <video> <topicArn> <roleArn>\n\n" + "Where:\n" + " bucket - The name of the bucket in which the video is located (for example, (for example, myBucket). \n\n"+ " video - The name of video (for example, people.mp4). \n\n" + " topicArn - The ARN of the Amazon Simple Notification Service (Amazon SNS) topic. \n\n" + " roleArn - The ARN of the AWS Identity and Access Management (IAM) role to use. \n\n" ; if (args.length != 4) { System.out.println(usage); System.exit(1); } String bucket = args[0]; String video = args[1]; String topicArn = args[2]; String roleArn = args[3]; Region region = Region.US_EAST_1; RekognitionClient rekClient = RekognitionClient.builder() .region(region) .credentialsProvider(ProfileCredentialsProvider.create("profile-name")) .build(); NotificationChannel channel = NotificationChannel.builder() .snsTopicArn(topicArn) .roleArn(roleArn) .build(); startTextLabels(rekClient, channel, bucket, video); GetTextResults(rekClient); System.out.println("This example is done!"); rekClient.close(); } // snippet-start:[rekognition.java2.recognize_video_text.main] public static void startTextLabels(RekognitionClient rekClient, NotificationChannel channel, String bucket, String video) { try { S3Object s3Obj = S3Object.builder() .bucket(bucket) .name(video) .build(); Video vidOb = Video.builder() .s3Object(s3Obj) .build(); StartTextDetectionRequest labelDetectionRequest = StartTextDetectionRequest.builder() .jobTag("DetectingLabels") .notificationChannel(channel) .video(vidOb) .build(); StartTextDetectionResponse labelDetectionResponse = rekClient.startTextDetection(labelDetectionRequest); startJobId = labelDetectionResponse.jobId(); } catch (RekognitionException e) { System.out.println(e.getMessage()); System.exit(1); } } public static void GetTextResults(RekognitionClient rekClient) { try { String paginationToken=null; GetTextDetectionResponse textDetectionResponse=null; boolean finished = false; String status; int yy=0 ; do{ if (textDetectionResponse !=null) paginationToken = textDetectionResponse.nextToken(); GetTextDetectionRequest recognitionRequest = GetTextDetectionRequest.builder() .jobId(startJobId) .nextToken(paginationToken) .maxResults(10) .build(); // Wait until the job succeeds. while (!finished) { textDetectionResponse = rekClient.getTextDetection(recognitionRequest); status = textDetectionResponse.jobStatusAsString(); if (status.compareTo("SUCCEEDED") == 0) finished = true; else { System.out.println(yy + " status is: " + status); Thread.sleep(1000); } yy++; } finished = false; // Proceed when the job is done - otherwise VideoMetadata is null. VideoMetadata videoMetaData=textDetectionResponse.videoMetadata(); System.out.println("Format: " + videoMetaData.format()); System.out.println("Codec: " + videoMetaData.codec()); System.out.println("Duration: " + videoMetaData.durationMillis()); System.out.println("FrameRate: " + videoMetaData.frameRate()); System.out.println("Job"); List<TextDetectionResult> labels= textDetectionResponse.textDetections(); for (TextDetectionResult detectedText: labels) { System.out.println("Confidence: " + detectedText.textDetection().confidence().toString()); System.out.println("Id : " + detectedText.textDetection().id()); System.out.println("Parent Id: " + detectedText.textDetection().parentId()); System.out.println("Type: " + detectedText.textDetection().type()); System.out.println("Text: " + detectedText.textDetection().detectedText()); System.out.println(); } } while (textDetectionResponse !=null && textDetectionResponse.nextToken() != null); } catch(RekognitionException | InterruptedException e) { System.out.println(e.getMessage()); System.exit(1); } } // snippet-end:[rekognition.java2.recognize_video_text.main] }
    Python
    #Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. #PDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.) def StartTextDetection(self): response=self.rek.start_text_detection(Video={'S3Object': {'Bucket': self.bucket, 'Name': self.video}}, NotificationChannel={'RoleArn': self.roleArn, 'SNSTopicArn': self.snsTopicArn}) self.startJobId=response['JobId'] print('Start Job Id: ' + self.startJobId) def GetTextDetectionResults(self): maxResults = 10 paginationToken = '' finished = False while finished == False: response = self.rek.get_text_detection(JobId=self.startJobId, MaxResults=maxResults, NextToken=paginationToken) print('Codec: ' + response['VideoMetadata']['Codec']) print('Duration: ' + str(response['VideoMetadata']['DurationMillis'])) print('Format: ' + response['VideoMetadata']['Format']) print('Frame rate: ' + str(response['VideoMetadata']['FrameRate'])) print() for textDetection in response['TextDetections']: text=textDetection['TextDetection'] print("Timestamp: " + str(textDetection['Timestamp'])) print(" Text Detected: " + text['DetectedText']) print(" Confidence: " + str(text['Confidence'])) print (" Bounding box") print (" Top: " + str(text['Geometry']['BoundingBox']['Top'])) print (" Left: " + str(text['Geometry']['BoundingBox']['Left'])) print (" Width: " + str(text['Geometry']['BoundingBox']['Width'])) print (" Height: " + str(text['Geometry']['BoundingBox']['Height'])) print (" Type: " + str(text['Type']) ) print() if 'NextToken' in response: paginationToken = response['NextToken'] else: finished = True

    在函數 main 中,將下行:

    analyzer.StartLabelDetection() if analyzer.GetSQSMessageSuccess()==True: analyzer.GetLabelDetectionResults()

    取代為:

    analyzer.StartTextDetection() if analyzer.GetSQSMessageSuccess()==True: analyzer.GetTextDetectionResults()
    CLI

    運行以下 AWS CLI 命令以開始檢測視頻中的文本。

    aws rekognition start-text-detection --video "{"S3Object":{"Bucket":"bucket-name","Name":"video-name"}}"\ --notification-channel "{"SNSTopicArn":"topic-arn","RoleArn":"role-arn"}" \ --region region-name --profile profile-name

    更新下列的值:

    • bucket-namevideo-name 變更為您在步驟 2 中指定的 Amazon S3 儲存貯體與文檔名稱。

    • region-name 變更為您正在使用的 AWS 區域。

    • 使用您開發人員設定檔的名稱取代 profile-name 的值。

    • topic-ARN 變更為您在 設定 Amazon Rekognition Video 步驟 3 建立的 Amazon SNS 主題 ARN。

    • role-ARN 變更為您在步驟 7 建立的 設定 Amazon Rekognition Video IAM 服務角色的 ARN。

    如果您在 Windows 裝置上存取 CLI,請使用雙引號而非單引號,並以反斜線 (即\) 替代內部雙引號,以解決您可能遇到的任何剖析器錯誤。如需範例,請參閱下列內容:

    aws rekognition start-text-detection --video \ "{\"S3Object\":{\"Bucket\":\"bucket-name\",\"Name\":\"video-name\"}}" \ --notification-channel "{\"SNSTopicArn\":\"topic-arn\",\"RoleArn\":\"role-arn\"}" \ --region region-name --profile profile-name

    執行正在進行的程式碼範例之後,複製傳回的程式碼 jobID,並將其提供給下列 GetTextDetection 命令,以 job-id-number 取代您之前收到的 jobID 結果:

    aws rekognition get-text-detection --job-id job-id-number --profile profile-name
    注意

    如果您已執行 使用 Java 或 Python (SDK) 分析儲存於 Amazon S3 儲存貯體中的影片 以外的影片範例,要取代的程式碼可能會不同。

  3. 執行程式碼。在影片中偵測到的文字會顯示在清單中。

篩選條件

篩選器是可選的要求參數,當您呼叫 StartTextDetection 時,會使用此篩選器。按文字區域、大小和可信度分數進行篩選可為您提供更大的靈活性來控製文字偵測輸出。透過感興趣的區域,您可以輕鬆地將文字偵測限制在與您相關的區域,例如,圖形的底部第三區域或足球遊戲中讀取記分牌的左上角。文字週框方塊大小篩選器可用於避免產生嘈雜或無關緊要的小背景文字。最後,文字可信度篩選器讓您可以消除因朦朧或模糊而導致的不可靠結果。

如需篩選器值的相關資訊,請參閱 DetectTextFilters

您可以使用下列篩選器:

  • MinConfidence設定文字偵測的信賴等級。可信度低於此等級的文字會從結果中排除。值應該介於 0 和 100 之間。

  • MinBoundingBoxWidth— 設定單字邊界方框的最小寬度。週邊方塊小於此值的文字會從結果中排除。此值相對於影片影格寬度。

  • MinBoundingBoxHeight— 設定單字邊界方框的最小高度。週邊方塊高度小於此值的文字會從結果中排除。此值相對於影片影格高度。

  • RegionsOfInterest— 將偵測限制在框架的特定區域。這些值是對於影格尺寸。對於僅部分區域內的物件,回應是未定義的。

GetTextDetection 回應

GetTextDetection 會傳回陣列 (TextDetectionResults),其中包含影片中偵測到之文字的相關資訊。每當在影片中偵測到文字或文字行時,就會有一個陣列元素 TextDetection。陣列元素會從影片開頭起依時間 (以毫秒為單位) 排序。

以下是 GetTextDetection 的部分 JSON 回應。在回應中,請注意下列事項:

  • 文字資訊TextDetectionResult 陣列元素包含偵測到的文字 (TextDetection) 以及在 video (Timestamp) 中偵測到文字的時間的相關資訊。

  • 分頁資訊:該範例顯示一頁文字偵測資訊。您可以在 GetTextDetectionMaxResults 輸入參數中指定要傳回的文字元素數目。如果結果數目超過 MaxResults,或結果數目超過預設最大值,GetTextDetection 會傳回用來取得下一頁結果的字符 (NextToken)。如需詳細資訊,請參閱 取得 Amazon Rekognition Video 分析結果

  • 影片資訊:回應包含 VideoMetadata 所傳回之每頁資訊中影片格式 (GetTextDetection) 的相關資訊。

{ "JobStatus": "SUCCEEDED", "VideoMetadata": { "Codec": "h264", "DurationMillis": 174441, "Format": "QuickTime / MOV", "FrameRate": 29.970029830932617, "FrameHeight": 480, "FrameWidth": 854 }, "TextDetections": [ { "Timestamp": 967, "TextDetection": { "DetectedText": "Twinkle Twinkle Little Star", "Type": "LINE", "Id": 0, "Confidence": 99.91780090332031, "Geometry": { "BoundingBox": { "Width": 0.8337579369544983, "Height": 0.08365312218666077, "Left": 0.08313830941915512, "Top": 0.4663468301296234 }, "Polygon": [ { "X": 0.08313830941915512, "Y": 0.4663468301296234 }, { "X": 0.9168962240219116, "Y": 0.4674469828605652 }, { "X": 0.916861355304718, "Y": 0.5511001348495483 }, { "X": 0.08310343325138092, "Y": 0.5499999523162842 } ] } } }, { "Timestamp": 967, "TextDetection": { "DetectedText": "Twinkle", "Type": "WORD", "Id": 1, "ParentId": 0, "Confidence": 99.98338317871094, "Geometry": { "BoundingBox": { "Width": 0.2423887550830841, "Height": 0.0833333358168602, "Left": 0.08313817530870438, "Top": 0.46666666865348816 }, "Polygon": [ { "X": 0.08313817530870438, "Y": 0.46666666865348816 }, { "X": 0.3255269229412079, "Y": 0.46666666865348816 }, { "X": 0.3255269229412079, "Y": 0.550000011920929 }, { "X": 0.08313817530870438, "Y": 0.550000011920929 } ] } } }, { "Timestamp": 967, "TextDetection": { "DetectedText": "Twinkle", "Type": "WORD", "Id": 2, "ParentId": 0, "Confidence": 99.982666015625, "Geometry": { "BoundingBox": { "Width": 0.2423887550830841, "Height": 0.08124999701976776, "Left": 0.3454332649707794, "Top": 0.46875 }, "Polygon": [ { "X": 0.3454332649707794, "Y": 0.46875 }, { "X": 0.5878220200538635, "Y": 0.46875 }, { "X": 0.5878220200538635, "Y": 0.550000011920929 }, { "X": 0.3454332649707794, "Y": 0.550000011920929 } ] } } }, { "Timestamp": 967, "TextDetection": { "DetectedText": "Little", "Type": "WORD", "Id": 3, "ParentId": 0, "Confidence": 99.8787612915039, "Geometry": { "BoundingBox": { "Width": 0.16627635061740875, "Height": 0.08124999701976776, "Left": 0.6053864359855652, "Top": 0.46875 }, "Polygon": [ { "X": 0.6053864359855652, "Y": 0.46875 }, { "X": 0.7716627717018127, "Y": 0.46875 }, { "X": 0.7716627717018127, "Y": 0.550000011920929 }, { "X": 0.6053864359855652, "Y": 0.550000011920929 } ] } } }, { "Timestamp": 967, "TextDetection": { "DetectedText": "Star", "Type": "WORD", "Id": 4, "ParentId": 0, "Confidence": 99.82640075683594, "Geometry": { "BoundingBox": { "Width": 0.12997658550739288, "Height": 0.08124999701976776, "Left": 0.7868852615356445, "Top": 0.46875 }, "Polygon": [ { "X": 0.7868852615356445, "Y": 0.46875 }, { "X": 0.9168618321418762, "Y": 0.46875 }, { "X": 0.9168618321418762, "Y": 0.550000011920929 }, { "X": 0.7868852615356445, "Y": 0.550000011920929 } ] } } } ], "NextToken": "NiHpGbZFnkM/S8kLcukMni15wb05iKtquu/Mwc+Qg1LVlMjjKNOD0Z0GusSPg7TONLe+OZ3P", "TextModelVersion": "3.0" }