Table Of Contents

Feedback

User Guide

First time using the AWS CLI? See the User Guide for help getting started.

[ aws . rekognition ]

detect-text

Description

Detects text in the input image and converts it into machine-readable text.

Pass the input image as base64-encoded image bytes or as a reference to an image in an Amazon S3 bucket. If you use the AWS CLI to call Amazon Rekognition operations, you must pass it as a reference to an image in an Amazon S3 bucket. For the AWS CLI, passing image bytes is not supported. The image must be either a .png or .jpeg formatted file.

The detect-text operation returns text in an array of elements, TextDetections . Each TextDetection element provides information about a single word or line of text that was detected in the image.

A word is one or more ISO basic latin script characters that are not separated by spaces. detect-text can detect up to 50 words in an image.

A line is a string of equally spaced words. A line isn't necessarily a complete sentence. For example, a driver's license number is detected as a line. A line ends when there is no aligned text after it. Also, a line ends when there is a large gap between words, relative to the length of the words. This means, depending on the gap between words, Amazon Rekognition may detect multiple lines in text aligned in the same direction. Periods don't represent the end of a line. If a sentence spans multiple lines, the detect-text operation returns multiple lines.

To determine whether a TextDetection element is a line of text or a word, use the TextDetection object Type field.

To be detected, text must be within +/- 30 degrees orientation of the horizontal axis.

For more information, see detect-text in the Amazon Rekognition Developer Guide.

See also: AWS API Documentation

See 'aws help' for descriptions of global parameters.

Synopsis

  detect-text
[--image <value>]
[--image-bytes <value>]
[--cli-input-json <value>]
[--generate-cli-skeleton <value>]

Options

--image (structure)

The input image as base64-encoded bytes or an Amazon S3 object. If you use the AWS CLI to call Amazon Rekognition operations, you can't pass image bytes.

To specify a local file use --image-bytes instead.

Shorthand Syntax:

Bytes=blob,S3Object={Bucket=string,Name=string,Version=string}

JSON Syntax:

{
  "Bytes": blob,
  "S3Object": {
    "Bucket": "string",
    "Name": "string",
    "Version": "string"
  }
}

--image-bytes (blob)

The content of the image to be uploaded. To specify the content of a local file use the fileb:// prefix. Example: fileb://image.png

--cli-input-json (string) Performs service operation based on the JSON string provided. The JSON string follows the format provided by --generate-cli-skeleton. If other arguments are provided on the command line, the CLI values will override the JSON-provided values.

--generate-cli-skeleton (string) Prints a JSON skeleton to standard output without sending an API request. If provided with no value or the value input, prints a sample input JSON that can be used as an argument for --cli-input-json. If provided with the value output, it validates the command inputs and returns a sample output JSON for that command.

See 'aws help' for descriptions of global parameters.

Output

TextDetections -> (list)

An array of text that was detected in the input image.

(structure)

Information about a word or line of text detected by .

The DetectedText field contains the text that Amazon Rekognition detected in the image.

Every word and line has an identifier (Id ). Each word belongs to a line and has a parent identifier (ParentId ) that identifies the line of text in which the word appears. The word Id is also an index for the word within a line of words.

For more information, see Detecting Text in the Amazon Rekognition Developer Guide.

DetectedText -> (string)

The word or line of text recognized by Amazon Rekognition.

Type -> (string)

The type of text that was detected.

Id -> (integer)

The identifier for the detected text. The identifier is only unique for a single call to detect-text .

ParentId -> (integer)

The Parent identifier for the detected text identified by the value of ID . If the type of detected text is LINE , the value of ParentId is Null .

Confidence -> (float)

The confidence that Amazon Rekognition has in the accuracy of the detected text and the accuracy of the geometry points around the detected text.

Geometry -> (structure)

The location of the detected text on the image. Includes an axis aligned coarse bounding box surrounding the text and a finer grain polygon for more accurate spatial information.

BoundingBox -> (structure)

An axis-aligned coarse representation of the detected text's location on the image.

Width -> (float)

Width of the bounding box as a ratio of the overall image width.

Height -> (float)

Height of the bounding box as a ratio of the overall image height.

Left -> (float)

Left coordinate of the bounding box as a ratio of overall image width.

Top -> (float)

Top coordinate of the bounding box as a ratio of overall image height.

Polygon -> (list)

Within the bounding box, a fine-grained polygon around the detected text.

(structure)

The X and Y coordinates of a point on an image. The X and Y values returned are ratios of the overall image size. For example, if the input image is 700x200 and the operation returns X=0.5 and Y=0.25, then the point is at the (350,50) pixel coordinate on the image.

An array of Point objects, Polygon , is returned by . Polygon represents a fine-grained polygon around detected text. For more information, see Geometry in the Amazon Rekognition Developer Guide.

X -> (float)

The value of the X coordinate for a point on a Polygon .

Y -> (float)

The value of the Y coordinate for a point on a Polygon .