Common Data Types - AWS Glue

Common Data Types

The Common Data Types API describes miscellaneous common data types in AWS Glue.

Tag Structure

The Tag object represents a label that you can assign to an AWS resource. Each tag consists of a key and an optional value, both of which you define.

For more information about tags, and controlling access to resources in AWS Glue, see AWS Tags in AWS Glue and Specifying AWS Glue Resource ARNs in the developer guide.

Fields

  • key – UTF-8 string, not less than 1 or more than 128 bytes long.

    The tag key. The key is required when you create a tag on an object. The key is case-sensitive, and must not contain the prefix aws.

  • value – UTF-8 string, not more than 256 bytes long.

    The tag value. The value is optional when you create a tag on an object. The value is case-sensitive, and must not contain the prefix aws.
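
The limits above are expressed in bytes, not characters, so a multi-byte UTF-8 character counts more than once. The following is a minimal sketch of a client-side check under that reading. `validate_tag` is a hypothetical helper, not part of any AWS SDK, and the prefix test is a literal, case-sensitive reading of the text above; the service's own validation may differ.

```python
def validate_tag(key: str, value: str = "") -> list[str]:
    """Return a list of problems; an empty list means the tag passes these checks."""
    problems = []
    # Lengths are measured on the UTF-8 encoding, as the field descriptions state.
    key_bytes = len(key.encode("utf-8"))
    value_bytes = len(value.encode("utf-8"))
    if not 1 <= key_bytes <= 128:
        problems.append(f"key is {key_bytes} bytes; expected 1 to 128")
    if value_bytes > 256:
        problems.append(f"value is {value_bytes} bytes; expected at most 256")
    # Assumption: "prefix aws" is checked literally and case-sensitively.
    if key.startswith("aws"):
        problems.append("key must not start with the prefix 'aws'")
    if value.startswith("aws"):
        problems.append("value must not start with the prefix 'aws'")
    return problems


print(validate_tag("team", "data-platform"))
print(validate_tag("é" * 65))  # 65 characters, but 130 bytes in UTF-8
```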

DecimalNumber Structure

Contains a numeric value in decimal format.

Fields

  • UnscaledValue – Required: Blob.

    The unscaled numeric value.

  • Scale – Required: Number (integer).

    The scale that determines where the decimal point falls in the unscaled value.
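
Read together, the two fields describe the value as the unscaled integer multiplied by 10 to the power of minus the scale. The sketch below converts between this representation and Python's `Decimal`. It assumes the blob holds a big-endian two's-complement integer, which this page does not state, and the helper names `decode_decimal_number` and `encode_decimal_number` are hypothetical.

```python
from decimal import Decimal


def decode_decimal_number(unscaled_value: bytes, scale: int) -> Decimal:
    """Rebuild the decimal value from UnscaledValue and Scale."""
    # Assumption: big-endian, two's-complement byte order.
    unscaled = int.from_bytes(unscaled_value, byteorder="big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Building from a tuple is exact and ignores the context precision.
    return Decimal((sign, digits, -scale))


def encode_decimal_number(value: Decimal) -> dict:
    """Split a finite Decimal into UnscaledValue and Scale."""
    if not value.is_finite():
        raise ValueError("only finite values can be encoded")
    sign, digits, exponent = value.as_tuple()
    coefficient = int("".join(str(d) for d in digits))
    if exponent > 0:
        # Fold a positive exponent into the integer so the scale stays at 0.
        coefficient *= 10 ** exponent
        scale = 0
    else:
        scale = -exponent
    unscaled = -coefficient if sign else coefficient
    # Reserve one extra bit for the sign.
    length = (unscaled.bit_length() + 8) // 8
    return {
        "UnscaledValue": unscaled.to_bytes(length, byteorder="big", signed=True),
        "Scale": scale,
    }


encoded = encode_decimal_number(Decimal("123.45"))
print(encoded)
print(decode_decimal_number(encoded["UnscaledValue"], encoded["Scale"]))
```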

ErrorDetail Structure

Contains details about an error.

Fields

PropertyPredicate Structure

Defines a property predicate.

Fields

  • Key – Value string, not more than 1024 bytes long.

    The key of the property.

  • Value – Value string, not more than 1024 bytes long.

    The value of the property.

  • Comparator – UTF-8 string (valid values: EQUALS | GREATER_THAN | LESS_THAN | GREATER_THAN_EQUALS | LESS_THAN_EQUALS).

    The comparator used to compare this property to others.
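
As an illustration, a property predicate can be modeled as a plain dictionary with the three fields above. The sketch checks the documented byte limits and the list of valid comparators before building it. `make_property_predicate` is a hypothetical helper, and defaulting to `EQUALS` is an assumption made for convenience.

```python
VALID_COMPARATORS = {
    "EQUALS",
    "GREATER_THAN",
    "LESS_THAN",
    "GREATER_THAN_EQUALS",
    "LESS_THAN_EQUALS",
}


def make_property_predicate(key: str, value: str, comparator: str = "EQUALS") -> dict:
    """Build a PropertyPredicate-shaped dict after basic validation."""
    if comparator not in VALID_COMPARATORS:
        raise ValueError(f"unsupported comparator: {comparator!r}")
    for name, text in (("Key", key), ("Value", value)):
        # Both strings are limited to 1024 bytes of UTF-8.
        if len(text.encode("utf-8")) > 1024:
            raise ValueError(f"{name} is longer than 1024 bytes")
    return {"Key": key, "Value": value, "Comparator": comparator}


print(make_property_predicate("classification", "parquet"))
```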

ResourceUri Structure

The URIs for function resources.

Fields

  • ResourceType – UTF-8 string (valid values: JAR | FILE | ARCHIVE).

    The type of the resource.

  • Uri – Uniform resource identifier (uri), not less than 1 or more than 1024 bytes long, matching the URI address multi-line string pattern.

    The URI for accessing the resource.

ColumnStatistics Structure

Represents the generated column-level statistics for a table or partition.

Fields

  • ColumnName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the column that the statistics belong to.

  • ColumnType – Required: Type name, not more than 20000 bytes long, matching the Single-line string pattern.

    The data type of the column.

  • AnalyzedTime – Required: Timestamp.

    The timestamp of when the column statistics were generated.

  • StatisticsData – Required: A ColumnStatisticsData object.

    A ColumnStatisticsData object that contains the statistics data values.

ColumnStatisticsError Structure

Encapsulates a ColumnStatistics object that failed and the reason for failure.

Fields

  • ColumnStatistics – A ColumnStatistics object.

    The ColumnStatistics of the column.

  • Error – An ErrorDetail object.

    An error message with the reason for the failure of an operation.

ColumnError Structure

Encapsulates a column name that failed and the reason for failure.

Fields

  • ColumnName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the column that failed.

  • Error – An ErrorDetail object.

    An error message with the reason for the failure of an operation.

ColumnStatisticsData Structure

Contains the individual types of column statistics data. Set only one data object, and use the Type attribute to indicate which one is set.
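
The rule that exactly one data object accompanies the Type attribute can be checked on the client side. The sketch below is illustrative only: this page does not list the members of the structure, so the member names (one per statistics structure described in this section) and the Type values in `DATA_MEMBERS` are assumptions, and `check_statistics_data` is a hypothetical helper.

```python
# Assumed mapping from Type value to the member that should be populated.
DATA_MEMBERS = {
    "BOOLEAN": "BooleanColumnStatisticsData",
    "DATE": "DateColumnStatisticsData",
    "DECIMAL": "DecimalColumnStatisticsData",
    "DOUBLE": "DoubleColumnStatisticsData",
    "LONG": "LongColumnStatisticsData",
    "STRING": "StringColumnStatisticsData",
    "BINARY": "BinaryColumnStatisticsData",
}


def check_statistics_data(data: dict) -> None:
    """Raise ValueError unless exactly one data object is set and matches Type."""
    present = [member for member in DATA_MEMBERS.values() if member in data]
    if len(present) != 1:
        raise ValueError(f"expected exactly one data object, found {len(present)}")
    expected = DATA_MEMBERS.get(data.get("Type"))
    if expected != present[0]:
        raise ValueError(f"Type {data.get('Type')!r} does not match {present[0]}")


check_statistics_data({
    "Type": "LONG",
    "LongColumnStatisticsData": {
        "MinimumValue": 1,
        "MaximumValue": 90,
        "NumberOfNulls": 0,
        "NumberOfDistinctValues": 42,
    },
})
```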

Fields

BooleanColumnStatisticsData Structure

Defines column statistics supported for Boolean data columns.

Fields

  • NumberOfTrues – Required: Number (long).

    The number of true values in the column.

  • NumberOfFalses – Required: Number (long).

    The number of false values in the column.

  • NumberOfNulls – Required: Number (long).

    The number of null values in the column.
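
To make the three counters concrete, the sketch below derives them from an in-memory list of values, with `None` standing in for null. It only illustrates what the fields mean; it is not how AWS Glue computes statistics, and `boolean_column_statistics` is a hypothetical helper.

```python
from typing import Iterable, Optional


def boolean_column_statistics(values: Iterable[Optional[bool]]) -> dict:
    """Count true, false, and null entries in a Boolean column."""
    trues = falses = nulls = 0
    for value in values:
        if value is None:
            nulls += 1
        elif value:
            trues += 1
        else:
            falses += 1
    return {
        "NumberOfTrues": trues,
        "NumberOfFalses": falses,
        "NumberOfNulls": nulls,
    }


print(boolean_column_statistics([True, False, None, True]))
```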

DateColumnStatisticsData Structure

Defines column statistics supported for timestamp data columns.

Fields

  • MinimumValue – Timestamp.

    The lowest value in the column.

  • MaximumValue – Timestamp.

    The highest value in the column.

  • NumberOfNulls – Required: Number (long).

    The number of null values in the column.

  • NumberOfDistinctValues – Required: Number (long).

    The number of distinct values in a column.

DecimalColumnStatisticsData Structure

Defines column statistics supported for fixed-point number data columns.

Fields

  • MinimumValue – A DecimalNumber object.

    The lowest value in the column.

  • MaximumValue – A DecimalNumber object.

    The highest value in the column.

  • NumberOfNulls – Required: Number (long).

    The number of null values in the column.

  • NumberOfDistinctValues – Required: Number (long).

    The number of distinct values in a column.

DoubleColumnStatisticsData Structure

Defines column statistics supported for floating-point number data columns.

Fields

  • MinimumValue – Number (double).

    The lowest value in the column.

  • MaximumValue – Number (double).

    The highest value in the column.

  • NumberOfNulls – Required: Number (long).

    The number of null values in the column.

  • NumberOfDistinctValues – Required: Number (long).

    The number of distinct values in a column.

LongColumnStatisticsData Structure

Defines column statistics supported for integer data columns.

Fields

  • MinimumValue – Number (long).

    The lowest value in the column.

  • MaximumValue – Number (long).

    The highest value in the column.

  • NumberOfNulls – Required: Number (long).

    The number of null values in the column.

  • NumberOfDistinctValues – Required: Number (long).

    The number of distinct values in a column.

StringColumnStatisticsData Structure

Defines column statistics supported for character sequence data values.

Fields

  • MaximumLength – Required: Number (long).

    The size of the longest string in the column.

  • AverageLength – Required: Number (double).

    The average string length in the column.

  • NumberOfNulls – Required: Number (long).

    The number of null values in the column.

  • NumberOfDistinctValues – Required: Number (long).

    The number of distinct values in a column.
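
The sketch below shows how the four string statistics relate to the underlying values. It rests on assumptions this page does not settle: lengths are counted in characters, null entries are excluded from the average and the distinct count, and an all-null column reports 0 and 0.0. `string_column_statistics` is a hypothetical helper, not an AWS Glue function.

```python
from typing import Iterable, Optional


def string_column_statistics(values: Iterable[Optional[str]]) -> dict:
    """Compute StringColumnStatisticsData-shaped values for a list of strings."""
    values = list(values)
    present = [v for v in values if v is not None]
    # Assumption: length is measured in characters, not bytes.
    lengths = [len(v) for v in present]
    return {
        "MaximumLength": max(lengths, default=0),
        "AverageLength": sum(lengths) / len(lengths) if lengths else 0.0,
        "NumberOfNulls": len(values) - len(present),
        "NumberOfDistinctValues": len(set(present)),
    }


print(string_column_statistics(["ab", "abcd", None, "ab"]))
```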

BinaryColumnStatisticsData Structure

Defines column statistics supported for bit sequence data values.

Fields

  • MaximumLength – Required: Number (long).

    The size of the longest bit sequence in the column.

  • AverageLength – Required: Number (double).

    The average bit sequence length in the column.

  • NumberOfNulls – Required: Number (long).

    The number of null values in the column.

String Patterns

The API uses the following regular expressions to define what is valid content for various string parameters and members:

  • Single-line string pattern – "[\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]*"

  • URI address multi-line string pattern – "[\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]*"

  • A Logstash Grok string pattern – "[\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\t]*"

  • Identifier string pattern – "[A-Za-z_][A-Za-z0-9_]*"

  • AWS Glue ARN string pattern – "arn:aws:glue:.*"

  • AWS IAM ARN string pattern – "arn:aws:iam::\d{12}:role/.*"

  • Version string pattern – "^[a-zA-Z0-9-_]+$"

  • Log group string pattern – "[\.\-_/#A-Za-z0-9]+"

  • Log-stream string pattern – "[^:*]*"

  • Custom string pattern #10 – "[^\r\n]"

  • Custom string pattern #11 – "^[2-3]$"

  • Custom string pattern #12 – "^\w+\.\w+\.\w+$"

  • Custom string pattern #13 – "^\w+\.\w+$"

  • Custom string pattern #14 – "arn:aws:kms:.*"
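
Several of these patterns can be used as-is with Python's `re` module to pre-validate values before calling the API. The sketch below is a minimal example that assumes a value must match the pattern in full, hence `fullmatch`. The dictionary keys and the `matches` helper are hypothetical names. The patterns that contain surrogate-pair ranges (such as the Single-line string pattern) are left out because they would need adapting for Python.

```python
import re

# Patterns copied from the list above; the dictionary keys are illustrative names.
PATTERNS = {
    "identifier": r"[A-Za-z_][A-Za-z0-9_]*",
    "glue_arn": r"arn:aws:glue:.*",
    "iam_arn": r"arn:aws:iam::\d{12}:role/.*",
    "log_group": r"[\.\-_/#A-Za-z0-9]+",
}
COMPILED = {name: re.compile(pattern) for name, pattern in PATTERNS.items()}


def matches(pattern_name: str, text: str) -> bool:
    """Return True if the whole string matches the named pattern."""
    return COMPILED[pattern_name].fullmatch(text) is not None


print(matches("identifier", "my_table_1"))
print(matches("identifier", "1table"))
print(matches("iam_arn", "arn:aws:iam::123456789012:role/GlueRole"))
```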