AWS Glue
Developer Guide

Partition API

Data Types

Partition Structure

Represents a slice of table data.

Fields

  • Values – An array of UTF-8 strings, at least 1 string.

    The values of the partition.

  • DatabaseName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the catalog database where the table in question is located.

  • TableName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table in question.

  • CreationTime – Timestamp.

    The time at which the partition was created.

  • LastAccessTime – Timestamp.

    The last time at which the partition was accessed.

  • StorageDescriptor – A StorageDescriptor object.

    Provides information about the physical location where the partition is stored.

  • Parameters – A map array of key-value pairs.

    Each key is a Key string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Each value is a UTF-8 string, not more than 512000 bytes long.

    These key-value pairs define partition parameters.

  • LastAnalyzedTime – Timestamp.

    The last time at which column statistics were computed for this partition.

PartitionInput Structure

The structure used to create and update a partition.

Fields

  • Values – An array of UTF-8 strings, at least 1 string.

    The values of the partition.

  • LastAccessTime – Timestamp.

    The last time at which the partition was accessed.

  • StorageDescriptor – A StorageDescriptor object.

    Provides information about the physical location where the partition is stored.

  • Parameters – A map array of key-value pairs.

    Each key is a Key string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Each value is a UTF-8 string, not more than 512000 bytes long.

    These key-value pairs define partition parameters.

  • LastAnalyzedTime – Timestamp.

    The last time at which column statistics were computed for this partition.
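
As an illustration, a PartitionInput can be assembled as a plain dictionary before it is passed to a create or update call. The sketch below is not part of the API: the helper name build_partition_input and the example Amazon S3 location are hypothetical, and the StorageDescriptor is reduced to a single Location field on the assumption that the remaining fields are not needed for the example.

```python
def build_partition_input(values, location, parameters=None):
    """Return a dict shaped like the PartitionInput structure (hypothetical helper)."""
    values = [str(v) for v in values]
    if not values:
        raise ValueError("Values needs at least 1 string")
    partition_input = {
        "Values": values,
        # Assumption: only the Location field of the StorageDescriptor is set here.
        "StorageDescriptor": {"Location": location},
    }
    if parameters:
        for key, value in parameters.items():
            # Limits taken from the Parameters field description above.
            if not 1 <= len(key.encode("utf-8")) <= 255:
                raise ValueError(f"Parameter key must be 1 to 255 bytes: {key!r}")
            if len(value.encode("utf-8")) > 512000:
                raise ValueError(f"Parameter value too long for key {key!r}")
        partition_input["Parameters"] = dict(parameters)
    return partition_input


# Hypothetical bucket and parameter names, for illustration only.
example = build_partition_input(
    ["2017"], "s3://example-bucket/twitter/year=2017/", {"source": "crawler"}
)
```

Validating the byte limits on the client side is a design choice; the service performs its own validation and reports violations as InvalidInputException.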

PartitionSpecWithSharedStorageDescriptor Structure

A partition specification for partitions that share a physical location.

Fields

  • StorageDescriptor – A StorageDescriptor object.

    The shared physical storage information.

  • Partitions – An array of Partition objects.

    A list of the partitions that share this physical location.

PartitionListComposingSpec Structure

Lists related partitions.

Fields

  • Partitions – An array of Partition objects.

    A list of the partitions in the composing specification.

PartitionSpecProxy Structure

Provides a root path to specified partitions.

Fields

  • DatabaseName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The catalog database in which the partitions reside.

  • TableName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table containing the partitions.

  • RootPath – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The root path of the proxy for addressing the partitions.

  • PartitionSpecWithSharedSD – A PartitionSpecWithSharedStorageDescriptor object.

    A specification of partitions that share the same physical storage location.

  • PartitionListComposingSpec – A PartitionListComposingSpec object.

    Specifies a list of partitions.

PartitionValueList Structure

Contains a list of values defining partitions.

Fields

  • Values – Required: An array of UTF-8 strings, at least 1 string.

    The list of values.

Segment Structure

Defines a non-overlapping region of a table's partitions, allowing multiple requests to be executed in parallel.

Fields

  • SegmentNumber – Required: Number (integer), not less than 0.

    The zero-based index number of this segment. For example, if the total number of segments is 4, SegmentNumber values range from 0 through 3.

  • TotalSegments – Required: Number (integer), not less than 1 or more than 10.

    The total number of segments.
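
To make the numbering concrete, the following sketch generates one Segment dictionary per parallel request. The helper name make_segments is hypothetical; only the field names and the limits come from the structure above.

```python
def make_segments(total_segments):
    """Return one Segment dict per parallel request (hypothetical helper)."""
    if not 1 <= total_segments <= 10:
        raise ValueError("TotalSegments must be between 1 and 10")
    return [
        {"SegmentNumber": number, "TotalSegments": total_segments}
        for number in range(total_segments)  # zero-based index
    ]


segments = make_segments(4)
```

Each dictionary in segments could then be supplied as the Segment parameter of a separate GetPartitions request, so that every request scans a different region of the table's partitions.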

PartitionError Structure

Contains information about a partition error.

Fields

  • PartitionValues – An array of UTF-8 strings, at least 1 string.

    The values that define the partition.

  • ErrorDetail – An ErrorDetail object.

    Details about the partition error.

Operations

CreatePartition Action (Python: create_partition)

Creates a new partition.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the catalog in which the partition is to be created. Currently, this should be the AWS account ID.

  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the metadata database in which the partition is to be created.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the metadata table in which the partition is to be created.

  • PartitionInput – Required: A PartitionInput object.

    A PartitionInput structure defining the partition to be created.

Response

  • No Response parameters.

Errors

  • InvalidInputException

  • AlreadyExistsException

  • ResourceNumberLimitExceededException

  • InternalServiceException

  • EntityNotFoundException

  • OperationTimeoutException

  • GlueEncryptionException
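
A minimal sketch of calling this action from Python follows. It assumes that glue is a client object such as the one returned by boto3.client("glue"), and that the client exposes the listed errors as attributes of glue.exceptions. The function name ensure_partition and the choice to treat AlreadyExistsException as a non-fatal outcome are illustrative assumptions, not behavior defined by the API.

```python
def ensure_partition(glue, database_name, table_name, partition_input, catalog_id=None):
    """Create a partition; return False if it already exists (hypothetical helper).

    `glue` is assumed to be a Glue client, for example boto3.client("glue").
    """
    request = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "PartitionInput": partition_input,
    }
    if catalog_id is not None:
        # CatalogId is optional, so it is sent only when supplied.
        request["CatalogId"] = catalog_id
    try:
        glue.create_partition(**request)
    except glue.exceptions.AlreadyExistsException:
        return False
    return True
```

The other errors listed above are deliberately left to propagate to the caller in this sketch.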

BatchCreatePartition Action (Python: batch_create_partition)

Creates one or more partitions in a batch operation.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the catalog in which the partition is to be created. Currently, this should be the AWS account ID.

  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the metadata database in which the partition is to be created.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the metadata table in which the partition is to be created.

  • PartitionInputList – Required: An array of PartitionInput objects, not more than 100 structures.

    A list of PartitionInput structures that define the partitions to be created.

Response

  • Errors – An array of PartitionError objects.

    Errors encountered when trying to create the requested partitions.

Errors

  • InvalidInputException

  • AlreadyExistsException

  • ResourceNumberLimitExceededException

  • InternalServiceException

  • EntityNotFoundException

  • OperationTimeoutException

  • GlueEncryptionException
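
Because PartitionInputList accepts at most 100 structures, a longer list has to be split across several requests. The sketch below shows one way to do that, assuming glue is a client such as boto3.client("glue"). The function name batch_create_partitions is hypothetical.

```python
def batch_create_partitions(glue, database_name, table_name, partition_inputs, batch_size=100):
    """Create partitions in chunks and collect the returned PartitionError objects."""
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    partition_inputs = list(partition_inputs)
    errors = []
    for start in range(0, len(partition_inputs), batch_size):
        response = glue.batch_create_partition(
            DatabaseName=database_name,
            TableName=table_name,
            PartitionInputList=partition_inputs[start:start + batch_size],
        )
        # Errors is absent or empty when every partition in the chunk was created.
        errors.extend(response.get("Errors", []))
    return errors
```

An empty return value indicates that no per-partition errors were reported. Request-level failures such as InvalidInputException are raised by the client rather than returned in Errors.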

UpdatePartition Action (Python: update_partition)

Updates a partition.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the partition to be updated resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the catalog database in which the table in question resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table where the partition to be updated is located.

  • PartitionValueList – Required: An array of UTF-8 strings, not more than 100 strings.

    A list of the values defining the partition.

  • PartitionInput – Required: A PartitionInput object.

    A PartitionInput structure containing the new definition of the partition.

Response

  • No Response parameters.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException

  • GlueEncryptionException
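
A common read-modify-write pattern is to fetch a partition, change one field, and send it back. Because a Partition object carries fields that PartitionInput does not define (DatabaseName, TableName, CreationTime), the sketch below copies only the fields listed in the PartitionInput structure. It assumes glue is a client such as boto3.client("glue"). The function names are hypothetical.

```python
# Fields defined by the PartitionInput structure.
PARTITION_INPUT_FIELDS = (
    "Values", "LastAccessTime", "StorageDescriptor", "Parameters", "LastAnalyzedTime",
)


def partition_to_input(partition):
    """Keep only the fields of a Partition that PartitionInput accepts."""
    return {key: partition[key] for key in PARTITION_INPUT_FIELDS if key in partition}


def update_partition_parameters(glue, database_name, table_name, values, new_parameters):
    """Merge new_parameters into the partition's Parameters (hypothetical helper)."""
    current = glue.get_partition(
        DatabaseName=database_name, TableName=table_name, PartitionValues=values
    )["Partition"]
    partition_input = partition_to_input(current)
    partition_input["Parameters"] = {**current.get("Parameters", {}), **new_parameters}
    glue.update_partition(
        DatabaseName=database_name,
        TableName=table_name,
        PartitionValueList=values,
        PartitionInput=partition_input,
    )
    return partition_input
```

This sketch is not atomic: a concurrent update made between the get and the update call would be overwritten.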

DeletePartition Action (Python: delete_partition)

Deletes a specified partition.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the partition to be deleted resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the catalog database in which the table in question resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table where the partition to be deleted is located.

  • PartitionValues – Required: An array of UTF-8 strings, at least 1 string.

    The values that define the partition.

Response

  • No Response parameters.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException

BatchDeletePartition Action (Python: batch_delete_partition)

Deletes one or more partitions in a batch operation.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the partition to be deleted resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the catalog database in which the table in question resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table where the partitions to be deleted are located.

  • PartitionsToDelete – Required: An array of PartitionValueList objects, not more than 25 structures.

    A list of PartitionValueList structures that define the partitions to be deleted.

Response

  • Errors – An array of PartitionError objects.

    Errors encountered when trying to delete the requested partitions.

Errors

  • InvalidInputException

  • EntityNotFoundException

  • InternalServiceException

  • OperationTimeoutException
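
Each entry of PartitionsToDelete is a PartitionValueList, that is, a structure with a single Values field, and a request holds at most 25 of them. The sketch below wraps plain lists of values accordingly and issues as many requests as needed. It assumes glue is a client such as boto3.client("glue"). The function name batch_delete_partitions is hypothetical.

```python
def batch_delete_partitions(glue, database_name, table_name, value_lists, batch_size=25):
    """Delete partitions in chunks of at most 25 and collect PartitionError objects."""
    if not 1 <= batch_size <= 25:
        raise ValueError("batch_size must be between 1 and 25")
    # Wrap each list of values in the PartitionValueList shape.
    to_delete = [{"Values": [str(v) for v in values]} for values in value_lists]
    errors = []
    for start in range(0, len(to_delete), batch_size):
        response = glue.batch_delete_partition(
            DatabaseName=database_name,
            TableName=table_name,
            PartitionsToDelete=to_delete[start:start + batch_size],
        )
        errors.extend(response.get("Errors", []))
    return errors
```

For example, passing [["2015"], ["2016"]] as value_lists would request deletion of the partitions year = 2015 and year = 2016 of a table partitioned by a single key.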

GetPartition Action (Python: get_partition)

Retrieves information about a specified partition.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the partition in question resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the catalog database where the partition resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the partition's table.

  • PartitionValues – Required: An array of UTF-8 strings, at least 1 string.

    The values that define the partition.

Response

  • Partition – A Partition object.

    The requested information, in the form of a Partition object.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException

  • GlueEncryptionException

GetPartitions Action (Python: get_partitions)

Retrieves information about the partitions in a table.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the partitions in question reside. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the catalog database where the partitions reside.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the partitions' table.

  • Expression – Predicate string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    An expression filtering the partitions to be returned.

    The expression uses SQL syntax similar to the SQL WHERE filter clause. The SQL statement parser JSQLParser parses the expression.

    Operators: The following are the operators that you can use in the Expression API call:

    =

    Checks if the values of the two operands are equal or not; if yes, then the condition becomes true.

    Example: Assume 'variable a' holds 10 and 'variable b' holds 20.

    (a = b) is not true.

    < >

    Checks if the values of two operands are equal or not; if the values are not equal, then the condition becomes true.

    Example: (a < > b) is true.

    >

    Checks if the value of the left operand is greater than the value of the right operand; if yes, then the condition becomes true.

    Example: (a > b) is not true.

    <

    Checks if the value of the left operand is less than the value of the right operand; if yes, then the condition becomes true.

    Example: (a < b) is true.

    >=

    Checks if the value of the left operand is greater than or equal to the value of the right operand; if yes, then the condition becomes true.

    Example: (a >= b) is not true.

    <=

    Checks if the value of the left operand is less than or equal to the value of the right operand; if yes, then the condition becomes true.

    Example: (a <= b) is true.

    AND, OR, IN, BETWEEN, LIKE, NOT, IS NULL

    Logical operators.

    Supported Partition Key Types: The following are the supported partition key types.

    • string

    • date

    • timestamp

    • int

    • bigint

    • long

    • tinyint

    • smallint

    • decimal

    If an invalid type is encountered, an exception is thrown.

    The following list shows the valid operators on each type. When you define a crawler, the partitionKey type is created as a STRING, to be compatible with the catalog partitions.

    Sample API Call:

    The table twitter_partition has three partitions:

    • year = 2015

    • year = 2016

    • year = 2017

    Get the partition where year equals 2015:

    aws glue get-partitions --database-name dbname --table-name twitter_partition --expression "year='2015'"

    Get the partitions where year is between 2016 and 2018 (exclusive):

    aws glue get-partitions --database-name dbname --table-name twitter_partition --expression "year>'2016' AND year<'2018'"

    Get the partitions where year is between 2015 and 2018 (inclusive). The following API calls are equivalent to each other:

    aws glue get-partitions --database-name dbname --table-name twitter_partition --expression "year>='2015' AND year<='2018'"

    aws glue get-partitions --database-name dbname --table-name twitter_partition --expression "year BETWEEN 2015 AND 2018"

    aws glue get-partitions --database-name dbname --table-name twitter_partition --expression "year IN (2015,2016,2017,2018)"

    A wildcard partition filter. The following call returns the partition year=2017. Regular expressions are not supported in LIKE.

    aws glue get-partitions --database-name dbname --table-name twitter_partition --expression "year LIKE '%7'"
  • NextToken – UTF-8 string.

    A continuation token, if this is not the first call to retrieve these partitions.

  • Segment – A Segment object.

    The segment of the table's partitions to scan in this request.

  • MaxResults – Number (integer), not less than 1 or more than 1000.

    The maximum number of partitions to return in a single response.

Response

  • Partitions – An array of Partition objects.

    A list of requested partitions.

  • NextToken – UTF-8 string.

    A continuation token, if the returned list of partitions does not include the last one.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException

  • GlueEncryptionException
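
The NextToken request and response fields can be combined into a simple pagination loop. The sketch below assumes glue is a client such as boto3.client("glue"). The generator name iter_partitions is hypothetical, and the loop is one possible way to follow the continuation token, not a prescribed one.

```python
def iter_partitions(glue, database_name, table_name,
                    expression=None, segment=None, max_results=None):
    """Yield every Partition that matches, following NextToken across pages."""
    request = {"DatabaseName": database_name, "TableName": table_name}
    if expression is not None:
        request["Expression"] = expression
    if segment is not None:
        request["Segment"] = segment
    if max_results is not None:
        if not 1 <= max_results <= 1000:
            raise ValueError("MaxResults must be between 1 and 1000")
        request["MaxResults"] = max_results
    while True:
        response = glue.get_partitions(**request)
        yield from response.get("Partitions", [])
        token = response.get("NextToken")
        if not token:
            break  # no continuation token: the last page has been read
        request["NextToken"] = token
```

With the twitter_partition table from the sample calls, iterating over iter_partitions(glue, "dbname", "twitter_partition", expression="year>='2015' AND year<='2018'") would issue the same filter as the corresponding CLI command and keep requesting pages until no token is returned.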

BatchGetPartition Action (Python: batch_get_partition)

Retrieves partitions in a batch request.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the partitions in question reside. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the catalog database where the partitions reside.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the partitions' table.

  • PartitionsToGet – Required: An array of PartitionValueList objects, not more than 1000 structures.

    A list of partition values identifying the partitions to retrieve.

Response

  • Partitions – An array of Partition objects.

    A list of the requested partitions.

  • UnprocessedKeys – An array of PartitionValueList objects, not more than 1000 structures.

    A list of the partition values in the request for which partitions were not returned.

Errors

  • InvalidInputException

  • EntityNotFoundException

  • OperationTimeoutException

  • InternalServiceException

  • GlueEncryptionException
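
Because the response can report UnprocessedKeys, a caller may want to request those keys again. The sketch below assumes glue is a client such as boto3.client("glue"). The function name batch_get_partitions and the bounded retry policy (max_rounds) are illustrative assumptions. The bound matters because this section does not say why a key is left unprocessed, so a key may never be returned however often it is requested.

```python
def batch_get_partitions(glue, database_name, table_name, value_lists,
                         batch_size=1000, max_rounds=3):
    """Return (partitions, unprocessed_keys) after at most max_rounds passes."""
    if not 1 <= batch_size <= 1000:
        raise ValueError("batch_size must be between 1 and 1000")
    pending = [{"Values": [str(v) for v in values]} for values in value_lists]
    found = []
    for _ in range(max_rounds):
        if not pending:
            break
        still_pending = []
        for start in range(0, len(pending), batch_size):
            response = glue.batch_get_partition(
                DatabaseName=database_name,
                TableName=table_name,
                PartitionsToGet=pending[start:start + batch_size],
            )
            found.extend(response.get("Partitions", []))
            # Keys for which no partition was returned are tried again next round.
            still_pending.extend(response.get("UnprocessedKeys", []))
        pending = still_pending
    return found, pending
```

The second element of the returned tuple lists the keys that were still unprocessed after the final round, so the caller can decide whether to treat them as missing.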