AWS Glue
Developer Guide

Table API

Data Types

Table Structure

Represents a collection of related data organized in columns and rows.

Fields

  • Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    Name of the table. For Hive compatibility, this must be entirely lowercase.

  • DatabaseName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Name of the metadata database where the table metadata resides. For Hive compatibility, this must be all lowercase.

  • Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    Description of the table.

  • Owner – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Owner of the table.

  • CreateTime – Timestamp.

    Time when the table definition was created in the Data Catalog.

  • UpdateTime – Timestamp.

    Last time the table was updated.

  • LastAccessTime – Timestamp.

    Last time the table was accessed. This is usually taken from HDFS, and may not be reliable.

  • LastAnalyzedTime – Timestamp.

    Last time column statistics were computed for this table.

  • Retention – Number (integer), at least 0.

    Retention time for this table.

  • StorageDescriptor – A StorageDescriptor object.

    A storage descriptor containing information about the physical storage of this table.

  • PartitionKeys – An array of Columns.

    A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.

  • ViewOriginalText – UTF-8 string, not more than 409600 bytes long.

    If the table is a view, the original text of the view; otherwise null.

  • ViewExpandedText – UTF-8 string, not more than 409600 bytes long.

    If the table is a view, the expanded text of the view; otherwise null.

  • TableType – UTF-8 string, not more than 255 bytes long.

    The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).

  • Parameters – A map array of key-value pairs.

    Each key is a Key string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Each value is a UTF-8 string, not more than 512000 bytes long.

    These key-value pairs define properties associated with the table.

  • CreatedBy – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Person or entity who created the table.

TableInput Structure

Structure used to create or update the table.

Fields

  • Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    Name of the table. For Hive compatibility, this is folded to lowercase when it is stored.

  • Description – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    Description of the table.

  • Owner – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Owner of the table.

  • LastAccessTime – Timestamp.

    Last time the table was accessed.

  • LastAnalyzedTime – Timestamp.

    Last time column statistics were computed for this table.

  • Retention – Number (integer), at least 0.

    Retention time for this table.

  • StorageDescriptor – A StorageDescriptor object.

    A storage descriptor containing information about the physical storage of this table.

  • PartitionKeys – An array of Columns.

    A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.

  • ViewOriginalText – UTF-8 string, not more than 409600 bytes long.

    If the table is a view, the original text of the view; otherwise null.

  • ViewExpandedText – UTF-8 string, not more than 409600 bytes long.

    If the table is a view, the expanded text of the view; otherwise null.

  • TableType – UTF-8 string, not more than 255 bytes long.

    The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).

  • Parameters – A map array of key-value pairs.

    Each key is a Key string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Each value is a UTF-8 string, not more than 512000 bytes long.

    These key-value pairs define properties associated with the table.
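
As a rough illustration of how these fields fit together, the following Python sketch assembles a TableInput as a plain dictionary. The helper names utf8_len and make_table_input are hypothetical, and the client-side lowercasing and byte-length checks are assumptions that mirror the limits listed above; the service performs its own validation.

```python
def utf8_len(text):
    """Length of text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def make_table_input(name, description=None, owner=None, table_type=None,
                     storage_descriptor=None, partition_keys=None,
                     parameters=None):
    """Build a TableInput dictionary (hypothetical helper).

    Only Name is required. Optional fields are left out entirely when
    they are not supplied.
    """
    # Assumption: lowercase on the client so the local name matches the
    # form that is stored for Hive compatibility.
    name = name.lower()
    if not 1 <= utf8_len(name) <= 255:
        raise ValueError("Name must be 1 to 255 bytes long")
    if description is not None and utf8_len(description) > 2048:
        raise ValueError("Description must not exceed 2048 bytes")

    optional = {
        "Description": description,
        "Owner": owner,
        "TableType": table_type,
        "StorageDescriptor": storage_descriptor,
        "PartitionKeys": partition_keys,
        "Parameters": parameters,
    }
    table_input = {"Name": name}
    table_input.update({k: v for k, v in optional.items() if v is not None})
    return table_input


# Example with a hypothetical table name and partition column.
example_input = make_table_input(
    "Sales_2020",
    table_type="EXTERNAL_TABLE",
    partition_keys=[{"Name": "region", "Type": "string"}],
)
```

Limits are checked in bytes rather than characters because the field constraints are expressed in bytes, and a non-ASCII character can occupy more than one byte in UTF-8.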

Column Structure

A column in a Table.

Fields

  • Name – UTF-8 string, not less than 1 or more than 1024 bytes long, matching the Single-line string pattern. Required.

    The name of the Column.

  • Type – UTF-8 string, not more than 131072 bytes long, matching the Single-line string pattern.

    The datatype of data in the Column.

  • Comment – Comment string, not more than 255 bytes long, matching the Single-line string pattern.

    Free-form text comment.

StorageDescriptor Structure

Describes the physical storage of table data.

Fields

  • Columns – An array of Columns.

    A list of the Columns in the table.

  • Location – Location string, not more than 2056 bytes long, matching the URI address multi-line string pattern.

    The physical location of the table. By default this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.

  • InputFormat – Format string, not more than 128 bytes long, matching the Single-line string pattern.

    The input format: SequenceFileInputFormat (binary), or TextInputFormat, or a custom format.

  • OutputFormat – Format string, not more than 128 bytes long, matching the Single-line string pattern.

    The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat, or a custom format.

  • Compressed – Boolean.

    True if the data in the table is compressed, or False if not.

  • NumberOfBuckets – Number (integer).

    Must be specified if the table contains any dimension columns.

  • SerdeInfo – A SerDeInfo object.

    Serialization/deserialization (SerDe) information.

  • BucketColumns – An array of UTF-8 strings.

    A list of reducer grouping columns, clustering columns, and bucketing columns in the table.

  • SortColumns – An array of Orders.

    A list specifying the sort order of each bucket in the table.

  • Parameters – A map array of key-value pairs.

    Each key is a Key string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Each value is a UTF-8 string, not more than 512000 bytes long.

    User-supplied properties in key-value form.

  • SkewedInfo – A SkewedInfo object.

    Information about values that appear very frequently in a column (skewed values).

  • StoredAsSubDirectories – Boolean.

    True if the table data is stored in subdirectories, or False if not.

SerDeInfo Structure

Information about a serialization/deserialization program (SerDe) that serves as an extractor and loader.

Fields

  • Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Name of the SerDe.

  • SerializationLibrary – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Usually the class that implements the SerDe. An example is: org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.

  • Parameters – A map array of key-value pairs.

    Each key is a Key string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Each value is a UTF-8 string, not more than 512000 bytes long.

    These key-value pairs define initialization parameters for the SerDe.
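
The following Python sketch shows one way a StorageDescriptor and its nested SerDeInfo might be written for delimited text files. The helper name make_text_storage_descriptor, the example location, and the column names are hypothetical. The class names are the standard Hive text input format, output format, and LazySimpleSerDe, chosen here as an assumption about the data layout rather than anything this section prescribes.

```python
def make_text_storage_descriptor(columns, location, delimiter=","):
    """Build a StorageDescriptor dictionary for delimited text data.

    columns is a list of (name, type) pairs, turned into Column
    structures. This is a hypothetical helper, not part of the API.
    """
    return {
        "Columns": [{"Name": name, "Type": col_type}
                    for name, col_type in columns],
        "Location": location,
        # Assumed Hive classes for plain text; other formats need others.
        "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
        "OutputFormat":
            "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        "Compressed": False,
        "SerdeInfo": {
            "SerializationLibrary":
                "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            # Initialization parameter passed to the SerDe.
            "Parameters": {"field.delim": delimiter},
        },
        "StoredAsSubDirectories": False,
    }


# Example with a hypothetical bucket and schema.
example_descriptor = make_text_storage_descriptor(
    [("id", "bigint"), ("amount", "double")],
    "s3://example-bucket/sales/",
)
```

Fields such as BucketColumns, SortColumns, and SkewedInfo are omitted from the sketch; they would be added as further keys of the same dictionary when the table uses them.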

Order Structure

Specifies the sort order of a sorted column.

Fields

  • Column – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The name of the column.

  • SortOrder – Number (integer), not less than 0 or more than 1. Required.

    Indicates whether the column is sorted in ascending order (1) or in descending order (0).

SkewedInfo Structure

Specifies skewed values in a table. Skewed values are those that occur with very high frequency.

Fields

  • SkewedColumnNames – An array of UTF-8 strings.

    A list of names of columns that contain skewed values.

  • SkewedColumnValues – An array of UTF-8 strings.

    A list of values that appear so frequently as to be considered skewed.

  • SkewedColumnValueLocationMaps – A map array of key-value pairs.

    Each key is a UTF-8 string.

    Each value is a UTF-8 string.

    A mapping of skewed values to the columns that contain them.

TableVersion Structure

Specifies a version of a table.

Fields

  • Table – A Table object.

    The table in question.

  • VersionId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID value that identifies this table version.

TableError Structure

An error record for table operations.

Fields

  • TableName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Name of the table. For Hive compatibility, this must be entirely lowercase.

  • ErrorDetail – An ErrorDetail object.

    Detail about the error.

TableVersionError Structure

An error record for table-version operations.

Fields

  • TableName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table in question.

  • VersionId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID value of the version in question.

  • ErrorDetail – An ErrorDetail object.

    Detail about the error.

Operations

CreateTable Action (Python: create_table)

Creates a new table definition in the Data Catalog.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog in which to create the Table. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The catalog database in which to create the new table. For Hive compatibility, this name is entirely lowercase.

  • TableInput – A TableInput object. Required.

    The TableInput object that defines the metadata table to create in the catalog.

Response

  • No Response parameters.

Errors

  • AlreadyExistsException

  • InvalidInputException

  • EntityNotFoundException

  • ResourceNumberLimitExceededException

  • InternalServiceException

  • OperationTimeoutException

  • GlueEncryptionException
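
A minimal sketch of calling this action from Python follows. It assumes that glue is a boto3 Glue client (for example, one returned by boto3.client("glue")) and that the client exposes the listed errors under glue.exceptions. The function name create_table_if_absent is hypothetical.

```python
def create_table_if_absent(glue, database_name, table_input, catalog_id=None):
    """Create a table, treating AlreadyExistsException as "nothing to do".

    Returns True if the table was created and False if it already
    existed. glue is assumed to be a boto3 Glue client.
    """
    request = {"DatabaseName": database_name, "TableInput": table_input}
    # CatalogId is optional; when omitted, the account ID is used.
    if catalog_id is not None:
        request["CatalogId"] = catalog_id
    try:
        glue.create_table(**request)
    except glue.exceptions.AlreadyExistsException:
        return False
    return True


# Usage (not executed here; requires AWS credentials and boto3):
#   import boto3
#   glue = boto3.client("glue")
#   create_table_if_absent(glue, "example_db", {"Name": "sales_2020"})
```

Other errors, such as EntityNotFoundException when the database does not exist, are deliberately left to propagate so that the caller can decide how to handle them.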

UpdateTable Action (Python: update_table)

Updates a metadata table in the Data Catalog.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The name of the catalog database in which the table resides. For Hive compatibility, this name is entirely lowercase.

  • TableInput – A TableInput object. Required.

    An updated TableInput object to define the metadata table in the catalog.

  • SkipArchive – Boolean.

    By default, UpdateTable creates an archived version of the table before updating it. If SkipArchive is set to true, UpdateTable does not create the archived version.

Response

  • No Response parameters.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException

  • ConcurrentModificationException

  • ResourceNumberLimitExceededException

  • GlueEncryptionException
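
Because UpdateTable takes a TableInput rather than a Table, a definition returned by GetTable cannot be passed back unchanged: Table carries fields such as DatabaseName, CreateTime, UpdateTime, and CreatedBy that TableInput does not have. The sketch below keeps only the fields listed for TableInput in this section and then changes one table parameter. It assumes glue is a boto3 Glue client; the function names and the parameter key used in the usage note are hypothetical.

```python
# Field names taken from the TableInput structure described above.
TABLE_INPUT_FIELDS = (
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters",
)


def table_to_table_input(table):
    """Copy only the keys of a Table that are valid in a TableInput."""
    return {k: v for k, v in table.items() if k in TABLE_INPUT_FIELDS}


def set_table_parameter(glue, database_name, table_name, key, value,
                        skip_archive=False):
    """Read a table, set one entry in Parameters, and write it back."""
    table = glue.get_table(DatabaseName=database_name,
                           Name=table_name)["Table"]
    table_input = table_to_table_input(table)
    # Copy the map so the response object is not modified in place.
    parameters = dict(table_input.get("Parameters", {}))
    parameters[key] = value
    table_input["Parameters"] = parameters
    glue.update_table(DatabaseName=database_name,
                      TableInput=table_input,
                      SkipArchive=skip_archive)
    return table_input
```

A call such as set_table_parameter(glue, "example_db", "sales_2020", "owner_team", "analytics") would leave every other field as it was read. The read and the write are separate requests, so a concurrent change by another caller can still be overwritten or can surface as ConcurrentModificationException.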

DeleteTable Action (Python: delete_table)

Removes a table definition from the Data Catalog.

Note

After completing this operation, you will no longer have access to the table versions and partitions that belong to the deleted table. AWS Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.

To ensure immediate deletion of all related resources, before calling DeleteTable, use DeleteTableVersion or BatchDeleteTableVersion, and DeletePartition or BatchDeletePartition, to delete any resources that belong to the table.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The name of the catalog database in which the table resides. For Hive compatibility, this name is entirely lowercase.

  • Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The name of the table to be deleted. For Hive compatibility, this name is entirely lowercase.

Response

  • No Response parameters.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException

BatchDeleteTable Action (Python: batch_delete_table)

Deletes multiple tables at once.

Note

After completing this operation, you will no longer have access to the table versions and partitions that belong to the deleted tables. AWS Glue deletes these "orphaned" resources asynchronously in a timely manner, at the discretion of the service.

To ensure immediate deletion of all related resources, before calling BatchDeleteTable, use DeleteTableVersion or BatchDeleteTableVersion, and DeletePartition or BatchDeletePartition, to delete any resources that belong to the tables.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The name of the catalog database where the tables to delete reside. For Hive compatibility, this name is entirely lowercase.

  • TablesToDelete – An array of UTF-8 strings, not more than 100 items in the array. Required.

    A list of the tables to delete.

Response

  • Errors – An array of TableErrors.

    A list of errors encountered in attempting to delete the specified tables.

Errors

  • InvalidInputException

  • EntityNotFoundException

  • InternalServiceException

  • OperationTimeoutException
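
Since TablesToDelete accepts at most 100 names per request, a longer list has to be split across several calls. The following sketch does that and gathers the TableError records from every response. It assumes glue is a boto3 Glue client; the function name batch_delete_tables is hypothetical.

```python
def batch_delete_tables(glue, database_name, table_names, batch_size=100):
    """Delete tables in chunks of at most batch_size names.

    Returns the combined list of TableError records reported by the
    individual BatchDeleteTable calls.
    """
    names = list(table_names)
    errors = []
    for start in range(0, len(names), batch_size):
        chunk = names[start:start + batch_size]
        response = glue.batch_delete_table(DatabaseName=database_name,
                                           TablesToDelete=chunk)
        # Errors is only meaningful when some deletions failed.
        errors.extend(response.get("Errors", []))
    return errors
```

An empty return value means that no per-table errors were reported. Request-level errors such as InvalidInputException are raised as exceptions by the client instead and are not caught here.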

GetTable Action (Python: get_table)

Retrieves the Table definition in a Data Catalog for a specified table.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The name of the database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.

  • Name – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The name of the table for which to retrieve the definition. For Hive compatibility, this name is entirely lowercase.

Response

  • Table – A Table object.

    The Table object that defines the specified table.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException

  • GlueEncryptionException

GetTables Action (Python: get_tables)

Retrieves the definitions of some or all of the tables in a given Database.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the tables reside. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The database in the catalog whose tables to list. For Hive compatibility, this name is entirely lowercase.

  • Expression – UTF-8 string, not more than 2048 bytes long, matching the Single-line string pattern.

    A regular expression pattern. If present, only those tables whose names match the pattern are returned.

  • NextToken – UTF-8 string.

    A continuation token, included if this is a continuation call.

  • MaxResults – Number (integer), not less than 1 or more than 1000.

    The maximum number of tables to return in a single response.

Response

  • TableList – An array of Tables.

    A list of the requested Table objects.

  • NextToken – UTF-8 string.

    A continuation token, present if the current list segment is not the last.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException

  • GlueEncryptionException
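
The NextToken fields in the request and response imply a loop: call GetTables, consume TableList, and repeat with the returned token until none is returned. A minimal sketch of that loop follows, assuming glue is a boto3 Glue client. The generator name iter_tables and the default page size are hypothetical choices.

```python
def iter_tables(glue, database_name, expression=None, page_size=100):
    """Yield every Table in a database, following NextToken as needed."""
    request = {"DatabaseName": database_name, "MaxResults": page_size}
    if expression is not None:
        # Only tables whose names match the pattern are returned.
        request["Expression"] = expression
    while True:
        response = glue.get_tables(**request)
        yield from response.get("TableList", [])
        token = response.get("NextToken")
        if not token:
            # No token: this was the last segment of the list.
            return
        request["NextToken"] = token


# Usage (not executed here; requires AWS credentials and boto3):
#   import boto3
#   glue = boto3.client("glue")
#   for table in iter_tables(glue, "example_db", expression="sales_.*"):
#       print(table["Name"])
```

Writing the loop as a generator lets the caller stop early without fetching the remaining pages.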

GetTableVersion Action (Python: get_table_version)

Retrieves a specified version of a table.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.

  • TableName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The name of the table. For Hive compatibility, this name is entirely lowercase.

  • VersionId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID value of the table version to be retrieved.

Response

  • TableVersion – A TableVersion object.

    The requested table version.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException

  • GlueEncryptionException

GetTableVersions Action (Python: get_table_versions)

Retrieves a list of TableVersion objects that identify the available versions of a specified table.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.

  • TableName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The name of the table. For Hive compatibility, this name is entirely lowercase.

  • NextToken – UTF-8 string.

    A continuation token, if this is not the first call.

  • MaxResults – Number (integer), not less than 1 or more than 1000.

    The maximum number of table versions to return in one response.

Response

  • TableVersions – An array of TableVersions.

    A list of TableVersion objects identifying the available versions of the specified table.

  • NextToken – UTF-8 string.

    A continuation token, if the list of available versions does not include the last one.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException

  • GlueEncryptionException

DeleteTableVersion Action (Python: delete_table_version)

Deletes a specified version of a table.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.

  • TableName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The name of the table. For Hive compatibility, this name is entirely lowercase.

  • VersionId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The ID of the table version to be deleted.

Response

  • No Response parameters.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException

BatchDeleteTableVersion Action (Python: batch_delete_table_version)

Deletes a specified batch of versions of a table.

Request

  • CatalogId – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.

  • TableName – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern. Required.

    The name of the table. For Hive compatibility, this name is entirely lowercase.

  • VersionIds – An array of UTF-8 strings, not more than 100 items in the array. Required.

    A list of the IDs of versions to be deleted.

Response

  • Errors – An array of TableVersionErrors.

    A list of errors encountered while trying to delete the specified table versions.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException
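
GetTableVersions and BatchDeleteTableVersion can be combined to clear out stored versions of a table. The sketch below first collects every VersionId by following NextToken, then deletes all versions except those the caller asks to keep, at most 100 IDs per request. It assumes glue is a boto3 Glue client; both function names are hypothetical, and which versions are worth keeping is left to the caller.

```python
def list_version_ids(glue, database_name, table_name):
    """Return the VersionId of every stored version of a table."""
    request = {"DatabaseName": database_name, "TableName": table_name}
    version_ids = []
    while True:
        response = glue.get_table_versions(**request)
        version_ids.extend(v["VersionId"]
                           for v in response.get("TableVersions", []))
        token = response.get("NextToken")
        if not token:
            return version_ids
        request["NextToken"] = token


def delete_versions_except(glue, database_name, table_name, keep_ids,
                           batch_size=100):
    """Delete all versions of a table other than those in keep_ids.

    Returns (deleted_ids, errors), where errors is the combined list of
    TableVersionError records from the batch calls.
    """
    keep = set(keep_ids)
    to_delete = [v for v in list_version_ids(glue, database_name, table_name)
                 if v not in keep]
    errors = []
    for start in range(0, len(to_delete), batch_size):
        response = glue.batch_delete_table_version(
            DatabaseName=database_name,
            TableName=table_name,
            VersionIds=to_delete[start:start + batch_size],
        )
        errors.extend(response.get("Errors", []))
    return to_delete, errors
```

The first element of the result lists the IDs that were submitted for deletion; an ID that also appears in the returned errors was not actually removed.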