Menu
AWS Glue
Developer Guide

Table API

Data Types

Table Structure

Represents a collection of related data organized in columns and rows.

Fields

  • Name – String, matching the Single-line string pattern. Required.

    Name of the table.

  • DatabaseName – String, matching the Single-line string pattern.

    Name of the metadata database where the table metadata resides.

  • Description – Description string, matching the URI address multi-line string pattern.

    Description of the table.

  • Owner – String, matching the Single-line string pattern.

    Owner of the table.

  • CreateTime – Timestamp.

    Time when the table definition was created in the Data Catalog.

  • UpdateTime – Timestamp.

    Last time the table was updated.

  • LastAccessTime – Timestamp.

    Last time the table was accessed. This is usually taken from HDFS, and may not be reliable.

  • LastAnalyzedTime – Timestamp.

    Last time column statistics were computed for this table.

  • Retention – Number (integer).

    Retention time for this table.

  • StorageDescriptor – A StorageDescriptor object.

    A storage descriptor containing information about the physical storage of this table.

  • PartitionKeys – An array of Columns.

    A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.

  • ViewOriginalText – String.

    If the table is a view, the original text of the view; otherwise null.

  • ViewExpandedText – String.

    If the table is a view, the expanded text of the view; otherwise null.

  • TableType – String.

    The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).

  • Parameters – An array of UTF-8 string–to–UTF-8 string mappings.

    Properties associated with this table, as a list of key-value pairs.

  • CreatedBy – String, matching the Single-line string pattern.

    Person or entity who created the table.

TableInput Structure

Structure used to create or update the table.

Fields

  • Name – String, matching the Single-line string pattern. Required.

    Name of the table.

  • Description – Description string, matching the URI address multi-line string pattern.

    Description of the table.

  • Owner – String, matching the Single-line string pattern.

    Owner of the table.

  • LastAccessTime – Timestamp.

    Last time the table was accessed.

  • LastAnalyzedTime – Timestamp.

    Last time column statistics were computed for this table.

  • Retention – Number (integer).

    Retention time for this table.

  • StorageDescriptor – A StorageDescriptor object.

    A storage descriptor containing information about the physical storage of this table.

  • PartitionKeys – An array of Columns.

    A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.

  • ViewOriginalText – String.

    If the table is a view, the original text of the view; otherwise null.

  • ViewExpandedText – String.

    If the table is a view, the expanded text of the view; otherwise null.

  • TableType – String.

    The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).

  • Parameters – An array of UTF-8 string–to–UTF-8 string mappings.

    Properties associated with this table, as a list of key-value pairs.

Column Structure

A column in a Table.

Fields

StorageDescriptor Structure

Describes the physical storage of table data.

Fields

  • Columns – An array of Columns.

    A list of the Columns in the table.

  • Location – Location string, matching the URI address multi-line string pattern.

    The physical location of the table. By default this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.

  • InputFormat – Format string, matching the Single-line string pattern.

    The input format: SequenceFileInputFormat (binary), or TextInputFormat, or a custom format.

  • OutputFormat – Format string, matching the Single-line string pattern.

    The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat, or a custom format.

  • Compressed – Boolean.

    True if the data in the table is compressed, or False if not.

  • NumberOfBuckets – Number (integer).

    Must be specified if the table contains any dimension columns.

  • SerdeInfo – A SerDeInfo object.

    Serialization/deserialization (SerDe) information.

  • BucketColumns – An array of UTF-8 strings.

    A list of reducer grouping columns, clustering columns, and bucketing columns in the table.

  • SortColumns – An array of Orders.

    A list specifying the sort order of each bucket in the table.

  • Parameters – An array of UTF-8 string–to–UTF-8 string mappings.

    User-supplied properties in key-value form.

  • SkewedInfo – A SkewedInfo object.

    Information about values that appear very frequently in a column (skewed values).

  • StoredAsSubDirectories – Boolean.

    True if the table data is stored in subdirectories, or False if not.

SerDeInfo Structure

Information about a serialization/deserialization program (SerDe) which serves as an extractor and loader.

Fields

  • Name – String, matching the Single-line string pattern.

    Name of the SerDe.

  • SerializationLibrary – String, matching the Single-line string pattern.

    Usually the class that implements the SerDe. An example is: org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.

  • Parameters – An array of UTF-8 string–to–UTF-8 string mappings.

    A list of initialization parameters for the SerDe, in key-value form.

Order Structure

Specifies the sort order of a sorted column.

Fields

  • Column – String, matching the Single-line string pattern. Required.

    The name of the column.

  • SortOrder – Number (integer). Required.

    Indicates that the column is sorted in ascending order (== 1), or in descending order (==0).

SkewedInfo Structure

Specifies skewed values in a table. Skewed are ones that occur with very high frequency.

Fields

  • SkewedColumnNames – An array of UTF-8 strings.

    A list of names of columns that contain skewed values.

  • SkewedColumnValues – An array of UTF-8 strings.

    A list of values that appear so frequently as to be considered skewed.

  • SkewedColumnValueLocationMaps – An array of UTF-8 string–to–UTF-8 string mappings.

    A mapping of skewed values to the columns that contain them.

TableVersion Structure

Specifies a version of a table.

Fields

  • Table – A Table object.

    The table in question

  • VersionId – String, matching the Single-line string pattern.

    The ID value that identifies this table version.

TableError Structure

An error record for table operations.

Fields

  • TableName – String, matching the Single-line string pattern.

    Name of the table.

  • ErrorDetail – An ErrorDetail object.

    Detail about the error.

Operations

CreateTable Action (Python: create_table)

Creates a new table definition in the Data Catalog.

Request

  • CatalogId – Catalog id string, matching the Single-line string pattern.

    The ID of the Data Catalog in which to create the Table. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – String, matching the Single-line string pattern. Required.

    The catalog database in which to create the new table.

  • TableInput – A TableInput object. Required.

    The TableInput object that defines the metadata table to create in the catalog.

Related Hive DDL:

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [database_name.]table_name
  -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...)
  [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)
  -- (Note: Available in Hive 0.10.0 and later)]
  ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
  [STORED AS DIRECTORIES]
  [
  [ROW FORMAT row_format]
  [STORED AS file_format]
  | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];
  -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

  CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

  row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
  [COLLECTION ITEMS TERMINATED BY char]
  [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
  [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name
  [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

  file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

Response

  • No Response parameters.

Related Hive DDL:

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [database_name.]table_name
  -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...)
  [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)
  -- (Note: Available in Hive 0.10.0 and later)]
  ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
  [STORED AS DIRECTORIES]
  [
  [ROW FORMAT row_format]
  [STORED AS file_format]
  | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];
  -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

  CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

  row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
  [COLLECTION ITEMS TERMINATED BY char]
  [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
  [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name
  [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

  file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

Errors

  • AlreadyExistsException

  • InvalidInputException

  • EntityNotFoundException

  • ResourceNumberLimitExceededException

  • InternalServiceException

  • OperationTimeoutException

Related Hive DDL:

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [database_name.]table_name
  -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...)
  [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)
  -- (Note: Available in Hive 0.10.0 and later)]
  ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
  [STORED AS DIRECTORIES]
  [
  [ROW FORMAT row_format]
  [STORED AS file_format]
  | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];
  -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

  CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

  row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
  [COLLECTION ITEMS TERMINATED BY char]
  [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
  [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name
  [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

  file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

UpdateTable Action (Python: update_table)

Updates a metadata table in the Data Catalog.

Request

  • CatalogId – Catalog id string, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – String, matching the Single-line string pattern. Required.

    The name of the catalog database in which the table resides.

  • TableInput – A TableInput object. Required.

    An updated TableInput object to define the metadata table in the catalog.

Response

  • No Response parameters.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException

  • ConcurrentModificationException

DeleteTable Action (Python: delete_table)

Removes a table definition from the Data Catalog.

Request

  • CatalogId – Catalog id string, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – String, matching the Single-line string pattern. Required.

    The name of the catalog database in which the table resides.

  • Name – String, matching the Single-line string pattern. Required.

    The name of the table to be deleted.

Response

  • No Response parameters.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException

BatchDeleteTable Action (Python: batch_delete_table)

Deletes multiple tables at once.

Request

  • CatalogId – Catalog id string, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – String, matching the Single-line string pattern. Required.

    The name of the catalog database where the tables to delete reside.

  • TablesToDelete – An array of UTF-8 strings. Required.

    A list of the table to delete.

Response

  • Errors – An array of TableErrors.

    A list of errors encountered in attempting to delete the specified tables.

Errors

  • InvalidInputException

  • EntityNotFoundException

  • InternalServiceException

  • OperationTimeoutException

GetTable Action (Python: get_table)

Retrieves the Table definition in a Data Catalog for a specified table.

Request

  • CatalogId – Catalog id string, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – String, matching the Single-line string pattern. Required.

    The name of the database in the catalog in which the table resides.

  • Name – String, matching the Single-line string pattern. Required.

    The name of the table for which to retrieve the definition.

Response

  • Table – A Table object.

    The Table object that defines the specified table.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException

GetTables Action (Python: get_tables)

Retrieves the definitions of some or all of the tables in a given Database.

Request

  • CatalogId – Catalog id string, matching the Single-line string pattern.

    The ID of the Data Catalog where the tables reside. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – String, matching the Single-line string pattern. Required.

    The database in the catalog whose tables to list.

  • Expression – String, matching the Single-line string pattern.

    A regular expression pattern. If present, only those tables whose names match the pattern are returned.

  • NextToken – String.

    A continuation token, included if this is a continuation call.

  • MaxResults – Number (integer).

    The maximum number of tables to return in a single response.

Response

  • TableList – An array of Tables.

    A list of the requested Table objects.

  • NextToken – String.

    A continuation token, present if the current list segment is not the last.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • OperationTimeoutException

  • InternalServiceException

GetTableVersions Action (Python: get_table_versions)

Retrieves a list of strings that identify available versions of a specified table.

Request

  • CatalogId – Catalog id string, matching the Single-line string pattern.

    The ID of the Data Catalog where the tables reside. If none is supplied, the AWS account ID is used by default.

  • DatabaseName – String, matching the Single-line string pattern. Required.

    The database in the catalog in which the table resides.

  • TableName – String, matching the Single-line string pattern. Required.

    The name of the table.

  • NextToken – String.

    A continuation token, if this is not the first call.

  • MaxResults – Number (integer).

    The maximum number of table versions to return in one response.

Response

  • TableVersions – An array of TableVersions.

    A list of strings identifying available versions of the specified table.

  • NextToken – String.

    A continuation token, if the list of available versions does not include the last one.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException