Column statistics API - AWS Glue

The column statistics API describes AWS Glue APIs for returning statistics on columns in a table.

Data types

ColumnStatisticsTaskRun structure

The object that shows the details of the column stats run.

Fields
  • CustomerId – UTF-8 string, not more than 12 bytes long.

    The AWS account ID.

  • ColumnStatisticsTaskRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The identifier for the particular column statistics task run.

  • DatabaseName – UTF-8 string.

    The database where the table resides.

  • TableName – UTF-8 string.

    The name of the table for which column statistics are generated.

  • ColumnNameList – An array of UTF-8 strings.

    A list of the column names. If none is supplied, all column names for the table will be used by default.

  • CatalogID – Catalog id string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • Role – UTF-8 string.

    The IAM role that the service assumes to generate statistics.

  • SampleSize – Number (double), not more than 100.

    The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

  • SecurityConfiguration – UTF-8 string, not more than 128 bytes long.

    Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

  • NumberOfWorkers – Number (integer), at least 1.

    The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.

  • WorkerType – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The type of workers being used for generating stats. The default is g.1x.

  • Status – UTF-8 string (valid values: STARTING | RUNNING | SUCCEEDED | FAILED | STOPPED).

    The status of the task run.

  • CreationTime – Timestamp.

    The time that this task was created.

  • LastUpdated – Timestamp.

    The last point in time when this task was modified.

  • StartTime – Timestamp.

    The start time of the task.

  • EndTime – Timestamp.

    The end time of the task.

  • ErrorMessage – Description string, not more than 2048 bytes long, matching the URI address multi-line string pattern.

    The error message for the job.

  • DPUSeconds – Number (double).

    The calculated DPU usage in seconds for all autoscaled workers.
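
As an illustration of how the fields above relate to one another, the following minimal sketch condenses a ColumnStatisticsTaskRun-shaped dictionary into a few derived values. The helper name summarize_task_run, the sample identifier, and the treatment of SUCCEEDED, FAILED, and STOPPED as final statuses are assumptions made for this example, not part of the API.

```python
from datetime import datetime, timezone

# Assumption: these three statuses are final; STARTING and RUNNING are not.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def summarize_task_run(run: dict) -> dict:
    """Condense a ColumnStatisticsTaskRun-shaped dict into a few derived values."""
    start, end = run.get("StartTime"), run.get("EndTime")
    # Elapsed wall-clock time is only known once both timestamps are present.
    elapsed = (end - start).total_seconds() if start and end else None
    return {
        "id": run.get("ColumnStatisticsTaskRunId"),
        "finished": run.get("Status") in TERMINAL_STATUSES,
        "elapsed_seconds": elapsed,
        # DPUSeconds is reported in DPU-seconds; dividing by 3600 gives DPU-hours.
        "dpu_hours": run["DPUSeconds"] / 3600 if "DPUSeconds" in run else None,
    }


example_run = {
    "ColumnStatisticsTaskRunId": "run-0001",  # made-up identifier
    "Status": "SUCCEEDED",
    "StartTime": datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    "EndTime": datetime(2024, 1, 1, 12, 5, 0, tzinfo=timezone.utc),
    "DPUSeconds": 1800.0,
}
print(summarize_task_run(example_run))
```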

ColumnStatisticsTaskRunningException structure

An exception thrown when you try to start another job while a column statistics generation job is already running.

Fields
  • Message – UTF-8 string.

    A message describing the problem.

ColumnStatisticsTaskNotRunningException structure

An exception thrown when you try to stop a task run when there is no task running.

Fields
  • Message – UTF-8 string.

    A message describing the problem.

ColumnStatisticsTaskStoppingException structure

An exception thrown when you try to stop a task run.

Fields
  • Message – UTF-8 string.

    A message describing the problem.

Operations

StartColumnStatisticsTaskRun action (Python: start_column_statistics_task_run)

Starts a column statistics task run, for a specified table and columns.

Request
  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table for which to generate statistics.

  • ColumnNameList – An array of UTF-8 strings.

    A list of the column names for which to generate statistics. If none is supplied, all column names for the table will be used by default.

  • Role – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The IAM role that the service assumes to generate statistics.

  • SampleSize – Number (double), not more than 100.

    The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

  • CatalogID – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

  • SecurityConfiguration – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

Response
  • ColumnStatisticsTaskRunId – UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The identifier for the column statistics task run.

Errors
  • AccessDeniedException

  • EntityNotFoundException

  • ColumnStatisticsTaskRunningException

  • OperationTimeoutException

  • ResourceNumberLimitExceededException

  • InvalidInputException
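
The request rules above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: build_start_request and start_task_run are hypothetical helper names, and glue is assumed to be a boto3 Glue client (for example, one created with boto3.client("glue")). The client-side checks only mirror the documented limits; the service performs its own validation.

```python
def build_start_request(database_name, table_name, role,
                        column_names=None, sample_size=None,
                        catalog_id=None, security_configuration=None):
    """Assemble keyword arguments for start_column_statistics_task_run."""
    for label, value in (("DatabaseName", database_name),
                         ("TableName", table_name), ("Role", role)):
        # The required strings must be 1 to 255 bytes long when UTF-8 encoded.
        if not 1 <= len(value.encode("utf-8")) <= 255:
            raise ValueError(f"{label} must be 1 to 255 bytes long")
    if sample_size is not None and sample_size > 100:
        raise ValueError("SampleSize must not be more than 100")

    request = {"DatabaseName": database_name, "TableName": table_name, "Role": role}
    optional = {
        "ColumnNameList": column_names,
        "SampleSize": sample_size,
        "CatalogID": catalog_id,
        "SecurityConfiguration": security_configuration,
    }
    # Leave out unset optional fields so the service applies its defaults.
    request.update({k: v for k, v in optional.items() if v is not None})
    return request


def start_task_run(glue, **kwargs):
    """Start a run and return its ColumnStatisticsTaskRunId.

    `glue` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    response = glue.start_column_statistics_task_run(**build_start_request(**kwargs))
    return response["ColumnStatisticsTaskRunId"]
```

Omitting ColumnNameList and SampleSize from the request leaves the documented defaults in effect: all columns, and the entire table.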

GetColumnStatisticsTaskRun action (Python: get_column_statistics_task_run)

Gets the metadata for a task run, given a task run ID.

Request
  • ColumnStatisticsTaskRunId – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The identifier for the particular column statistics task run.

Response
  • ColumnStatisticsTaskRun – A ColumnStatisticsTaskRun object.

    A ColumnStatisticsTaskRun object representing the details of the column stats run.

Errors
  • EntityNotFoundException

  • OperationTimeoutException

  • InvalidInputException
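
A common use of this action is to poll a run until it finishes. The sketch below shows one way to do that, under stated assumptions: glue is assumed to be a boto3 Glue client, wait_for_task_run is a hypothetical helper, SUCCEEDED, FAILED, and STOPPED are assumed to be final statuses, and the polling interval and attempt cap are arbitrary illustrative defaults.

```python
import time

# Assumption: SUCCEEDED, FAILED, and STOPPED are final states.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_task_run(glue, run_id, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll get_column_statistics_task_run until the run reaches a final status.

    `glue` is assumed to be a boto3 Glue client. `sleep` is injectable so the
    loop can be exercised without real delays.
    """
    for attempt in range(max_polls):
        response = glue.get_column_statistics_task_run(ColumnStatisticsTaskRunId=run_id)
        run = response["ColumnStatisticsTaskRun"]
        if run["Status"] in TERMINAL_STATUSES:
            return run
        # Skip the pause after the final attempt; there is nothing left to wait for.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"task run {run_id} still not finished after {max_polls} polls")
```

The returned dictionary is the ColumnStatisticsTaskRun object, so a caller can inspect Status and ErrorMessage to tell a successful run from a failed one.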

GetColumnStatisticsTaskRuns action (Python: get_column_statistics_task_runs)

Retrieves information about all runs associated with the specified table.

Request
  • DatabaseName – Required: UTF-8 string.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table.

  • MaxResults – Number (integer), not less than 1 or more than 1000.

    The maximum size of the response.

  • NextToken – UTF-8 string.

    A continuation token, if this is a continuation call.

Response
  • ColumnStatisticsTaskRuns – An array of ColumnStatisticsTaskRun objects.

    A list of column statistics task runs.

  • NextToken – UTF-8 string.

    A continuation token, if not all task runs have yet been returned.

Errors
  • OperationTimeoutException
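
Because results are paged, retrieving every run for a table means following NextToken until it is no longer returned. A minimal sketch, assuming glue is a boto3 Glue client; get_all_task_runs is a hypothetical helper name and the page size is an arbitrary choice within the documented 1 to 1000 range:

```python
def get_all_task_runs(glue, database_name, table_name, page_size=100):
    """Collect every ColumnStatisticsTaskRun for one table by following NextToken."""
    runs = []
    request = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "MaxResults": page_size,
    }
    while True:
        page = glue.get_column_statistics_task_runs(**request)
        runs.extend(page.get("ColumnStatisticsTaskRuns", []))
        token = page.get("NextToken")
        if not token:
            # No continuation token means the last page has been read.
            return runs
        request["NextToken"] = token
```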

ListColumnStatisticsTaskRuns action (Python: list_column_statistics_task_runs)

Lists all task runs for a particular account.

Request
  • MaxResults – Number (integer), not less than 1 or more than 1000.

    The maximum size of the response.

  • NextToken – UTF-8 string.

    A continuation token, if this is a continuation call.

Response
  • ColumnStatisticsTaskRunIds – An array of UTF-8 strings, not more than 100 strings.

    A list of column statistics task run IDs.

  • NextToken – UTF-8 string.

    A continuation token, if not all task run IDs have yet been returned.

Errors
  • OperationTimeoutException
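
Since this action returns only IDs, a lazy generator is one convenient way to walk them without holding every page in memory. The sketch below assumes glue is a boto3 Glue client; iter_task_run_ids is a hypothetical helper name.

```python
from typing import Iterator


def iter_task_run_ids(glue, page_size=100) -> Iterator[str]:
    """Lazily yield every column statistics task run ID in the account."""
    request = {"MaxResults": page_size}
    while True:
        page = glue.list_column_statistics_task_runs(**request)
        yield from page.get("ColumnStatisticsTaskRunIds", [])
        token = page.get("NextToken")
        if not token:
            # The service returned no continuation token, so iteration ends.
            return
        request["NextToken"] = token
```

Each yielded ID can then be passed to GetColumnStatisticsTaskRun to fetch the details of that run.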

StopColumnStatisticsTaskRun action (Python: stop_column_statistics_task_run)

Stops a task run for the specified table.

Request
  • DatabaseName – Required: UTF-8 string.

    The name of the database where the table resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table.

Response
  • No Response parameters.

Errors
  • EntityNotFoundException

  • ColumnStatisticsTaskNotRunningException

  • ColumnStatisticsTaskStoppingException

  • OperationTimeoutException
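
A caller that only wants to make sure no run is active may prefer to treat ColumnStatisticsTaskNotRunningException as a no-op. The following is a minimal sketch of that choice, assuming glue is a boto3 Glue client that exposes the listed error under glue.exceptions, as boto3 clients generally do for modeled errors; stop_task_run is a hypothetical helper name.

```python
def stop_task_run(glue, database_name, table_name):
    """Request a stop; return False when no run was active for the table."""
    try:
        glue.stop_column_statistics_task_run(
            DatabaseName=database_name, TableName=table_name
        )
    except glue.exceptions.ColumnStatisticsTaskNotRunningException:
        # Nothing to stop: report it to the caller instead of raising.
        return False
    return True
```

The other listed errors, such as EntityNotFoundException, are deliberately left to propagate, because they usually indicate a wrong database or table name rather than a harmless condition.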