Updating column statistics - AWS Glue

Updating column statistics

Keeping column statistics current improves query performance because the query planner uses them to choose better execution plans. The Data Catalog doesn't refresh statistics automatically. To refresh them, explicitly run the Generate statistics task again from the AWS Glue console.

If you are not using the statistics generation feature in the AWS Glue console, you can update column statistics manually with the UpdateColumnStatisticsForTable API operation or the AWS CLI. The following example uses the AWS CLI to update the statistics for a Boolean column.

# Quote the JSON so the shell passes it as a single argument.
# To read the same JSON from a file instead, use --cli-input-json file://<path> (for example, file://stats.json).
aws glue update-column-statistics-for-table --cli-input-json '{
  "CatalogId": "111122223333",
  "DatabaseName": "test_db",
  "TableName": "test_table",
  "ColumnStatisticsList": [
    {
      "ColumnName": "col1",
      "ColumnType": "Boolean",
      "AnalyzedTime": "1970-01-01T00:00:00",
      "StatisticsData": {
        "Type": "BOOLEAN",
        "BooleanColumnStatisticsData": { "NumberOfTrues": 5, "NumberOfFalses": 5, "NumberOfNulls": 0 }
      }
    }
  ]
}'
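The request body can also be produced programmatically. The following is a minimal Python sketch, assuming the column's values are already in memory as `True`, `False`, or `None` (null). It counts them into the `BooleanColumnStatisticsData` fields, builds the same request shape as the CLI example, and then either serializes it for `--cli-input-json` or passes it to a boto3 Glue client's `update_column_statistics_for_table` method. The helper names (`boolean_column_stats`, `build_request`, `to_cli_json`, `push_stats`) are hypothetical. `CatalogId` is omitted, in which case the caller's AWS account ID is used.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Optional


def boolean_column_stats(values: Iterable[Optional[bool]]) -> dict:
    """Count trues, falses, and nulls (None) for one boolean column."""
    trues = falses = nulls = 0
    for v in values:
        if v is None:
            nulls += 1
        elif v:
            trues += 1
        else:
            falses += 1
    return {"NumberOfTrues": trues, "NumberOfFalses": falses, "NumberOfNulls": nulls}


def build_request(database: str, table: str, column: str,
                  values: Iterable[Optional[bool]],
                  analyzed_time: datetime) -> dict:
    """Build an UpdateColumnStatisticsForTable request for one boolean column."""
    return {
        "DatabaseName": database,
        "TableName": table,
        "ColumnStatisticsList": [{
            "ColumnName": column,
            "ColumnType": "Boolean",
            "AnalyzedTime": analyzed_time,
            "StatisticsData": {
                "Type": "BOOLEAN",
                "BooleanColumnStatisticsData": boolean_column_stats(values),
            },
        }],
    }


def to_cli_json(request: dict) -> str:
    """Serialize the request for --cli-input-json, rendering datetimes as ISO 8601."""
    return json.dumps(request, default=lambda o: o.isoformat(), indent=2)


def push_stats(glue_client, request: dict) -> list:
    """Send the request with a boto3 Glue client; return any per-column errors."""
    response = glue_client.update_column_statistics_for_table(**request)
    return response.get("Errors", [])


if __name__ == "__main__":
    req = build_request("test_db", "test_table", "col1",
                        [True, False, None, True], datetime.now(timezone.utc))
    # Save this output to a file and pass it with --cli-input-json file://<path>.
    print(to_cli_json(req))
    # With AWS credentials configured, the request could instead be sent directly:
    #   import boto3
    #   errors = push_stats(boto3.client("glue"), req)
```

Keeping `AnalyzedTime` as a `datetime` lets boto3 serialize it natively, while `to_cli_json` converts it to an ISO 8601 string only for the CLI path. Injecting the client into `push_stats` keeps the payload logic testable without AWS access.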