
Troubleshooting in Athena

The Athena team has gathered the following troubleshooting information from customer issues. Although not comprehensive, it includes advice regarding some common performance, timeout, and out of memory issues.

CREATE TABLE AS SELECT (CTAS)

Duplicated data occurs with concurrent CTAS statements

Athena does not maintain concurrent validation for CTAS. Make sure that there is no duplicate CTAS statement for the same location at the same time. Even if a CTAS or INSERT INTO statement fails, orphaned data can be left in the data location specified in the statement.
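
As an illustration only, one way to avoid concurrent writes to the same prefix is to give each CTAS run its own external_location; the table name, output location, and query below are placeholders, not values from this guide:

    CREATE TABLE ctas_output_run_01
    WITH (
      format = 'PARQUET',
      external_location = 's3://DOC-EXAMPLE-BUCKET/ctas-output/run-01/'
    ) AS
    SELECT *
    FROM source_table;

Because external_location must point to an empty location, using a distinct prefix per run also prevents one statement from writing into another statement's output.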

Data File Issues

Athena cannot read hidden files

Athena treats source files that start with an underscore (_) or a dot (.) as hidden. To work around this limitation, rename the files.

Athena cannot read files stored in the Glacier storage class

To work around this issue, copy the restored objects back into Amazon S3. For more information, see Considerations and Limitations for SQL Queries in Amazon Athena.

Athena reads files that I excluded from the AWS Glue crawler

Athena does not recognize exclude patterns that you specify for an AWS Glue crawler. For example, if you have an Amazon S3 bucket that contains both .csv and .json files and you exclude the .json files from the crawler, Athena queries both groups of files. To avoid this, place the files that you want to exclude in a different location.

HIVE_BAD_DATA: Error parsing field value

This error can occur when a field value in the source data cannot be parsed as the data type that the table declares for the corresponding column, or when the data itself is malformed.

HIVE_CURSOR_ERROR: Unexpected end of input stream

This message indicates the file is either corrupted or empty. Check the integrity of the file and rerun the query.

HIVE_CURSOR_ERROR: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist

This error usually occurs when a file is removed while a query is running. Either rerun the query, or check your workflow to see whether another job or process is modifying the files while the query runs.

HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://bucket-name

This error can occur when you query an Amazon S3 bucket prefix that has a large number of objects. For more information, see How do I resolve the "HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://awsdoc-example-bucket/: Slow Down" error in Athena? in the AWS Knowledge Center.

HIVE_UNKNOWN_ERROR: Unable to create input format

This error can be a result of issues like the following:

  • The AWS Glue crawler wasn't able to classify the data format

  • Certain AWS Glue table definition properties are empty

  • Athena doesn't support the data format of the files in Amazon S3

For more information, see How do I resolve the error "unable to create input format" in Athena? in the AWS Knowledge Center or watch the Knowledge Center video.

org.apache.parquet.io.GroupColumnIO cannot be cast to org.apache.parquet.io.PrimitiveColumnIO

This error is caused by a Parquet schema mismatch. A column that has a non-primitive type (for example, array) has been declared as a primitive type (for example, string) in AWS Glue. To troubleshoot this issue, check the data schema in the files and compare it with the schema declared in AWS Glue.

The S3 location provided to save your query results is invalid.

Make sure that you have specified a valid S3 location for your query results. For more information, see Specifying a Query Result Location in the Working with Query Results, Output Files, and Query History topic.

Federated Queries

For information on troubleshooting federated queries, see Common_Problems in the awslabs/aws-athena-query-federation section of GitHub.

NULL or incorrect data errors when trying to read JSON data

NULL or incorrect data errors when you try to read JSON data can have a number of causes. To identify lines that are causing errors when you are using the OpenX SerDe, set ignore.malformed.json to true. Malformed records return as NULL. For more information, see I get errors when I try to read JSON data in Amazon Athena in the AWS Knowledge Center or watch the Knowledge Center video.
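
For illustration, a minimal sketch of a table that uses the OpenX JSON SerDe with ignore.malformed.json enabled; the table name, columns, and Amazon S3 location are placeholders:

    CREATE EXTERNAL TABLE example_json_table (
      id string,
      name string
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
    LOCATION 's3://DOC-EXAMPLE-BUCKET/json-data/';

With this setting, rows that cannot be parsed return NULL values instead of failing the query, which makes the malformed lines easier to locate.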

HIVE_BAD_DATA: Error parsing field value for field 0: java.lang.String cannot be cast to org.openx.data.jsonserde.json.JSONObject

The OpenX JSON SerDe throws this error when it fails to parse a column in an Athena query. This can happen if you define a column as a map or struct, but the underlying data is actually a string, int, or other primitive type.

HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Duplicate key

This error occurs when you use Athena to query AWS Config resources that have multiple tags with the same name in different case. The solution is to run CREATE TABLE using WITH SERDEPROPERTIES 'case.insensitive'='false' and map the names. For information about case.insensitive and mapping, see JSON SerDe Libraries. For more information, see How do I resolve "HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Duplicate key" when reading files from AWS Config in Athena? in the AWS Knowledge Center.
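
As a sketch only, the following table definition turns off case-insensitive keys and maps two differently cased JSON keys to distinct column names; the table name, columns, key names, and location are placeholders:

    CREATE EXTERNAL TABLE example_config_table (
      resourceid string,
      tag_name string,
      tag_name_cased string
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    WITH SERDEPROPERTIES (
      'case.insensitive' = 'false',
      'mapping.tag_name' = 'name',
      'mapping.tag_name_cased' = 'Name'
    )
    LOCATION 's3://DOC-EXAMPLE-BUCKET/aws-config/';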

Multiple JSON records return a SELECT COUNT of 1

If you're using the OpenX JSON SerDe, make sure that the records are separated by a newline character. For more information, see The SELECT COUNT query in Amazon Athena returns only one record even though the input JSON file has multiple records in the AWS Knowledge Center.

Cannot query a table created by an AWS Glue crawler that uses a custom JSON classifier

The Athena engine does not support custom JSON classifiers. To work around this issue, create a new table without the custom classifier. To transform the JSON, you can use CTAS or create a view. For example, if you are working with arrays, you can use the UNNEST option to flatten the JSON. Another option is to use an AWS Glue ETL job that supports the custom classifier, convert the data to Parquet in Amazon S3, and then query it in Athena.
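
For example, a hedged sketch of a CTAS query that uses UNNEST to flatten an array column; the table and column names are placeholders:

    CREATE TABLE flattened_table
    WITH (format = 'PARQUET') AS
    SELECT t.id, item
    FROM source_table t
    CROSS JOIN UNNEST(t.items) AS x(item);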

MSCK REPAIR TABLE

For information about MSCK REPAIR TABLE related issues, see the Considerations and Limitations and Troubleshooting sections of the MSCK REPAIR TABLE page.

Output Issues

Unable to verify/create output bucket

This error can occur if the specified query result location doesn't exist or if the proper permissions are not present. For more information, see How do I resolve the "Unable to verify/create output bucket" error in Amazon Athena? in the AWS Knowledge Center.

TIMESTAMP result is empty

Athena requires the Java TIMESTAMP format. For more information, see When I query a table in Amazon Athena, the TIMESTAMP result is empty in the AWS Knowledge Center.

Store Athena query output in a format other than CSV

Currently, Athena outputs files in CSV format only, but you can work around this limitation by using a CTAS query and configuring the format table property. For more information, see How can I store an Athena query output in a format other than CSV, such as a compressed format? in the AWS Knowledge Center.
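
For example, a minimal CTAS sketch that writes the result as Snappy-compressed Parquet; the table names, columns, and output location are placeholders:

    CREATE TABLE example_parquet_output
    WITH (
      format = 'PARQUET',
      parquet_compression = 'SNAPPY',
      external_location = 's3://DOC-EXAMPLE-BUCKET/parquet-output/'
    ) AS
    SELECT col1, col2
    FROM source_table;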

The S3 location provided to save your query results is invalid

You can receive this error message if your output bucket location is not in the same Region as the Region in which you run your query. To avoid this, specify a query results location in the Region in which you run the query. For steps, see Specifying a Query Result Location.

Partitioning Issues

MSCK REPAIR TABLE does not remove stale partitions

If you delete a partition manually in Amazon S3 and then run MSCK REPAIR TABLE, you may receive the error message Partitions missing from filesystem. This occurs because MSCK REPAIR TABLE doesn't remove stale partitions from table metadata. Use ALTER TABLE DROP PARTITION to remove the stale partitions manually. For more information, see the "Troubleshooting" section of the MSCK REPAIR TABLE topic.
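
For example, a sketch of removing a stale partition manually; the table name, partition key, and value are placeholders:

    ALTER TABLE example_table DROP IF EXISTS PARTITION (dt = '2020-01-01');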

MSCK REPAIR TABLE failure

When a large number of partitions (for example, more than 100,000) are associated with a particular table, MSCK REPAIR TABLE can fail due to memory limitations. To work around this limit, use ALTER TABLE ADD PARTITION instead.
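
For example, a sketch that adds one partition explicitly instead of scanning for all of them; the table name, partition key, and location are placeholders:

    ALTER TABLE example_table ADD IF NOT EXISTS
      PARTITION (dt = '2020-01-01')
      LOCATION 's3://DOC-EXAMPLE-BUCKET/data/dt=2020-01-01/';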

MSCK REPAIR TABLE detects partitions but doesn't add them to AWS Glue

This issue can occur if an Amazon S3 path is in camel case instead of lower case or an IAM policy doesn't allow the glue:BatchCreatePartition action. For more information, see MSCK REPAIR TABLE detects partitions in Athena but does not add them to the AWS Glue Data Catalog in the AWS Knowledge Center.

Partition projection ranges with the date format of dd-MM-yyyy-HH-mm-ss or yyyy-MM-dd do not work

To work correctly, the date format must be set to yyyy-MM-dd HH:00:00. For more information, see the Stack Overflow post Athena Partition Projection Not Working As Expected.
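
As an illustration of the supported format, the following sketch sets hourly partition projection properties on an existing table; the table name, column name, range, and location template are placeholders:

    ALTER TABLE example_table SET TBLPROPERTIES (
      'projection.enabled' = 'true',
      'projection.dt.type' = 'date',
      'projection.dt.format' = 'yyyy-MM-dd HH:00:00',
      'projection.dt.range' = '2020-01-01 00:00:00,NOW',
      'projection.dt.interval' = '1',
      'projection.dt.interval.unit' = 'HOURS',
      'storage.location.template' = 's3://DOC-EXAMPLE-BUCKET/data/${dt}/'
    );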

PARTITION BY doesn't support the BIGINT type

Convert the data type to string and retry.
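
If the error comes from a partition column declared as bigint, one workaround sketch is to cast the column to a string in a CTAS query; the table and column names are placeholders:

    CREATE TABLE example_partitioned
    WITH (partitioned_by = ARRAY['year']) AS
    SELECT id, CAST(year AS varchar) AS year
    FROM source_table;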

No meaningful partitions available

This error message usually means the partition settings have been corrupted. To resolve this issue, drop the table and create a table with new partitions.

Partition projection does not work in conjunction with range partitions

Check that the time range unit projection.<columnName>.interval.unit matches the delimiter for the partitions. For example, if partitions are delimited by days, then a range unit of hours will not work.

HIVE_UNKNOWN_ERROR: Unable to create input format

This error can occur when one or more partitions are defined with a different input format, because each AWS Glue partition has its own input format that is specified independently of the table. Check how your partitions are defined in AWS Glue.

HIVE_PARTITION_SCHEMA_MISMATCH

If the schema of a partition differs from the schema of the table, a query can fail with the error message HIVE_PARTITION_SCHEMA_MISMATCH. For more information, see Syncing Partition Schema to Avoid "HIVE_PARTITION_SCHEMA_MISMATCH".

SemanticException table is not partitioned but partition spec exists

This error can occur when no partitions were defined in the CREATE TABLE statement. For more information, see How can I troubleshoot the error "FAILED: SemanticException table is not partitioned but partition spec exists" in Athena? in the AWS Knowledge Center.

Zero records returned from partitioned data

This issue can occur for a variety of reasons. For possible causes and resolutions, see I created a table in Amazon Athena with defined partitions, but when I query the table, zero records are returned in the AWS Knowledge Center.

Permissions

Access Denied Error when querying Amazon S3

This error can occur when you don't have permission to read the data in the bucket, when you don't have permission to write to the results bucket, or when the Amazon S3 path contains a Region endpoint such as us-east-1.amazonaws.com. For more information, see When I run an Athena query, I get an "Access Denied" error in the AWS Knowledge Center.

Access Denied with Status Code: 403 error when running DDL queries on encrypted data in Amazon S3

You may receive the error message Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: <request_id>) if the following conditions are true:

  1. You run a DDL query like ALTER TABLE ADD PARTITION or MSCK REPAIR TABLE.

  2. You have a bucket that has default encryption configured to use SSE-S3.

  3. The bucket also has a bucket policy like the following that forces PutObject requests to specify the PUT headers "s3:x-amz-server-side-encryption": "true" and "s3:x-amz-server-side-encryption": "AES256".

    { "Version": "2012-10-17", "Statement": [ { "Effect": "Deny", "Principal": "*", "Action": "s3:PutObject", "Resource": "arn:aws:s3:::<resource-name>/*", "Condition": { "Null": { "s3:x-amz-server-side-encryption": "true" } } }, { "Effect": "Deny", "Principal": "*", "Action": "s3:PutObject", "Resource": "arn:aws:s3:::<resource-name>/*", "Condition": { "StringNotEquals": { "s3:x-amz-server-side-encryption": "AES256" } } } ] }

In a case like this, the recommended solution is to remove a bucket policy like the one above, given that the bucket's default encryption is already in place.

Access Denied with Status Code: 403 when querying an Amazon S3 bucket in another account

This error can occur when you try to query logs written by another AWS service and the second account is the bucket owner but does not own the objects in the bucket. For more information, see I get the Amazon S3 Exception "Access Denied with Status Code: 403" in Amazon Athena when I query a bucket in another account in the AWS Knowledge Center or watch the Knowledge Center video.

Use IAM role credentials to connect to the Athena JDBC driver

You can retrieve a role's temporary credentials to authenticate the JDBC connection to Athena. Temporary credentials have a maximum lifespan of 12 hours. For more information, see How can I use my IAM role credentials or switch to another IAM role when connecting to Athena using the JDBC driver? in the AWS Knowledge Center.

Query Syntax Issues

Function not registered

This error occurs when you try to use a function that Athena doesn't support. For a list of functions that Athena supports, see Presto Functions in Amazon Athena or run the SHOW FUNCTIONS statement in the Query Editor. You can also write your own user defined function (UDF). For more information, see How do I resolve the "function not registered" syntax error in Athena? in the AWS Knowledge Center.

Number of matching groups doesn't match the number of columns

This error occurs when you use the Regex SerDe in a CREATE TABLE statement and the number of regex matching groups doesn't match the number of columns that you specified for the table. For more information, see How do I resolve the RegexSerDe error "Number of matching groups doesn't match the number of columns" in Amazon Athena? in the AWS Knowledge Center.
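
For example, a sketch of a Regex SerDe table whose regular expression has exactly two capturing groups for the two declared columns; the table name, columns, pattern, and location are placeholders:

    CREATE EXTERNAL TABLE example_regex_table (
      request_date string,
      status_code string
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      'input.regex' = '^(\\S+) (\\d{3})$'
    )
    LOCATION 's3://DOC-EXAMPLE-BUCKET/logs/';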

queryString failed to satisfy constraint: Member must have length less than or equal to 262144

The maximum query string length in Athena (262,144 bytes) is not an adjustable quota. AWS Support can't increase the quota for you, but you can work around the issue by splitting long queries into smaller ones. For more information, see How can I increase the maximum query string length in Athena? in the AWS Knowledge Center.

SYNTAX_ERROR: Column cannot be resolved

This error can occur when you query a table created by an AWS Glue crawler from a UTF-8 encoded CSV file that has a byte order mark (BOM). AWS Glue doesn't recognize the BOMs and changes them to question marks, which Amazon Athena doesn't recognize. The solution is to remove the question mark in Athena or in AWS Glue.

Throttling Issues

If your queries exceed the limits of dependent services such as Amazon S3, AWS KMS, AWS Glue, or AWS Lambda, the following messages can be expected. To resolve these issues, reduce the number of concurrent calls that originate from the same account.

Service      Error Message
AWS Glue     AWSGlueException: Rate exceeded.
AWS KMS      You have exceeded the rate at which you may call KMS. Reduce the frequency of your calls.
AWS Lambda   Rate exceeded
             TooManyRequestsException
Amazon S3    AmazonS3Exception: Please reduce your request rate.

Views

Views created in Apache Hive shell do not work in Athena

Because of their fundamentally different implementations, views created in the Apache Hive shell are not compatible with Athena. To resolve this issue, re-create the views in Athena.

View is stale; it must be re-created

You can receive this error if the table that underlies a view has been altered or dropped. The resolution is to re-create the view. For more information, see How can I resolve the "View is stale; it must be re-created" error in Athena? in the AWS Knowledge Center.

Workgroups

For information on troubleshooting workgroup issues, see Troubleshooting Workgroups.

Additional Resources

Other pages in this guide and resources such as the AWS Knowledge Center provide additional information for troubleshooting issues with Amazon Athena.

Troubleshooting often requires iterative query and discovery by an expert or from a community of helpers. If you continue to experience issues after trying the suggestions on this page, contact AWS Support (in the AWS Management Console, click Support, Support Center) or visit the Amazon Athena Forum.