Amazon Redshift
Database Developer Guide (API Version 2012-12-01)

Troubleshooting Queries in Amazon Redshift Spectrum

Following, you can find a quick reference for identifying and addressing some of the most common and most serious issues you are likely to encounter with Amazon Redshift Spectrum queries. To view errors generated by Redshift Spectrum queries, query the SVL_S3LOG system table.

Retries Exceeded

If an Amazon Redshift Spectrum request times out, the request is canceled and resubmitted. After five failed retries, the query fails with the following error.

error:  S3Query Exception (Fetch), retries exceeded

Possible causes include the following:

  • Large file sizes (greater than 1 GB). Check your file sizes in Amazon S3 and look for large files and file size skew. Break up large files into smaller files, between 100 MB and 1 GB. Try to make files about the same size.

  • Slow network throughput. Try your query later.

No Rows Returned for a Partitioned Table

If your query returns zero rows from a partitioned external table, check whether a partition has been added for this external table. Redshift Spectrum only scans files in an Amazon S3 location that has been explicitly added using ALTER TABLE … ADD PARTITION. Query the SVV_EXTERNAL_PARTITIONS view to find existing partitions. Run ALTER TABLE ADD … PARTITION for each missing partition.

Not Authorized Error

Verify that the IAM role for the cluster allows access to the Amazon S3 file objects. If your external database is on Amazon Athena, verify that the AWS Identity and Access Management (IAM) role allows access to Athena resources. For more information, see IAM Policies for Amazon Redshift Spectrum.

Incompatible Data Formats

For a columnar file format, such as Parquet, the column type is embedded with the data. The column type in the CREATE EXTERNAL TABLE definition must match the column type of the data file. If there is a mismatch, you receive an error similar to the following:

Task failed due to an internal error. File 'https://s3bucket/location/file has an incompatible Parquet schema for column ‘s3://s3bucket/location.col1'. Column type: VARCHAR, Par

The error message might be truncated due to the limit on message length. To retrieve the complete error message, including column name and column type, query the SVL_S3LOG system view.

The following example queries SVL_S3LOG for the last query executed.

select message from svl_s3log where query = pg_last_query_id() order by query,segment,slice;

The following is an example of a result that shows the full error message.

S3 Query Exception (Fetch). Task failed due to an internal error. 
File 'https://s3bucket/location/file has an incompatible 
Parquet schema for column ' s3bucket/location.col1'. 
Column type: VARCHAR, Parquet schema:\noptional int64 l_orderkey [i:0 d:1 r:0]\n

To correct the error, alter the external table to match the column type of the Parquet file.

Syntax Error When Using Hive DDL in Amazon Redshift

Amazon Redshift supports data definition language (DDL) for CREATE EXTERNAL TABLE that is similar to Hive DDL. However, the two types of DDL aren't always exactly the same. If you copy Hive DDL to create or alter Amazon Redshift external tables, you might encounter syntax errors. The following are examples of differences between Amazon Redshift and Hive DDL:

  • Amazon Redshift requires single quotation marks (') where Hive DDL supports double quotation marks (").

  • Amazon Redshift doesn't support the STRING data type. Use VARCHAR instead.

Permission to Create Temporary Tables

To run Redshift Spectrum queries, the database user must have permission to create temporary tables in the database. The following example grants temporary permission on the database spectrumdb to the spectrumusers user group.

grant temp on database spectrumdb to group spectrumusers;

For more information, see GRANT.