Troubleshooting queries

This section provides a quick reference for identifying and addressing some of the most common and most serious issues that you are likely to encounter with Amazon Redshift queries.

These suggestions give you a starting point for troubleshooting. You can also refer to the topics linked throughout this section for more detailed information.

Connection fails

Your query connection can fail for the following reasons; we suggest the following troubleshooting approaches.

Client cannot connect to server

If you are using SSL or server certificates, first remove this complexity while you troubleshoot the connection issue. Then add SSL or server certificates back when you have found a solution. For more information, go to Configure Security Options for Connections in the Amazon Redshift Management Guide.

Connection is refused

Generally, when you receive an error message indicating that there is a failure to establish a connection, it means that there is an issue with permissions to access the cluster. For more information, go to The connection is refused or fails in the Amazon Redshift Management Guide.

Query hangs

Your query can hang, or stop responding, for the following reasons; we suggest the following troubleshooting approaches.

Connection to the database is dropped

Reduce the size of the maximum transmission unit (MTU). The MTU size determines the maximum size, in bytes, of a packet that can be transferred in one Ethernet frame over your network connection. For more information, go to The connection to the database is dropped in the Amazon Redshift Management Guide.

Connection to the database times out

Your client connection to the database appears to hang or time out when running long queries, such as a COPY command. In this case, you might observe that the Amazon Redshift console displays that the query has completed, but the client tool itself still appears to be running the query. The results of the query might be missing or incomplete depending on when the connection stopped. This effect happens when idle connections are terminated by an intermediate network component. For more information, go to Firewall Timeout Issue in the Amazon Redshift Management Guide.

Client-side out-of-memory error occurs with ODBC

If your client application uses an ODBC connection and your query creates a result set that is too large to fit in memory, you can stream the result set to your client application by using a cursor. For more information, see DECLARE and Performance considerations when using cursors.
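For example, a cursor lets the client fetch the result set in batches instead of all at once. The following is a minimal sketch; the table and cursor names are hypothetical:

    BEGIN;

    -- Declare a cursor over the large result set
    DECLARE big_result CURSOR FOR
    SELECT * FROM sales ORDER BY saletime;

    -- Fetch rows in batches; repeat until no rows are returned.
    -- (On multi-node clusters, FETCH FORWARD count isn't supported; use FETCH NEXT.)
    FETCH FORWARD 1000 FROM big_result;

    CLOSE big_result;
    COMMIT;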

Client-side out-of-memory error occurs with JDBC

When you attempt to retrieve large result sets over a JDBC connection, you might encounter client-side out-of-memory errors. For more information, see Setting the JDBC fetch size parameter.

There is a potential deadlock

If there is a potential deadlock, try the following; a sample diagnostic sequence appears after the list:

  • View the STV_LOCKS and STL_TR_CONFLICT system tables to find conflicts involving updates to more than one table.

  • Use the PG_CANCEL_BACKEND function to cancel one or more conflicting queries.

  • Use the PG_TERMINATE_BACKEND function to terminate a session, which forces any currently running transactions in the terminated session to release all locks and roll back.

  • Schedule concurrent write operations carefully. For more information, see Managing concurrent write operations.
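The following sketch ties these steps together; the process ID (12345) is a placeholder for the PID of the conflicting session that you identify:

    -- Look for sessions currently holding locks
    SELECT table_id, last_update, lock_owner, lock_owner_pid
    FROM stv_locks;

    -- Review recent transaction conflicts
    SELECT *
    FROM stl_tr_conflict
    ORDER BY xact_start_ts DESC
    LIMIT 10;

    -- Cancel the conflicting query first
    SELECT pg_cancel_backend(12345);

    -- If canceling isn't enough, terminate the session to release its locks
    SELECT pg_terminate_backend(12345);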

Query takes too long

Your query can take too long for the following reasons; we suggest the following troubleshooting approaches.

Tables are not optimized

Set the sort key, distribution style, and compression encoding of the tables to take full advantage of parallel processing. For more information, see Working with automatic table optimization.
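For example, you can set these properties when you create a table, or hand them to automatic table optimization. This sketch uses a hypothetical sales table:

    -- Explicit tuning choices at table creation
    CREATE TABLE sales (
        sale_id   BIGINT,
        sale_date DATE,
        amount    DECIMAL(10,2)
    )
    DISTKEY (sale_id)
    SORTKEY (sale_date);

    -- Or let Amazon Redshift choose and adjust these settings automatically
    ALTER TABLE sales ALTER DISTSTYLE AUTO;
    ALTER TABLE sales ALTER SORTKEY AUTO;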

Query is writing to disk

Your queries might be writing to disk for at least part of the query execution. For more information, see Improving query performance.
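You can check whether steps of a specific query spilled to disk by querying SVL_QUERY_SUMMARY; the query ID below is a placeholder:

    -- List query steps that went disk-based, with their working memory in bytes
    SELECT query, step, rows, workmem, is_diskbased
    FROM svl_query_summary
    WHERE query = 12345
    AND is_diskbased = 't';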

Query must wait for other queries to finish

You might be able to improve overall system performance by creating query queues and assigning different types of queries to the appropriate queues. For more information, see Implementing workload management.
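As a quick check, you can see which queries are currently waiting in a queue rather than running; for example:

    -- Queries waiting in workload management queues (queue_time is in microseconds)
    SELECT query, service_class, state, queue_time
    FROM stv_wlm_query_state
    WHERE state LIKE 'Queued%';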

Queries are not optimized

Analyze the explain plan to find opportunities for rewriting queries or optimizing the database. For more information, see Query plan.
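For example, prefix a query with EXPLAIN to see its plan without running it. This sketch uses tables from the sample TICKIT database:

    -- Look for high-cost steps and broadcast or redistribution operations
    EXPLAIN
    SELECT eventname, SUM(pricepaid)
    FROM sales
    JOIN event ON sales.eventid = event.eventid
    GROUP BY eventname;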

Query needs more memory to run

If a specific query needs more memory, you can increase the available memory by increasing the wlm_query_slot_count.
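For example, you can claim extra slots for the current session, run the memory-intensive query, and then release them:

    -- Temporarily take more WLM memory slots for this session
    SET wlm_query_slot_count TO 3;

    -- ... run the memory-intensive query here ...

    -- Return to the default single slot
    SET wlm_query_slot_count TO 1;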

Database requires a VACUUM command to be run

Run the VACUUM command whenever you add, delete, or modify a large number of rows, unless you load your data in sort key order. The VACUUM command reorganizes your data to maintain the sort order and restore performance. For more information, see Vacuuming tables.
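For example, the following checks how much of a hypothetical sales table is unsorted and then re-sorts it and reclaims space:

    -- Estimate whether a vacuum is needed
    SELECT "table", unsorted, vacuum_sort_benefit
    FROM svv_table_info
    WHERE "table" = 'sales';

    -- Re-sort rows and reclaim space from deleted rows
    VACUUM FULL sales;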

Additional resources for troubleshooting long-running queries

The following are system-view topics and other documentation sections that are helpful for query tuning:

  • The STV_INFLIGHT system view shows which queries are running on the cluster. It can be helpful to use it together with STV_RECENTS to determine which queries are currently running or recently completed.

  • SYS_QUERY_HISTORY is useful for troubleshooting. It shows DDL and DML queries with relevant properties, such as their current status (for example, running or failed), how long each took to run, and whether a query ran on a concurrency scaling cluster. A sample query appears after this list.

  • STL_QUERYTEXT captures the query text for SQL commands. Additionally, SVV_QUERY_INFLIGHT, which joins STL_QUERYTEXT to STV_INFLIGHT, shows more query metadata.

  • A transaction-lock conflict can be a possible source of query-performance issues. For information about transactions that currently hold locks on tables, see SVV_TRANSACTIONS.

  • Identifying queries that are top candidates for tuning provides a troubleshooting query that helps you determine which recently run queries were the most time-consuming. This can help you focus your efforts on queries that need improvement.

  • If you want to explore query management further and understand how to manage query queues, see Implementing workload management. Workload management is an advanced feature, and we recommend automatic workload management in most cases.
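As an example of working with these views, the following sketch uses SYS_QUERY_HISTORY to surface the slowest queries started in the past hour:

    -- Slowest recent queries (elapsed_time is in microseconds)
    SELECT query_id, query_type, status,
           elapsed_time / 1000000.0 AS elapsed_seconds
    FROM sys_query_history
    WHERE start_time > DATEADD(hour, -1, GETDATE())
    ORDER BY elapsed_time DESC
    LIMIT 20;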

Load fails

Your data load can fail for the following reasons; we suggest the following troubleshooting approaches.

Data source is in a different AWS Region

By default, the Amazon S3 bucket or Amazon DynamoDB table specified in the COPY command must be in the same AWS Region as the cluster. If your data and your cluster are in different Regions, you receive an error similar to the following:

The bucket you are attempting to access must be addressed using the specified endpoint.

If at all possible, make sure your cluster and your data source are in the same Region. You can specify a different Region by using the REGION option with the COPY command.
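For example, a minimal sketch of a cross-Region load; the table, bucket, and IAM role are placeholders:

    COPY sales
    FROM 's3://amzn-s3-demo-bucket/data/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    REGION 'us-west-2';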

Note

If your cluster and your data source are in different AWS Regions, you incur data transfer costs. You also have higher latency.

COPY command fails

Query STL_LOAD_ERRORS to discover the errors that occurred during specific loads. For more information, see STL_LOAD_ERRORS.
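For example, the following shows the most recent load errors and the reason for each:

    -- Most recent load errors, newest first
    SELECT starttime, filename, line_number, colname, err_reason
    FROM stl_load_errors
    ORDER BY starttime DESC
    LIMIT 10;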

Load takes too long

Your load operation can take too long for the following reasons; we suggest the following troubleshooting approaches.

COPY loads data from a single file

Split your load data into multiple files. When you load all the data from a single large file, Amazon Redshift is forced to perform a serialized load, which is much slower. The number of files should be a multiple of the number of slices in your cluster, and the files should be roughly equal in size, between 1 MB and 1 GB after compression. For more information, see Amazon Redshift best practices for loading data.

Load operation uses multiple COPY commands

If you use multiple concurrent COPY commands to load one table from multiple files, Amazon Redshift is forced to perform a serialized load, which is much slower. In this case, use a single COPY command.

Load data is incorrect

Your COPY operation can load incorrect data in the following ways; we suggest the following troubleshooting approaches.

Wrong files are loaded

Using an object prefix to specify data files can cause unwanted files to be read. Instead, use a manifest file to specify exactly which files to load. For more information, see the copy_from_s3_manifest_file option for the COPY command and Example: COPY from Amazon S3 using a manifest in the COPY examples.
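For example, a COPY command that reads only the files listed in a manifest, which is a JSON file in Amazon S3 naming each file to load; the paths and IAM role are placeholders:

    COPY sales
    FROM 's3://amzn-s3-demo-bucket/manifests/sales.manifest'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    MANIFEST;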

Setting the JDBC fetch size parameter

By default, the JDBC driver collects all the results for a query at one time. As a result, when you attempt to retrieve a large result set over a JDBC connection, you might encounter a client-side out-of-memory error. To enable your client to retrieve result sets in batches instead of in a single all-or-nothing fetch, set the JDBC fetch size parameter in your client application.

Note

Fetch size is not supported for ODBC.

For the best performance, set the fetch size to the highest value that does not lead to out-of-memory errors. A lower fetch size value results in more server trips, which prolong execution times. The server reserves resources, including the WLM query slot and associated memory, until the client retrieves the entire result set or the query is canceled. When you tune the fetch size appropriately, those resources are released more quickly, making them available to other queries.

Note

If you need to extract large datasets, we recommend using an UNLOAD statement to transfer the data to Amazon S3. When you use UNLOAD, the compute nodes work in parallel to speed up the transfer of data.

For more information about setting the JDBC fetch size parameter, go to Getting results based on a cursor in the PostgreSQL documentation.
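As a sketch of the UNLOAD approach mentioned in the note above; the bucket and IAM role are placeholders:

    -- Write the result set to Amazon S3 in parallel, compressed
    UNLOAD ('SELECT * FROM sales')
    TO 's3://amzn-s3-demo-bucket/unload/sales_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    PARALLEL ON
    GZIP;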