Query plan in Amazon Redshift - AWS Prescriptive Guidance

Query plan in Amazon Redshift

A query plan is a list of instructions that the execution engine must follow to execute a query on the data. You can create a query plan by running the EXPLAIN command followed by the actual query text, as the following example query shows:

EXPLAIN select s.s_name, sum(li.l_quantity) as quantity from tpch.lineitem li join tpch.orders o on o.o_orderkey = li.l_orderkey and o.o_orderdate > '1992-05-01' join tpch.supplier s on s.s_suppkey = li.l_suppkey group by s.s_name order by quantity desc limit 10;

If you run the EXPLAIN command for the query example above, you get the following output:

XN Limit (cost=1004400430902.15..1004400430902.17 rows=10 width=39) -> XN Merge (cost=1004400430902.15..1004400430926.97 rows=9928 width=39) Merge Key: sum(li.l_quantity) -> XN Network (cost=1004400430902.15..1004400430926.97 rows=9928 width=39) Send to leader -> XN Sort (cost=1004400430902.15..1004400430926.97 rows=9928 width=39) Sort Key: sum(li.l_quantity) -> XN HashAggregate (cost=4400430218.24..4400430243.06 rows=9928 width=39) -> XN Hash Join DS_BCAST_INNER (cost=21489.58..4400401726.35 rows=5698378 width=39) Hash Cond: ("outer".l_suppkey = "inner".s_suppkey) -> XN Hash Join DS_DIST_NONE (cost=21364.58..273387.85 rows=5698378 width=14) Hash Cond: ("outer".l_orderkey = "inner".o_orderkey) -> XN Seq Scan on lineitem li (cost=0.00..60012.15 rows=6001215 width=22) -> XN Hash (cost=17803.81..17803.81 rows=1424306 width=8) -> XN Seq Scan on orders o (cost=0.00..17803.81 rows=1424306 width=8) Filter: (o_orderdate > '1992-05-01'::date) -> XN Hash (cost=100.00..100.00 rows=10000 width=33) -> XN Seq Scan on supplier s (cost=0.00..100.00 rows=10000 width=33)
Note

The example query plan output is a simplified, high-level view of query execution. The example plan doesn't illustrate the details of parallel query processing. For detailed information, run the query and then use the SVL_QUERY_SUMMARY or SVL_QUERY_REPORT views to get query summary information.

Amazon Redshift query editor v2

You can also view the query plans in Amazon Redshift by using the Explain option in the query editor v2. For instructions, see Working with query editor v2 in the Amazon Redshift documentation.

The query plan generated by the query editor v2 includes the following information:

  • What operations the execution engine performs, reading the results from bottom to top

  • What type of step each operation performs

  • Which tables and columns are used in each operation

  • How much data is processed in each operation, in terms of number of rows and data width in bytes

  • The relative cost of the operation (Cost is a measure that compares the relative execution times of the steps within a plan. Cost doesn't provide precise information about actual execution times or memory consumption, nor does it provide a meaningful comparison between execution plans. Cost does, however, give you an indication of which operations in a query are consuming the most resources.)

EXPLAIN plans

You can use the STL_EXPLAIN system table to display the EXPLAIN plan for a query that has been submitted for execution. Overall, using STL_EXPLAIN can help improve the performance, efficiency, and cost-effectiveness of your Amazon Redshift queries.

The benefits of using STL_EXPLAIN are:

  • Performance optimizationSTL_EXPLAIN can help identify areas of a query that can be optimized for better performance.

  • Query planningSTL_EXPLAIN can provide information on how Amazon Redshift is executing the query and can help identify potential bottlenecks in the query.

  • Debugging STL_EXPLAIN can help diagnose issues with a query by showing the steps that Amazon Redshift takes to execute that query.

  • Understanding Amazon Redshift behaviorSTL_EXPLAIN can provide insights into how Amazon Redshift processes queries. This can help improve your understanding of Amazon Redshift behavior.

  • Cost optimizationSTL_EXPLAIN can provide information on the estimated cost of a query. This can help you identify areas where you can optimize costs.

The following query is an example that returns the plan nodes for a given query:

select nodeid as id, plannode, info from stl_explain where query=1042904 order by nodeid;

The preceding query returns the following output.

Output of a query that returns the query plan nodes.

The EXPLAIN plan returns useful metrics for each operation, including metrics on cost, rows, and width. For example, line 7 from the preceding query returns the following:

-> XN Hash Join DS_DIST_NONE (cost=21364.58..273387.85 rows=5698378 width=14)

Cost

Cost is a relative value that's useful for comparing operations within a plan. Cost consists of two decimal values separated by two periods. In this example, cost is equal to 21364.58..273387.85. Consider the following:

  • The first value (in this case, 21364.58) provides the relative cost of returning the first row for this operation.

  • The second value (in this case, 273387.85) provides the relative cost of completing the operation. 

The costs in the query plan are cumulative and roll up from lower to higher rows. In the example output above, row 7 includes the cost of the other operations in the rows below itself (that is, rows 8–12 and beyond).

Rows

Rows is the estimated number of rows to return. In this example, the scan is expected to return 5,698,378 rows. The rows estimate is based on the available statistics generated by the ANALYZE command. If ANALYZE has not been run recently, the estimate is less reliable.

Width

Width is the estimated width of the average row, in bytes. In this example, the average row is expected to be 14 bytes wide.