Amazon Redshift will no longer support the creation of new Python UDFs starting with Patch 198. Existing Python UDFs will continue to function until June 30, 2026. For more information, see the [blog post](https://aws.amazon.com/blogs/big-data/amazon-redshift-python-user-defined-functions-will-reach-end-of-support-after-june-30-2026/).

# SELECT
<a name="r_SELECT_synopsis"></a>

Returns rows from tables, views, and user-defined functions. 

**Note**  
The maximum size for a single SQL statement is 16 MB.

## Syntax
<a name="r_SELECT_synopsis-synopsis"></a>

```
[ WITH with_subquery [, ...] ]
SELECT
[ TOP number | [ ALL | DISTINCT ]
* | expression [ AS output_name ] [, ...] ]
[ EXCLUDE column_list ]
[ FROM table_reference [, ...] ]
[ WHERE condition ]
[ [ START WITH expression ] CONNECT BY expression ]
[ GROUP BY ALL | expression [, ...] ]
[ HAVING condition ]
[ QUALIFY condition ]
[ { UNION | ALL | INTERSECT | EXCEPT | MINUS } query ]
[ ORDER BY expression [ ASC | DESC ] ]
[ LIMIT { number | ALL } ]
[ OFFSET start ]
```

**Topics**
+ [Syntax](#r_SELECT_synopsis-synopsis)
+ [WITH clause](r_WITH_clause.md)
+ [SELECT list](r_SELECT_list.md)
+ [EXCLUDE column\_list](r_EXCLUDE_list.md)
+ [FROM clause](r_FROM_clause30.md)
+ [WHERE clause](r_WHERE_clause.md)
+ [GROUP BY clause](r_GROUP_BY_clause.md)
+ [HAVING clause](r_HAVING_clause.md)
+ [QUALIFY clause](r_QUALIFY_clause.md)
+ [UNION, INTERSECT, and EXCEPT](r_UNION.md)
+ [ORDER BY clause](r_ORDER_BY_clause.md)
+ [CONNECT BY clause](r_CONNECT_BY_clause.md)
+ [Subquery examples](r_Subquery_examples.md)
+ [Correlated subqueries](r_correlated_subqueries.md)

# WITH clause
<a name="r_WITH_clause"></a>

A WITH clause is an optional clause that precedes the SELECT list in a query. The WITH clause defines one or more *common\_table\_expressions*. Each common table expression (CTE) defines a temporary table, which is similar to a view definition. You can reference these temporary tables in the FROM clause. They're used only while the query they belong to runs. Each CTE in the WITH clause specifies a table name, an optional list of column names, and a query expression that evaluates to a table (a SELECT statement). When you reference the temporary table name in the FROM clause of the same query expression that defines it, the CTE is recursive.

WITH clause subqueries are an efficient way of defining tables that can be used throughout the execution of a single query. In all cases, the same results can be achieved by using subqueries in the main body of the SELECT statement, but WITH clause subqueries may be simpler to write and read. Where possible, WITH clause subqueries that are referenced multiple times are optimized as common subexpressions; that is, it may be possible to evaluate a WITH subquery once and reuse its results. (Note that common subexpressions aren't limited to those defined in the WITH clause.)

## Syntax
<a name="r_WITH_clause-synopsis"></a>

```
[ WITH [RECURSIVE] common_table_expression [, common_table_expression , ...] ]
```

where *common\_table\_expression* can be either non-recursive or recursive. Following is the non-recursive form:

```
CTE_table_name [ ( column_name [, ...] ) ] AS ( query )
```

Following is the recursive form of *common\_table\_expression*:

```
CTE_table_name (column_name [, ...] ) AS ( recursive_query )
```

## Parameters
<a name="r_WITH_clause-parameters"></a>

 RECURSIVE   
Keyword that identifies the query as a recursive CTE. This keyword is required if any *common\_table\_expression* defined in the WITH clause is recursive. You can only specify the RECURSIVE keyword once, immediately following the WITH keyword, even when the WITH clause contains multiple recursive CTEs. In general, a recursive CTE is a UNION ALL subquery with two parts.

 *common\_table\_expression*   
Defines a temporary table that you can reference in the [FROM clause](r_FROM_clause30.md) and is used only during the execution of the query to which it belongs. 

 *CTE\_table\_name*   
A unique name for a temporary table that defines the results of a WITH clause subquery. You can't use duplicate names within a single WITH clause. Each subquery must be given a table name that can be referenced in the [FROM clause](r_FROM_clause30.md).

 *column\_name*   
A list of output column names for the WITH clause subquery, separated by commas. The number of column names specified must be equal to or less than the number of columns defined by the subquery. For a CTE that is non-recursive, the *column\_name* clause is optional. For a recursive CTE, the *column\_name* list is required.

 *query*   
 Any SELECT query that Amazon Redshift supports. See [SELECT](r_SELECT_synopsis.md). 

 *recursive\_query*   
A UNION ALL query that consists of two SELECT subqueries:  
+ The first SELECT subquery doesn't have a recursive reference to the same *CTE\_table\_name*. It returns a result set that is the initial seed of the recursion. This part is called the initial member or seed member.
+ The second SELECT subquery references the same *CTE\_table\_name* in its FROM clause. This is called the recursive member. The recursive member contains a WHERE condition that ends the recursion.
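The two-part structure described above can be sketched with a query that doesn't read from any table. This is only an illustration: the CTE name `counter` and the column `n` are hypothetical, and the sketch assumes that a seed member without a FROM clause is acceptable.

```sql
-- Sketch only: counter and n are hypothetical names, not part of the sample database.
with recursive counter(n) as
( select 1 as n        -- seed member: no reference to counter
  union all
  select n + 1         -- recursive member: references counter in its FROM clause
  from counter
  where n < 5          -- WHERE condition that ends the recursion
)
select n from counter order by n;
```

Under these assumptions, the recursion stops when no row in the previous iteration satisfies `n < 5`.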

## Usage notes
<a name="r_WITH_clause-usage-notes"></a>

You can use a WITH clause in the following SQL statements: 
+ SELECT 
+ SELECT INTO
+ CREATE TABLE AS
+ CREATE VIEW
+ DECLARE
+ EXPLAIN
+ INSERT INTO...SELECT 
+ PREPARE
+ UPDATE (within a WHERE clause subquery. You can't define a recursive CTE in the subquery. The recursive CTE must precede the UPDATE clause.)
+ DELETE
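As one illustration of the list above, the following sketch uses a WITH clause inside a CREATE TABLE AS statement. The table name `venue_large`, the CTE name `big_venues`, and the 50000-seat threshold are hypothetical; only the VENUE sample table comes from this guide.

```sql
-- Sketch only: venue_large and big_venues are hypothetical names.
create table venue_large as
with big_venues as
(select venueid, venuename, venueseats
 from venue
 where venueseats > 50000)
select * from big_venues;
```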

If the FROM clause of a query that contains a WITH clause doesn't reference any of the tables defined by the WITH clause, the WITH clause is ignored and the query runs as normal.

A table defined by a WITH clause subquery can be referenced only in the scope of the SELECT query that the WITH clause begins. For example, you can reference such a table in the FROM clause of a subquery in the SELECT list, WHERE clause, or HAVING clause. You can't use a WITH clause in a subquery and reference its table in the FROM clause of the main query or another subquery. This query pattern results in an error message of the form `relation table_name doesn't exist` for the WITH clause table.

You can't specify another WITH clause inside a WITH clause subquery.

You can't make forward references to tables defined by WITH clause subqueries. For example, the following query returns an error because of the forward reference to table W2 in the definition of table W1: 

```
with w1 as (select * from w2), w2 as (select * from w1)
select * from sales;
ERROR:  relation "w2" does not exist
```

A WITH clause subquery may not consist of a SELECT INTO statement; however, you can use a WITH clause in a SELECT INTO statement.
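The supported direction of that rule might look like the following sketch, in which the WITH clause belongs to a SELECT INTO statement. The temporary table name `venue_snapshot` is hypothetical.

```sql
-- Sketch only: venue_snapshot is a hypothetical temporary table name.
-- The WITH clause feeds the SELECT INTO statement; the CTE itself contains a plain SELECT.
with venuecopy as (select * from venue)
select * into temp venue_snapshot from venuecopy;
```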

## Recursive common table expressions
<a name="r_WITH_clause-recursive-cte"></a>

A recursive *common table expression (CTE)* is a CTE that references itself. A recursive CTE is useful in querying hierarchical data, such as organization charts that show reporting relationships between employees and managers. See [Example: Recursive CTE](#r_WITH_clause-recursive-cte-example).

Another common use is a multilevel bill of materials, when a product consists of many components and each component itself also consists of other components or subassemblies.

Be sure to limit the depth of recursion by including a WHERE clause in the second SELECT subquery of the recursive query. For an example, see [Example: Recursive CTE](#r_WITH_clause-recursive-cte-example). Otherwise, an error can occur similar to the following:
+ `Recursive CTE out of working buffers.`
+ `Exceeded recursive CTE max rows limit, please add correct CTE termination predicates or change the max_recursion_rows parameter.`

**Note**  
`max_recursion_rows` is a parameter setting the maximum number of rows a recursive CTE can return in order to prevent infinite recursion loops. We recommend against changing this to a larger value than the default. This prevents infinite recursion problems in your queries from taking up excessive space in your cluster.

You can specify a sort order and limit on the result of the recursive CTE. You can also include GROUP BY and DISTINCT options on the final result of the recursive CTE.

You can't specify a WITH RECURSIVE clause inside a subquery. The *recursive\_query* member can't include an ORDER BY or LIMIT clause.
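The following sketch shows where these options can go. It assumes the EMPLOYEE sample table that is created in the Example: Recursive CTE section of this topic; the `headcount` alias is hypothetical. The grouping, ordering, and limit are applied to the final result, outside the recursive member.

```sql
-- Sketch only: assumes the employee table from "Example: Recursive CTE".
with recursive john_org(id, name, manager_id, level) as
( select id, name, manager_id, 1 as level
  from employee
  where name = 'John'
  union all
  select e.id, e.name, e.manager_id, level + 1
  from employee e, john_org j
  where e.manager_id = j.id and level < 4
)
-- GROUP BY, ORDER BY, and LIMIT act on the result of the recursive CTE
select level, count(*) as headcount
from john_org
group by level
order by level
limit 3;
```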

## Examples
<a name="r_WITH_clause-examples"></a>

The following example shows the simplest possible case of a query that contains a WITH clause. The WITH query named VENUECOPY selects all of the rows from the VENUE table. The main query in turn selects all of the rows from VENUECOPY. The VENUECOPY table exists only for the duration of this query. 

```
with venuecopy as (select * from venue)
select * from venuecopy order by 1 limit 10;
```

```
 venueid |         venuename          |    venuecity    | venuestate | venueseats
---------+----------------------------+-----------------+------------+------------
       1 | Toyota Park                | Bridgeview      | IL         |          0
       2 | Columbus Crew Stadium      | Columbus        | OH         |          0
       3 | RFK Stadium                | Washington      | DC         |          0
       4 | CommunityAmerica Ballpark  | Kansas City     | KS         |          0
       5 | Gillette Stadium           | Foxborough      | MA         |      68756
       6 | New York Giants Stadium    | East Rutherford | NJ         |      80242
       7 | BMO Field                  | Toronto         | ON         |          0
       8 | The Home Depot Center      | Carson          | CA         |          0
       9 | Dick's Sporting Goods Park | Commerce City   | CO         |          0
      10 | Pizza Hut Park             | Frisco          | TX         |          0
(10 rows)
```

The following example shows a WITH clause that produces two tables, named VENUE\_SALES and TOP\_VENUES. The second WITH query table selects from the first. In turn, the WHERE clause of the main query block contains a subquery that constrains the TOP\_VENUES table.

```
with venue_sales as
(select venuename, venuecity, sum(pricepaid) as venuename_sales
from sales, venue, event
where venue.venueid=event.venueid and event.eventid=sales.eventid
group by venuename, venuecity),

top_venues as
(select venuename
from venue_sales
where venuename_sales > 800000)

select venuename, venuecity, venuestate,
sum(qtysold) as venue_qty,
sum(pricepaid) as venue_sales
from sales, venue, event
where venue.venueid=event.venueid and event.eventid=sales.eventid
and venuename in(select venuename from top_venues)
group by venuename, venuecity, venuestate
order by venuename;
```

```
        venuename       |   venuecity   | venuestate | venue_qty | venue_sales
------------------------+---------------+------------+-----------+-------------
August Wilson Theatre   | New York City | NY         |      3187 |  1032156.00
Biltmore Theatre        | New York City | NY         |      2629 |   828981.00
Charles Playhouse       | Boston        | MA         |      2502 |   857031.00
Ethel Barrymore Theatre | New York City | NY         |      2828 |   891172.00
Eugene O'Neill Theatre  | New York City | NY         |      2488 |   828950.00
Greek Theatre           | Los Angeles   | CA         |      2445 |   838918.00
Helen Hayes Theatre     | New York City | NY         |      2948 |   978765.00
Hilton Theatre          | New York City | NY         |      2999 |   885686.00
Imperial Theatre        | New York City | NY         |      2702 |   877993.00
Lunt-Fontanne Theatre   | New York City | NY         |      3326 |  1115182.00
Majestic Theatre        | New York City | NY         |      2549 |   894275.00
Nederlander Theatre     | New York City | NY         |      2934 |   936312.00
Pasadena Playhouse      | Pasadena      | CA         |      2739 |   820435.00
Winter Garden Theatre   | New York City | NY         |      2838 |   939257.00
(14 rows)
```

The following two examples demonstrate the rules for the scope of table references based on WITH clause subqueries. The first query runs, but the second fails with an expected error. The first query has a WITH clause subquery inside the SELECT list of the main query. The table defined by the WITH clause (HOLIDAYS) is referenced in the FROM clause of the subquery in the SELECT list:

```
select caldate, sum(pricepaid) as daysales,
(with holidays as (select * from date where holiday ='t')
select sum(pricepaid)
from sales join holidays on sales.dateid=holidays.dateid
where caldate='2008-12-25') as dec25sales
from sales join date on sales.dateid=date.dateid
where caldate in('2008-12-25','2008-12-31')
group by caldate
order by caldate;

caldate   | daysales | dec25sales
-----------+----------+------------
2008-12-25 | 70402.00 |   70402.00
2008-12-31 | 12678.00 |   70402.00
(2 rows)
```

The second query fails because it attempts to reference the HOLIDAYS table in the main query as well as in the SELECT list subquery. The main query references are out of scope. 

```
select caldate, sum(pricepaid) as daysales,
(with holidays as (select * from date where holiday ='t')
select sum(pricepaid)
from sales join holidays on sales.dateid=holidays.dateid
where caldate='2008-12-25') as dec25sales
from sales join holidays on sales.dateid=holidays.dateid
where caldate in('2008-12-25','2008-12-31')
group by caldate
order by caldate;

ERROR:  relation "holidays" does not exist
```

## Example: Recursive CTE
<a name="r_WITH_clause-recursive-cte-example"></a>

The following is an example of a recursive CTE that returns the employees who report directly or indirectly to John. The recursive query contains a WHERE clause to limit the depth of recursion to less than 4 levels.

```
--create and populate the sample table
  create table employee (
  id int,
  name varchar (20),
  manager_id int
  );
  
  insert into employee(id, name, manager_id)  values
(100, 'Carlos', null),
(101, 'John', 100),
(102, 'Jorge', 101),
(103, 'Kwaku', 101),
(110, 'Liu', 101),
(106, 'Mateo', 102),
(110, 'Nikki', 103),
(104, 'Paulo', 103),
(105, 'Richard', 103),
(120, 'Saanvi', 104),
(200, 'Shirley', 104),
(201, 'Sofía', 102),
(205, 'Zhang', 104);
  
--run the recursive query
  with recursive john_org(id, name, manager_id, level) as
( select id, name, manager_id, 1 as level
  from employee
  where name = 'John'
  union all
  select e.id, e.name, e.manager_id, level + 1 as next_level
  from employee e, john_org j
  where e.manager_id = j.id and level < 4
  )
 select distinct id, name, manager_id from john_org order by manager_id;
```

Following is the result of the query.

```
    id        name      manager_id
  ------+-----------+--------------
   101    John           100
   102    Jorge          101
   103    Kwaku          101
   110    Liu            101
   201    Sofía          102
   106    Mateo          102
   110    Nikki          103
   104    Paulo          103
   105    Richard        103
   120    Saanvi         104
   200    Shirley        104
   205    Zhang          104
```

Following is an organization chart for John's department.

![\[A diagram of an organization chart for John's department.\]](http://docs.aws.amazon.com/redshift/latest/dg/images/org-chart.png)


# SELECT list
<a name="r_SELECT_list"></a>

**Topics**
+ [Syntax](#r_SELECT_list-synopsis)
+ [Parameters](#r_SELECT_list-parameters)
+ [Usage notes](#r_SELECT_list_usage_notes)
+ [Examples](#r_SELECT_list-examples)

The SELECT list names the columns, functions, and expressions that you want the query to return. The list represents the output of the query. 

For more information about SQL functions, see [SQL functions reference](c_SQL_functions.md). For more information about expressions, see [Conditional expressions](c_conditional_expressions.md).

## Syntax
<a name="r_SELECT_list-synopsis"></a>

```
SELECT
[ TOP number ]
[ ALL | DISTINCT ] * | expression [ AS column_alias ] [, ...]
```

## Parameters
<a name="r_SELECT_list-parameters"></a>

TOP *number*   
TOP takes a positive integer as its argument, which defines the number of rows that are returned to the client. The behavior with the TOP clause is the same as the behavior with the LIMIT clause. The number of rows that is returned is fixed, but the set of rows isn't. To return a consistent set of rows, use TOP or LIMIT in conjunction with an ORDER BY clause. 

ALL   
A redundant keyword that defines the default behavior if you don't specify DISTINCT. `SELECT ALL *` means the same as `SELECT *` (select all rows for all columns and retain duplicates). 

DISTINCT   
Option that eliminates duplicate rows from the result set, based on matching values in one or more columns.   
If your application allows invalid foreign keys or primary keys, it can cause queries to return incorrect results. For example, a SELECT DISTINCT query might return duplicate rows if the primary key column doesn't contain all unique values. For more information, see [Defining table constraints](https://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html).

\* (asterisk)   
Returns the entire contents of the table (all columns and all rows). 

 *expression*   
An expression formed from one or more columns that exist in the tables referenced by the query. An expression can contain SQL functions. For example:   

```
avg(datediff(day, listtime, saletime))
```

AS *column\_alias*   
A temporary name for the column that is used in the final result set. The AS keyword is optional. For example:   

```
avg(datediff(day, listtime, saletime)) as avgwait
```
If you don't specify an alias for an expression that isn't a simple column name, the result set applies a default name to that column.   
The alias is recognized right after it is defined in the target list. You can use an alias in other expressions defined after it in the same target list. The following example illustrates this.   

```
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
```
The benefit of the lateral alias reference is that you don't need to repeat the aliased expression when building more complex expressions in the same target list. When Amazon Redshift parses this type of reference, it inlines the previously defined aliases. If there is a column with the same name defined in the `FROM` clause as the previously aliased expression, the column in the `FROM` clause takes priority. For example, in the preceding query, if there is a column named 'probability' in table raw\_data, the 'probability' in the second expression in the target list refers to that column instead of the alias name 'probability'.
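As a further sketch of a lateral alias reference, the following query runs against the SALES sample table. The aliases `unit_price` and `marked_up_price` and the 1.1 multiplier are hypothetical. Because SALES is assumed to have no column named `unit_price`, the second expression resolves to the alias rather than to a table column.

```sql
-- Sketch only: unit_price and marked_up_price are hypothetical aliases.
select salesid,
       pricepaid / qtysold as unit_price,            -- alias defined here
       round(unit_price * 1.1, 2) as marked_up_price -- alias reused in the same target list
from sales
order by salesid
limit 10;
```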

## Usage notes
<a name="r_SELECT_list_usage_notes"></a>

TOP is a SQL extension; it provides an alternative to the LIMIT behavior. You can't use TOP and LIMIT in the same query.

## Examples
<a name="r_SELECT_list-examples"></a>

The following example returns 10 rows from the SALES table. Though the query uses the TOP clause, it still returns an unpredictable set of rows because no ORDER BY clause is specified:

```
select top 10 *
from sales;
```

The following query is functionally equivalent, but uses a LIMIT clause instead of a TOP clause:

```
select *
from sales
limit 10;
```

The following example returns the first 10 rows from the SALES table using the TOP clause, ordered by the QTYSOLD column in descending order.

```
select top 10 qtysold, sellerid
from sales
order by qtysold desc, sellerid;

qtysold | sellerid
--------+----------
8 |      518
8 |      520
8 |      574
8 |      718
8 |      868
8 |     2663
8 |     3396
8 |     3726
8 |     5250
8 |     6216
(10 rows)
```

The following example returns the first two QTYSOLD and SELLERID values from the SALES table, ordered by the QTYSOLD column:

```
select top 2 qtysold, sellerid
from sales
order by qtysold desc, sellerid;

qtysold | sellerid
--------+----------
8 |      518
8 |      520
(2 rows)
```

The following example shows the list of distinct category groups from the CATEGORY table:

```
select distinct catgroup from category
order by 1;

catgroup
----------
Concerts
Shows
Sports
(3 rows)

--the same query, run without distinct
select catgroup from category
order by 1;

catgroup
----------
Concerts
Concerts
Concerts
Shows
Shows
Shows
Sports
Sports
Sports
Sports
Sports
(11 rows)
```

The following example returns the distinct set of week numbers for December 2008. Without the DISTINCT clause, the statement would return 31 rows, or one for each day of the month.

```
select distinct week, month, year
from date
where month='DEC' and year=2008
order by 1, 2, 3;

week | month | year
-----+-------+------
49 | DEC   | 2008
50 | DEC   | 2008
51 | DEC   | 2008
52 | DEC   | 2008
53 | DEC   | 2008
(5 rows)
```


# EXCLUDE column\_list
<a name="r_EXCLUDE_list"></a>

The EXCLUDE column\_list names the columns that are excluded from the query results. Using the EXCLUDE option is helpful when only a subset of columns needs to be excluded from a *wide* table, which is a table that contains many columns.

**Topics**
+ [Syntax](#r_EXCLUDE_list-synopsis)
+ [Parameters](#r_EXCLUDE_list-parameters)
+ [Examples](#r_EXCLUDE_list-examples)

## Syntax
<a name="r_EXCLUDE_list-synopsis"></a>

```
EXCLUDE column_list
```

## Parameters
<a name="r_EXCLUDE_list-parameters"></a>

 *column\_list*   
A comma-separated list of one or more column names that exist in the tables referenced by the query. The *column\_list* can optionally be enclosed in parentheses. Only column names are supported in the exclude list of column names, not expressions (such as `upper(col1)`) or asterisk (\*).  

```
column-name, ... | ( column-name, ... )
```
For example:   

```
SELECT * EXCLUDE col1, col2 FROM tablea;
```

```
SELECT * EXCLUDE (col1, col2) FROM tablea;
```

## Examples
<a name="r_EXCLUDE_list-examples"></a>

The following examples use the SALES table that contains columns: salesid, listid, sellerid, buyerid, eventid, dateid, qtysold, pricepaid, commission, and saletime. For more information about the SALES table, see [Sample database](c_sampledb.md).

The following example returns rows from the SALES table, but excludes the SALETIME column.

```
SELECT * EXCLUDE saletime FROM sales;

salesid | listid  | sellerid | buyerid | eventid | dateid  | qtysold  | pricepaid  | commission
--------+---------+----------+---------+---------+---------+----------+------------+-----------
150314  | 173969  | 48680    | 816     | 8762    | 1827    | 2        | 688        | 103.2	
8325    | 8942    | 23600    | 1078    | 2557    | 1828    | 5        | 525        |  78.75	
46807   | 52711   | 34388    | 1047    | 2046    | 1828    | 2        | 482        |  72.3	
...
```

The following example returns rows from the SALES table, but excludes the QTYSOLD and SALETIME columns.

```
SELECT * EXCLUDE (qtysold, saletime) FROM sales;

salesid | listid  | sellerid | buyerid | eventid | dateid  | pricepaid  | commission
--------+---------+----------+---------+---------+---------+------------+-----------
150314  | 173969  | 48680    | 816     | 8762    | 1827    | 688        | 103.2	
8325    | 8942    | 23600    | 1078    | 2557    | 1828    | 525        |  78.75	
46807   | 52711   | 34388    | 1047    | 2046    | 1828    | 482        |  72.3	
...
```

The following example creates a view that returns rows from the SALES table, but excludes the SALETIME column.

```
CREATE VIEW sales_view AS SELECT * EXCLUDE saletime FROM sales;
SELECT * FROM sales_view;

salesid | listid  | sellerid | buyerid | eventid | dateid  | qtysold  | pricepaid  | commission
--------+---------+----------+---------+---------+---------+----------+------------+-----------
150314  | 173969  | 48680    | 816     | 8762    | 1827    | 2        | 688        | 103.2	
8325    | 8942    | 23600    | 1078    | 2557    | 1828    | 5        | 525        |  78.75	
46807   | 52711   | 34388    | 1047    | 2046    | 1828    | 2        | 482        |  72.3	
...
```

The following example selects only the columns that are not excluded into a temp table.

```
SELECT * EXCLUDE saletime INTO TEMP temp_sales FROM sales;
SELECT * FROM temp_sales;

salesid | listid  | sellerid | buyerid | eventid | dateid  | qtysold  | pricepaid  | commission
--------+---------+----------+---------+---------+---------+----------+------------+-----------
150314  | 173969  | 48680    | 816     | 8762    | 1827    | 2        | 688        | 103.2	
8325    | 8942    | 23600    | 1078    | 2557    | 1828    | 5        | 525        |  78.75	
46807   | 52711   | 34388    | 1047    | 2046    | 1828    | 2        | 482        |  72.3	
...
```

# FROM clause
<a name="r_FROM_clause30"></a>

The FROM clause in a query lists the table references (tables, views, and subqueries) that data is selected from. If multiple table references are listed, the tables must be joined, using appropriate syntax in either the FROM clause or the WHERE clause. If no join criteria are specified, the system processes the query as a cross-join (Cartesian product). 

**Topics**
+ [Syntax](#r_FROM_clause30-synopsis)
+ [Parameters](#r_FROM_clause30-parameters)
+ [Usage notes](#r_FROM_clause_usage_notes)
+ [PIVOT and UNPIVOT examples](r_FROM_clause-pivot-unpivot-examples.md)
+ [JOIN examples](r_Join_examples.md)
+ [UNNEST examples](r_FROM_clause-unnest-examples.md)

## Syntax
<a name="r_FROM_clause30-synopsis"></a>

```
FROM table_reference [, ...]
```

where *table\_reference* is one of the following:

```
with_subquery_table_name [ table_alias ]
table_name [ * ] [ table_alias ]
( subquery ) [ table_alias ]
table_reference [ NATURAL ] join_type table_reference
   [ ON join_condition | USING ( join_column [, ...] ) ]
table_reference  join_type super_expression 
   [ ON join_condition ]
table_reference PIVOT ( 
   aggregate(expr) [ [ AS ] aggregate_alias ]
   FOR column_name IN ( expression [ AS ] in_alias [, ...] )
) [ table_alias ]
table_reference UNPIVOT [ INCLUDE NULLS | EXCLUDE NULLS ] ( 
   value_column_name 
   FOR name_column_name IN ( column_reference [ [ AS ]
   in_alias ] [, ...] )
) [ table_alias ]
UNPIVOT expression AS value_alias [ AT attribute_alias ]
( super_expression.attribute_name ) AS value_alias [ AT index_alias ]
UNNEST ( column_reference )
  [AS] table_alias ( unnested_column_name )
UNNEST ( column_reference ) WITH OFFSET
  [AS] table_alias ( unnested_column_name, [offset_column_name] )
```

The optional *table\_alias* can be used to give temporary names to tables and complex table references and, if desired, their columns as well, like the following:

```
[ AS ] alias [ ( column_alias [, ...] ) ]
```
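A table alias with column aliases might be used as in the following sketch, which renames the columns of a subquery over the VENUE sample table. The alias names `v`, `vid`, and `vname` are hypothetical.

```sql
-- Sketch only: v, vid, and vname are hypothetical alias names.
select v.vid, v.vname
from (select venueid, venuename from venue) as v (vid, vname)
where v.vid <= 5
order by v.vid;
```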

## Parameters
<a name="r_FROM_clause30-parameters"></a>

 *with\_subquery\_table\_name*   
A table defined by a subquery in the [WITH clause](r_WITH_clause.md). 

 *table\_name*   
Name of a table or view. 

 *alias*   
Temporary alternative name for a table or view. An alias must be supplied for a table derived from a subquery. In other table references, aliases are optional. The AS keyword is always optional. Table aliases provide a convenient shortcut for identifying tables in other parts of a query, such as the WHERE clause. For example:   

```
select * from sales s, listing l
where s.listid=l.listid
```

 *column\_alias*   
Temporary alternative name for a column in a table or view. 

 *subquery*   
A query expression that evaluates to a table. The table exists only for the duration of the query and is typically given a name or *alias*. However, an alias isn't required. You can also define column names for tables that derive from subqueries. Naming column aliases is important when you want to join the results of subqueries to other tables and when you want to select or constrain those columns elsewhere in the query.   
A subquery may contain an ORDER BY clause, but this clause may have no effect if a LIMIT or OFFSET clause isn't also specified. 

NATURAL   
Defines a join that automatically uses all pairs of identically named columns in the two tables as the joining columns. No explicit join condition is required. For example, if the CATEGORY and EVENT tables both have columns named CATID, a natural join of those tables is a join over their CATID columns.   
If a NATURAL join is specified but no identically named pairs of columns exist in the tables to be joined, the query defaults to a cross-join. 
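The CATEGORY and EVENT case described above can be sketched as follows. The sketch assumes that CATID is the only column name the two sample tables share, so the natural join is a join over CATID alone.

```sql
-- Sketch only: assumes catid is the only identically named column in category and event.
select catid, catname, eventname
from category natural join event
order by eventname
limit 5;
```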

 *join\_type*   
Specify one of the following types of join:   
+ [INNER] JOIN 
+ LEFT [OUTER] JOIN 
+ RIGHT [OUTER] JOIN 
+ FULL [OUTER] JOIN 
+ CROSS JOIN 
Cross-joins are unqualified joins; they return the Cartesian product of the two tables.   
Inner and outer joins are qualified joins. They are qualified either implicitly (in natural joins); with the ON or USING syntax in the FROM clause; or with a WHERE clause condition.   
An inner join returns matching rows only, based on the join condition or list of joining columns. An outer join returns all of the rows that the equivalent inner join would return plus non-matching rows from the "left" table, "right" table, or both tables. The left table is the first-listed table, and the right table is the second-listed table. The non-matching rows contain NULL values to fill the gaps in the output columns. 
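The difference between inner and outer joins can be sketched with the CATEGORY and EVENT sample tables. In this illustration, the `event_count` alias is hypothetical. A category with no matching events still appears in the output, because the left outer join keeps non-matching rows from the left table and fills the EVENT columns with NULL, which `count(e.eventid)` doesn't count.

```sql
-- Sketch only: event_count is a hypothetical alias.
select c.catname, count(e.eventid) as event_count
from category c
left outer join event e on c.catid = e.catid
group by c.catname
order by c.catname;
```

Replacing `left outer join` with `join` in this sketch would drop any category that has no events.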

ON *join\_condition*   
Type of join specification where the joining columns are stated as a condition that follows the ON keyword. For example:   

```
sales join listing
on sales.listid=listing.listid and sales.eventid=listing.eventid
```

USING ( *join\_column* [, ...] )   
Type of join specification where the joining columns are listed in parentheses. If multiple joining columns are specified, they are delimited by commas. The USING keyword must precede the list. For example:   

```
sales join listing
using (listid,eventid)
```

PIVOT  
Rotates output from rows to columns, for the purpose of representing tabular data in a format that is easy to read. Output is represented horizontally across multiple columns. PIVOT is similar to a GROUP BY query with an aggregation, using an aggregate expression to specify an output format. However, in contrast to GROUP BY, the results are returned in columns instead of rows.  
For examples that show how to query with PIVOT and UNPIVOT, see [PIVOT and UNPIVOT examples](r_FROM_clause-pivot-unpivot-examples.md).

UNPIVOT  
*Rotating columns into rows with UNPIVOT* – The operator transforms result columns, from an input table or query results, into rows, to make the output easier to read. UNPIVOT combines the data of its input columns into two result columns: a name column and a value column. The name column contains column names from the input, as row entries. The value column contains values from the input columns, such as results of an aggregation (for example, the counts of items in various categories).  
*Object unpivoting with UNPIVOT (SUPER)* – You can perform object unpivoting, where *expression* is a SUPER expression referring to another FROM clause item. For more information, see [Object unpivoting](query-super.md#unpivoting). That topic also has examples that show how to query semi-structured data, such as data that's JSON-formatted.

*super\_expression*  
A valid SUPER expression. Amazon Redshift returns one row for each value in the specified attribute. For more information on the SUPER data type, see [SUPER type](r_SUPER_type.md). For more information about unnested SUPER values, see [Unnesting queries](query-super.md#unnest).

*attribute\_name*  
The name of an attribute in the SUPER expression.

*index\_alias*  
Alias for the index that signifies the value's position in the SUPER expression.
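The *super\_expression*, value alias, and *index\_alias* parameters might fit together as in the following sketch. The table `order_demo`, its columns, and the sample data are hypothetical, and the sketch assumes that `json_parse` is used to load a JSON array into the SUPER column.

```sql
-- Sketch only: order_demo, items, item, and idx are hypothetical names.
create table order_demo (order_id int, items super);

insert into order_demo values (1, json_parse('["pen", "ink", "paper"]'));

-- d.items is the SUPER expression; item is the value alias; idx is the index alias
select d.order_id, item, idx
from order_demo d, d.items as item at idx;
```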

UNNEST  
Expands a nested structure, typically a SUPER array, into columns containing the unnested elements. For more information on unnesting SUPER data, see [Querying semi-structured data](query-super.md). For examples, see [UNNEST examples](r_FROM_clause-unnest-examples.md). 

*unnested\_column\_name*  
The name of the column that contains the unnested elements. 

UNNEST ... WITH OFFSET  
Adds an offset column to the unnested output, with the offset representing the zero-based index of each element in the array. This variant is useful when you want to see the position of elements within an array. For more information on unnesting SUPER data, see [Querying semi-structured data](query-super.md). For examples, see [UNNEST examples](r_FROM_clause-unnest-examples.md). 

*offset\_column\_name*  
A custom name for the offset column that lets you explicitly define how the index column will appear in the output. This parameter is optional. By default, the offset column name is `offset_col`. 
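
As a hedged illustration of the *index\_alias* and object-unpivoting parameters described above, the following sketch uses a hypothetical `device_readings` table that is not part of the sample data in this guide. The first query iterates over a SUPER array and exposes each element's position through an `AT` index alias. The second query unpivots a SUPER object into attribute name and value pairs. The table, column, and alias names are assumptions chosen for illustration.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE device_readings (
    device_id INT,
    tags SUPER,      -- a SUPER object, such as {"site":"north","model":"a1"}
    samples SUPER    -- a SUPER array, such as [10,20,30]
);

INSERT INTO device_readings VALUES
(1, JSON_PARSE('{"site":"north","model":"a1"}'), JSON_PARSE('[10,20,30]'));

-- Iterate over the array; idx is the index alias that holds each element's position.
SELECT d.device_id, s AS sample, idx
FROM device_readings d, d.samples AS s AT idx;

-- Object unpivoting: one row per attribute, with the attribute name in attr
-- and the attribute value in val.
SELECT d.device_id, attr, val
FROM device_readings d, UNPIVOT d.tags AS val AT attr;
```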

## Usage notes
<a name="r_FROM_clause_usage_notes"></a>

Joining columns must have comparable data types. 

A NATURAL or USING join retains only one of each pair of joining columns in the intermediate result set. 

A join with the ON syntax retains both joining columns in its intermediate result set. 
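
The difference between the two preceding notes is easiest to see with `SELECT *`. The following is a minimal sketch that assumes two hypothetical temporary tables, `join_left` and `join_right`, that share an `id` column. The table and column names are not part of the sample data.

```sql
-- Hypothetical tables for illustration only.
CREATE TEMP TABLE join_left (id INT, left_val VARCHAR(10));
CREATE TEMP TABLE join_right (id INT, right_val VARCHAR(10));

INSERT INTO join_left VALUES (1, 'a'), (2, 'b');
INSERT INTO join_right VALUES (1, 'x'), (3, 'y');

-- USING keeps a single id column in the result: id, left_val, right_val.
SELECT *
FROM join_left JOIN join_right USING (id);

-- ON keeps both id columns, so qualify them to tell them apart.
SELECT join_left.id AS left_id, join_right.id AS right_id, left_val, right_val
FROM join_left JOIN join_right ON join_left.id = join_right.id;
```

Both `id` columns are declared as `INT`, which satisfies the requirement that joining columns have comparable data types.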

See also [WITH clause](r_WITH_clause.md). 

# PIVOT and UNPIVOT examples
<a name="r_FROM_clause-pivot-unpivot-examples"></a>

PIVOT and UNPIVOT are parameters in the FROM clause that rotate query output from rows to columns and columns to rows, respectively. They represent tabular query results in a format that's easy to read. The following examples use test data and queries to show how to use them.

For more information about these and other parameters, see [FROM clause](https://docs.aws.amazon.com/redshift/latest/dg/r_FROM_clause30.html).

## PIVOT examples
<a name="r_FROM_clause-pivot-examples"></a>

Set up the sample table and data and use them to run the subsequent example queries.

```
CREATE TABLE part (
    partname varchar,
    manufacturer varchar,
    quality int,
    price decimal(12, 2)
);

INSERT INTO part VALUES ('prop', 'local parts co', 2, 10.00);
INSERT INTO part VALUES ('prop', 'big parts co', NULL, 9.00);
INSERT INTO part VALUES ('prop', 'small parts co', 1, 12.00);

INSERT INTO part VALUES ('rudder', 'local parts co', 1, 2.50);
INSERT INTO part VALUES ('rudder', 'big parts co', 2, 3.75);
INSERT INTO part VALUES ('rudder', 'small parts co', NULL, 1.90);

INSERT INTO part VALUES ('wing', 'local parts co', NULL, 7.50);
INSERT INTO part VALUES ('wing', 'big parts co', 1, 15.20);
INSERT INTO part VALUES ('wing', 'small parts co', NULL, 11.80);
```

PIVOT on `partname` with an `AVG` aggregation on `price`.

```
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
    AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
```

The query results in the following output.

```
  prop   |  rudder  |  wing
---------+----------+---------
 10.33   | 2.71     | 11.50
```

In the previous example, the results are transformed into columns. The following example shows a `GROUP BY` query that returns the average prices in rows, rather than in columns.

```
SELECT partname, avg(price)
FROM (SELECT partname, price FROM part)
WHERE partname IN ('prop', 'rudder', 'wing')
GROUP BY partname;
```

The query results in the following output.

```
 partname |  avg
----------+-------
 prop     | 10.33
 rudder   |  2.71
 wing     | 11.50
```

A `PIVOT` example with `manufacturer` as an implicit column.

```
SELECT *
FROM (SELECT quality, manufacturer FROM part) PIVOT (
    count(*) FOR quality IN (1, 2, NULL)
);
```

The query results in the following output.

```
 manufacturer      | 1  | 2  | null
-------------------+----+----+------
 local parts co    | 1  | 1  |  1
 big parts co      | 1  | 1  |  1
 small parts co    | 1  | 0  |  2
```

 Input table columns that are not referenced in the `PIVOT` definition are added implicitly to the result table. This is the case for the `manufacturer` column in the previous example. The example also shows that `NULL` is a valid value for the `IN` operator. 

`PIVOT` in the previous example returns information similar to that of the following query, which includes `GROUP BY`. The difference is that `PIVOT` returns the value `0` for column `2` and the manufacturer `small parts co`. The `GROUP BY` query does not contain a corresponding row. In most cases, `PIVOT` inserts `NULL` if a row doesn't have input data for a given column. However, the `COUNT` aggregate doesn't return `NULL`; it returns `0` as its default value.

```
SELECT manufacturer, quality, count(*)
FROM (SELECT quality, manufacturer FROM part)
WHERE quality IN (1, 2) OR quality IS NULL
GROUP BY manufacturer, quality
ORDER BY manufacturer;
```

The query results in the following output.

```
 manufacturer        | quality | count
---------------------+---------+-------
 big parts co        |         |     1
 big parts co        |       2 |     1
 big parts co        |       1 |     1
 local parts co      |       2 |     1
 local parts co      |       1 |     1
 local parts co      |         |     1
 small parts co      |       1 |     1
 small parts co      |         |     2
```

 The PIVOT operator accepts optional aliases on the aggregate expression and on each value for the `IN` operator. Use aliases to customize the column names. If there is no aggregate alias, only the `IN` list aliases are used. Otherwise, the aggregate alias is appended to the column name with an underscore to separate the names. 

```
SELECT *
FROM (SELECT quality, manufacturer FROM part) PIVOT (
    count(*) AS count FOR quality IN (1 AS high, 2 AS low, NULL AS na)
);
```

The query results in the following output.

```
 manufacturer      | high_count  | low_count | na_count
-------------------+-------------+-----------+----------
 local parts co    |           1 |         1 |        1
 big parts co      |           1 |         1 |        1
 small parts co    |           1 |         0 |        2
```

Set up the following sample table and data and use them to run the subsequent example queries. The data represents booking dates for a collection of hotels.

```
CREATE TABLE bookings (
    booking_id int,
    hotel_code char(8),
    booking_date date,
    price decimal(12, 2)
);

INSERT INTO bookings VALUES (1, 'FOREST_L', '02/01/2023', 75.12);
INSERT INTO bookings VALUES (2, 'FOREST_L', '02/02/2023', 75.00);
INSERT INTO bookings VALUES (3, 'FOREST_L', '02/04/2023', 85.54);

INSERT INTO bookings VALUES (4, 'FOREST_L', '02/08/2023', 75.00);
INSERT INTO bookings VALUES (5, 'FOREST_L', '02/11/2023', 75.00);
INSERT INTO bookings VALUES (6, 'FOREST_L', '02/14/2023', 90.00);

INSERT INTO bookings VALUES (7, 'FOREST_L', '02/21/2023', 60.00);
INSERT INTO bookings VALUES (8, 'FOREST_L', '02/22/2023', 85.00);
INSERT INTO bookings VALUES (9, 'FOREST_L', '02/27/2023', 90.00);

INSERT INTO bookings VALUES (10, 'DESERT_S', '02/01/2023', 98.00);
INSERT INTO bookings VALUES (11, 'DESERT_S', '02/02/2023', 75.00);
INSERT INTO bookings VALUES (12, 'DESERT_S', '02/04/2023', 85.00);

INSERT INTO bookings VALUES (13, 'DESERT_S', '02/05/2023', 75.00);
INSERT INTO bookings VALUES (14, 'DESERT_S', '02/06/2023', 34.00);
INSERT INTO bookings VALUES (15, 'DESERT_S', '02/09/2023', 85.00);

INSERT INTO bookings VALUES (16, 'DESERT_S', '02/12/2023', 23.00);
INSERT INTO bookings VALUES (17, 'DESERT_S', '02/13/2023', 76.00);
INSERT INTO bookings VALUES (18, 'DESERT_S', '02/14/2023', 85.00);

INSERT INTO bookings VALUES (19, 'OCEAN_WV', '02/01/2023', 98.00);
INSERT INTO bookings VALUES (20, 'OCEAN_WV', '02/02/2023', 75.00);
INSERT INTO bookings VALUES (21, 'OCEAN_WV', '02/04/2023', 85.00);

INSERT INTO bookings VALUES (22, 'OCEAN_WV', '02/06/2023', 75.00);
INSERT INTO bookings VALUES (23, 'OCEAN_WV', '02/09/2023', 34.00);
INSERT INTO bookings VALUES (24, 'OCEAN_WV', '02/12/2023', 85.00);

INSERT INTO bookings VALUES (25, 'OCEAN_WV', '02/13/2023', 23.00);
INSERT INTO bookings VALUES (26, 'OCEAN_WV', '02/14/2023', 76.00);
INSERT INTO bookings VALUES (27, 'OCEAN_WV', '02/16/2023', 85.00);

INSERT INTO bookings VALUES (28, 'CITY_BLD', '02/01/2023', 98.00);
INSERT INTO bookings VALUES (29, 'CITY_BLD', '02/02/2023', 75.00);
INSERT INTO bookings VALUES (30, 'CITY_BLD', '02/04/2023', 85.00);

INSERT INTO bookings VALUES (31, 'CITY_BLD', '02/12/2023', 75.00);
INSERT INTO bookings VALUES (32, 'CITY_BLD', '02/13/2023', 34.00);
INSERT INTO bookings VALUES (33, 'CITY_BLD', '02/17/2023', 85.00);

INSERT INTO bookings VALUES (34, 'CITY_BLD', '02/22/2023', 23.00);
INSERT INTO bookings VALUES (35, 'CITY_BLD', '02/23/2023', 76.00);
INSERT INTO bookings VALUES (36, 'CITY_BLD', '02/24/2023', 85.00);
```

 In this sample query, booking records are tallied to give a total for each week. The end date for each week becomes a column name.

```
SELECT * FROM
    (SELECT
       booking_id,
       (date_trunc('week', booking_date::date) + '5 days'::interval)::date as enddate,
       hotel_code AS "hotel code"
FROM bookings
) PIVOT (
    count(booking_id) FOR enddate IN ('2023-02-04','2023-02-11','2023-02-18') 
);
```

The query results in the following output.

```
 hotel code | 2023-02-04  | 2023-02-11 | 2023-02-18
------------+-------------+------------+----------
 FOREST_L   |           3 |          2 |        1
 DESERT_S   |           4 |          3 |        2
 OCEAN_WV   |           3 |          3 |        3
 CITY_BLD   |           3 |          1 |        2
```

 Amazon Redshift doesn't support CROSSTAB to pivot on multiple columns. But you can change row data to columns, in a similar manner to an aggregation with PIVOT, with a query like the following. This uses the same booking sample data as the previous example.

```
SELECT 
  booking_date,
  MAX(CASE WHEN hotel_code = 'FOREST_L' THEN 'forest is booked' ELSE '' END) AS FOREST_L,
  MAX(CASE WHEN hotel_code = 'DESERT_S' THEN 'desert is booked' ELSE '' END) AS DESERT_S,
  MAX(CASE WHEN hotel_code = 'OCEAN_WV' THEN 'ocean is booked' ELSE '' END)  AS OCEAN_WV
FROM bookings
GROUP BY booking_date
ORDER BY booking_date asc;
```

The sample query returns each booking date next to short phrases that indicate which hotels are booked on that date. The following output shows only the first five rows.

```
 booking_date  | forest_l         | desert_s         | ocean_wv
---------------+------------------+------------------+--------------------
 2023-02-01    | forest is booked | desert is booked |  ocean is booked
 2023-02-02    | forest is booked | desert is booked |  ocean is booked
 2023-02-04    | forest is booked | desert is booked |  ocean is booked
 2023-02-05    |                  | desert is booked |        
 2023-02-06    |                  | desert is booked |  ocean is booked
```

The following are usage notes for `PIVOT`:
+ `PIVOT` can be applied to tables, sub-queries, and common table expressions (CTEs). `PIVOT` cannot be applied to any `JOIN` expressions, recursive CTEs, `PIVOT`, or `UNPIVOT` expressions. Also not supported are `SUPER` unnested expressions and Redshift Spectrum nested tables.
+  `PIVOT` supports the `COUNT`, `SUM`, `MIN`, `MAX`, and `AVG` aggregate functions. 
+ The `PIVOT` aggregate expression has to be a call of a supported aggregate function. Complex expressions on top of the aggregate are not supported. The aggregate arguments cannot contain references to tables other than the `PIVOT` input table. Correlated references to a parent query are also not supported. The aggregate argument may contain sub-queries. These can be correlated internally or on the `PIVOT` input table.
+  The `PIVOT IN` list values cannot be column references or sub-queries. Each value must be type compatible with the `FOR` column reference. 
+  If the `IN` list values do not have aliases, `PIVOT` generates default column names. For constant `IN` values such as `'abc'` or `5`, the default column name is the constant itself. For any complex expression, the column name is a standard Amazon Redshift default name such as `?column?`. 
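
The usage notes above can be illustrated with a short sketch that applies `PIVOT` to a common table expression. It assumes the `part` sample table created earlier in this topic. The CTE name `part_prices` and the aliases are hypothetical.

```sql
-- PIVOT applied to a (non-recursive) CTE instead of a table or subquery.
WITH part_prices AS (
    SELECT partname, manufacturer, price
    FROM part
)
SELECT *
FROM part_prices PIVOT (
    -- MAX is one of the supported aggregate functions.
    MAX(price) AS max_price
    FOR partname IN ('prop' AS prop, 'rudder' AS rudder, 'wing' AS wing)
);
```

Following the alias rule described earlier, the output columns should be named `prop_max_price`, `rudder_max_price`, and `wing_max_price`, with `manufacturer` added as an implicit column.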

## UNPIVOT examples
<a name="r_FROM_clause-unpivot-examples"></a>

Set up the sample data and use it to run the subsequent examples.

```
CREATE TABLE count_by_color (quality varchar, red int, green int, blue int);

INSERT INTO count_by_color VALUES ('high', 15, 20, 7);
INSERT INTO count_by_color VALUES ('normal', 35, NULL, 40);
INSERT INTO count_by_color VALUES ('low', 10, 23, NULL);
```

`UNPIVOT` on input columns red, green, and blue.

```
SELECT *
FROM (SELECT red, green, blue FROM count_by_color) UNPIVOT (
    cnt FOR color IN (red, green, blue)
);
```

The query results in the following output.

```
 color | cnt
-------+-----
 red   |  15
 red   |  35
 red   |  10
 green |  20
 green |  23
 blue  |   7
 blue  |  40
```

By default, `NULL` values in the input column are skipped and do not yield a result row. 

The following example shows `UNPIVOT` with `INCLUDE NULLS`.

```
SELECT *
FROM (
    SELECT red, green, blue
    FROM count_by_color
) UNPIVOT INCLUDE NULLS (
    cnt FOR color IN (red, green, blue)
);
```

The following is the resulting output.

```
 color | cnt
-------+-----
 red   |  15
 red   |  35
 red   |  10
 green |  20
 green |
 green |  23
 blue  |   7
 blue  |  40
 blue  |
```

If the `INCLUDE NULLS` parameter is set, `NULL` input values generate result rows.

The following query shows `UNPIVOT` with `quality` as an implicit column.

```
SELECT *
FROM count_by_color UNPIVOT (
    cnt FOR color IN (red, green, blue)
);
```

The query results in the following output.

```
 quality | color | cnt
---------+-------+-----
 high    | red   |  15
 normal  | red   |  35
 low     | red   |  10
 high    | green |  20
 low     | green |  23
 high    | blue  |   7
 normal  | blue  |  40
```

Columns of the input table that are not referenced in the `UNPIVOT` definition are added implicitly to the result table. In the example, this is the case for the `quality` column.

The following example shows `UNPIVOT` with aliases for values in the `IN` list.

```
SELECT *
FROM count_by_color UNPIVOT (
    cnt FOR color IN (red AS r, green AS g, blue AS b)
);
```

The previous query results in the following output.

```
 quality | color | cnt
---------+-------+-----
 high    | r     |  15
 normal  | r     |  35
 low     | r     |  10
 high    | g     |  20
 low     | g     |  23
 high    | b     |   7
 normal  | b     |  40
```

The `UNPIVOT` operator accepts optional aliases on each `IN` list value. Each alias replaces the column name that would otherwise appear in the name column (`color` in the previous example).

The following are usage notes for `UNPIVOT`.
+ `UNPIVOT` can be applied to tables, sub-queries, and common table expressions (CTEs). `UNPIVOT` cannot be applied to any `JOIN` expressions, recursive CTEs, `PIVOT`, or `UNPIVOT` expressions. Also not supported are `SUPER` unnested expressions and Redshift Spectrum nested tables.
+ The `UNPIVOT IN` list must contain only input table column references. The `IN` list columns must have a common type that they are all compatible with. The `UNPIVOT` value column has this common type. The `UNPIVOT` name column is of type `VARCHAR`.
+ If an `IN` list value does not have an alias, `UNPIVOT` uses the column name as a default value.
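
When the input columns don't share a type, one way to satisfy the common-type rule is to cast them in a subquery before applying `UNPIVOT`. The following is a sketch under that assumption. The `count_by_color_mixed` table is hypothetical, and this topic doesn't state whether Amazon Redshift would reconcile `INT` and `VARCHAR` implicitly. The explicit cast avoids relying on that behavior.

```sql
-- Hypothetical table for illustration only: green is stored as text.
CREATE TEMP TABLE count_by_color_mixed (quality VARCHAR(10), red INT, green VARCHAR(10));

INSERT INTO count_by_color_mixed VALUES ('high', 15, '20'), ('low', 10, '23');

-- Cast green to INT in a subquery so that both IN list columns share one type.
SELECT *
FROM (
    SELECT quality, red, green::INT AS green
    FROM count_by_color_mixed
) UNPIVOT (
    cnt FOR color IN (red, green)
);
```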

# JOIN examples
<a name="r_Join_examples"></a>

A SQL JOIN clause is used to combine the data from two or more tables based on common fields. The results might or might not change depending on the join method specified. For more information about the syntax of a JOIN clause, see [Parameters](r_FROM_clause30.md#r_FROM_clause30-parameters). 

The following examples use data from the `TICKIT` sample data. For more information about the database schema, see [Sample database](c_sampledb.md). To learn how to load sample data, see [Loading data](https://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-create-sample-db.html) in the *Amazon Redshift Getting Started Guide*.

The following query is an inner join (without the JOIN keyword) between the LISTING table and SALES table, where the LISTID from the LISTING table is between 1 and 5. This query matches LISTID column values in the LISTING table (the left table) and SALES table (the right table). The results show that LISTID 1, 4, and 5 match the criteria.

```
select listing.listid, sum(pricepaid) as price, sum(commission) as comm
from listing, sales
where listing.listid = sales.listid
and listing.listid between 1 and 5
group by 1
order by 1;

listid | price  |  comm
-------+--------+--------
     1 | 728.00 | 109.20
     4 |  76.00 |  11.40
     5 | 525.00 |  78.75
```

The following query is a left outer join. Left and right outer joins retain values from one of the joined tables when no match is found in the other table. The left and right tables are the first and second tables listed in the syntax. NULL values are used to fill the "gaps" in the result set. This query matches LISTID column values in the LISTING table (the left table) and the SALES table (the right table). The results show that LISTIDs 2 and 3 did not result in any sales.

```
select listing.listid, sum(pricepaid) as price, sum(commission) as comm
from listing left outer join sales on sales.listid = listing.listid
where listing.listid between 1 and 5
group by 1
order by 1;

listid | price  |  comm
-------+--------+--------
     1 | 728.00 | 109.20
     2 | NULL   | NULL
     3 | NULL   | NULL
     4 |  76.00 |  11.40
     5 | 525.00 |  78.75
```

The following query is a right outer join. This query matches LISTID column values in the LISTING table (the left table) and the SALES table (the right table). The results show that LISTIDs 1, 4, and 5 match the criteria.

```
select listing.listid, sum(pricepaid) as price, sum(commission) as comm
from listing right outer join sales on sales.listid = listing.listid
where listing.listid between 1 and 5
group by 1
order by 1;

listid | price  |  comm
-------+--------+--------
     1 | 728.00 | 109.20
     4 |  76.00 |  11.40
     5 | 525.00 |  78.75
```

The following query is a full join. Full joins retain values from the joined tables when no match is found in the other table. The left and right tables are the first and second tables listed in the syntax. NULL values are used to fill the "gaps" in the result set. This query matches LISTID column values in the LISTING table (the left table) and the SALES table (the right table). The results show that LISTIDs 2 and 3 did not result in any sales.

```
select listing.listid, sum(pricepaid) as price, sum(commission) as comm
from listing full join sales on sales.listid = listing.listid
where listing.listid between 1 and 5
group by 1
order by 1;

listid | price  |  comm
-------+--------+--------
     1 | 728.00 | 109.20
     2 | NULL   | NULL
     3 | NULL   | NULL
     4 |  76.00 |  11.40
     5 | 525.00 |  78.75
```

The following query is a full join. This query matches LISTID column values in the LISTING table (the left table) and the SALES table (the right table). Only rows that do not result in any sales (LISTIDs 2 and 3) are in the results.

```
select listing.listid, sum(pricepaid) as price, sum(commission) as comm
from listing full join sales on sales.listid = listing.listid
where listing.listid between 1 and 5
and (listing.listid IS NULL or sales.listid IS NULL)
group by 1
order by 1;

listid | price  |  comm
-------+--------+--------
     2 | NULL   | NULL
     3 | NULL   | NULL
```
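
Because the preceding query keeps only LISTING rows that have no match in SALES, a left outer join with an `IS NULL` test on the right table is a common alternative way to write it. The following sketch uses the same `TICKIT` tables. Given the results shown above, it should return LISTIDs 2 and 3.

```sql
-- Keep LISTING rows that have no matching SALES row.
select listing.listid
from listing left outer join sales on sales.listid = listing.listid
where listing.listid between 1 and 5
and sales.listid is null
order by 1;
```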

The following example is an inner join with the ON clause. In this case, NULL rows are not returned.

```
select listing.listid, sum(pricepaid) as price, sum(commission) as comm
from sales join listing
on sales.listid=listing.listid and sales.eventid=listing.eventid
where listing.listid between 1 and 5
group by 1
order by 1;

listid | price  |  comm
-------+--------+--------
     1 | 728.00 | 109.20
     4 |  76.00 |  11.40
     5 | 525.00 |  78.75
```

The following query is a cross join or Cartesian join of the LISTING table and the SALES table with a predicate to limit the results. This query pairs every SALES row with every LISTING row, where LISTID is between 1 and 5 in each table. The results show 20 rows: the four qualifying SALES rows (one each for LISTIDs 1 and 4, and two for LISTID 5) combined with the five qualifying LISTING rows.

```
select sales.listid as sales_listid, listing.listid as listing_listid
from sales cross join listing
where sales.listid between 1 and 5
and listing.listid between 1 and 5
order by 1,2;

sales_listid | listing_listid
-------------+---------------
1            | 1
1            | 2
1            | 3
1            | 4
1            | 5
4            | 1
4            | 2
4            | 3
4            | 4
4            | 5
5            | 1
5            | 1
5            | 2
5            | 2
5            | 3
5            | 3
5            | 4
5            | 4
5            | 5
5            | 5
```

The following example is a natural join between two tables. In this case, the columns listid, sellerid, eventid, and dateid have identical names and data types in both tables and so are used as the join columns. The results are limited to five rows.

```
select listid, sellerid, eventid, dateid, numtickets
from listing natural join sales
order by 1
limit 5;

listid | sellerid  | eventid | dateid | numtickets
-------+-----------+---------+--------+-----------
113    | 29704     | 4699    | 2075   | 22
115    | 39115     | 3513    | 2062   | 14
116    | 43314     | 8675    | 1910   | 28
118    | 6079      | 1611    | 1862   | 9
163    | 24880     | 8253    | 1888   | 14
```

The following example is a join between two tables with the USING clause. In this case, the columns listid and eventid are used as the join columns. The results are limited to five rows.

```
select listid, listing.sellerid, eventid, listing.dateid, numtickets
from listing join sales
using (listid, eventid)
order by 1
limit 5;

listid | sellerid | eventid | dateid | numtickets
-------+----------+---------+--------+-----------
1      | 36861    | 7872    | 1850   | 10
4      | 8117     | 4337    | 1970   | 8
5      | 1616     | 8647    | 1963   | 4
5      | 1616     | 8647    | 1963   | 4
6      | 47402    | 8240    | 2053   | 18
```

The following query is an inner join of two subqueries in the FROM clause. The query finds the number of sold and unsold tickets for different categories of events (concerts and shows). The FROM clause subqueries are *table* subqueries; they can return multiple columns and rows.

```
select catgroup1, sold, unsold
from
(select catgroup, sum(qtysold) as sold
from category c, event e, sales s
where c.catid = e.catid and e.eventid = s.eventid
group by catgroup) as a(catgroup1, sold)
join
(select catgroup, sum(numtickets)-sum(qtysold) as unsold
from category c, event e, sales s, listing l
where c.catid = e.catid and e.eventid = s.eventid
and s.listid = l.listid
group by catgroup) as b(catgroup2, unsold)

on a.catgroup1 = b.catgroup2
order by 1;

catgroup1 |  sold  | unsold
----------+--------+--------
Concerts  | 195444 |1067199
Shows     | 149905 | 817736
```

# UNNEST examples
<a name="r_FROM_clause-unnest-examples"></a>

UNNEST is a parameter in the FROM clause that expands nested data into columns that hold the data’s unnested elements. For information on unnesting data, see [Querying semi-structured data](query-super.md).

The following statement creates and populates the `orders` table, which contains a `products` column that holds arrays of product objects. The examples in this section use the sample data in this table. 

```
CREATE TABLE orders (
    order_id INT,
    products SUPER
);

-- Populate table
INSERT INTO orders VALUES
(1001, JSON_PARSE('[
        {
            "product_id": "P456",
            "name": "Monitor",
            "price": 299.99,
            "quantity": 1,
            "specs": {
                "size": "27 inch",
                "resolution": "4K"
            }
        }
    ]
')),
(1002, JSON_PARSE('
    [
        {
            "product_id": "P567",
            "name": "USB Cable",
            "price": 9.99,
            "quantity": 3
        },
        {
            "product_id": "P678",
            "name": "Headphones",
            "price": 159.99,
            "quantity": 1,
            "specs": {
                "type": "Wireless",
                "battery_life": "20 hours"
            }
        }
    ]
'));
```

Following are some examples of unnesting queries with the sample data using PartiQL syntax.

## Unnesting an array without an OFFSET column
<a name="r_FROM_clause-unnest-examples-no-offset"></a>

The following query unnests the SUPER arrays in the products column, with each row representing an item from the order in `order_id`.

```
SELECT o.order_id, unnested_products.product
FROM orders o, UNNEST(o.products) AS unnested_products(product);

 order_id |                                                           product                                                           
----------+-----------------------------------------------------------------------------------------------------------------------------
     1001 | {"product_id":"P456","name":"Monitor","price":299.99,"quantity":1,"specs":{"size":"27 inch","resolution":"4K"}}
     1002 | {"product_id":"P567","name":"USB Cable","price":9.99,"quantity":3}
     1002 | {"product_id":"P678","name":"Headphones","price":159.99,"quantity":1,"specs":{"type":"Wireless","battery_life":"20 hours"}}
(3 rows)
```

The following query finds the most expensive product in each order.

```
SELECT o.order_id, MAX(unnested_products.product)
FROM orders o, UNNEST(o.products) AS unnested_products(product)
GROUP BY o.order_id;

 order_id |                                                           product                                                           
----------+-----------------------------------------------------------------------------------------------------------------------------
     1001 | {"product_id":"P456","name":"Monitor","price":299.99,"quantity":1,"specs":{"size":"27 inch","resolution":"4K"}}
     1002 | {"product_id":"P678","name":"Headphones","price":159.99,"quantity":1,"specs":{"type":"Wireless","battery_life":"20 hours"}}
(2 rows)
```
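
If only the highest price is needed rather than the whole product object, a variation is to aggregate on the `price` attribute. The following sketch assumes the `orders` sample table from this section. It also assumes that the `price` attribute can be reached with dot notation on the unnested column and cast to `DECIMAL`. The aliases `up` and `max_price` are hypothetical.

```sql
-- Navigate to the price attribute of each unnested product, cast it to a
-- numeric type, and take the maximum for each order.
SELECT o.order_id, MAX(up.product.price::DECIMAL(10,2)) AS max_price
FROM orders o, UNNEST(o.products) AS up(product)
GROUP BY o.order_id
ORDER BY o.order_id;
```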

## Unnesting an array with an implicit OFFSET column
<a name="r_FROM_clause-unnest-examples-implicit-offset"></a>

The following query uses the `UNNEST ... WITH OFFSET` parameter to show the zero-based position of each product within its order array.

```
SELECT o.order_id, up.product, up.offset_col
FROM orders o, UNNEST(o.products) WITH OFFSET AS up(product);

 order_id |                                                           product                                                           | offset_col 
----------+-----------------------------------------------------------------------------------------------------------------------------+------------
     1001 | {"product_id":"P456","name":"Monitor","price":299.99,"quantity":1,"specs":{"size":"27 inch","resolution":"4K"}}             |          0
     1002 | {"product_id":"P567","name":"USB Cable","price":9.99,"quantity":3}                                                          |          0
     1002 | {"product_id":"P678","name":"Headphones","price":159.99,"quantity":1,"specs":{"type":"Wireless","battery_life":"20 hours"}} |          1
(3 rows)
```

Since the statement doesn’t specify an alias for the offset column, Amazon Redshift defaults to naming it `offset_col`.

## Unnesting an array with an explicit OFFSET column
<a name="r_FROM_clause-unnest-examples-explicit-offset"></a>

The following query also uses the `UNNEST ... WITH OFFSET` parameter to show the products within their order arrays. The difference in this query compared to the query in the previous example is that it explicitly names the offset column with the alias `idx`.

```
SELECT o.order_id, up.product, up.idx
FROM orders o, UNNEST(o.products) WITH OFFSET AS up(product, idx);

 order_id |                                                           product                                                           | idx 
----------+-----------------------------------------------------------------------------------------------------------------------------+-----
     1001 | {"product_id":"P456","name":"Monitor","price":299.99,"quantity":1,"specs":{"size":"27 inch","resolution":"4K"}}             |   0
     1002 | {"product_id":"P567","name":"USB Cable","price":9.99,"quantity":3}                                                          |   0
     1002 | {"product_id":"P678","name":"Headphones","price":159.99,"quantity":1,"specs":{"type":"Wireless","battery_life":"20 hours"}} |   1
(3 rows)
```
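
The offset column can also be used in a predicate. As a minimal sketch that assumes the same `orders` sample table and the explicit `idx` alias from the previous query, the following statement keeps only the first element of each order's array.

```sql
-- idx is zero-based, so idx = 0 selects the first product in each array.
SELECT o.order_id, up.product
FROM orders o, UNNEST(o.products) WITH OFFSET AS up(product, idx)
WHERE up.idx = 0;
```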

# WHERE clause
<a name="r_WHERE_clause"></a>

The WHERE clause contains conditions that either join tables or apply predicates to columns in tables. Tables can be inner-joined by using appropriate syntax in either the WHERE clause or the FROM clause. Outer join criteria must be specified in the FROM clause. 

## Syntax
<a name="r_WHERE_clause-synopsis"></a>

```
[ WHERE condition ]
```

## *condition*
<a name="r_WHERE_clause-synopsis-condition"></a>

Any search condition with a Boolean result, such as a join condition or a predicate on a table column. The following examples are valid join conditions: 

```
sales.listid=listing.listid
sales.listid<>listing.listid
```

The following examples are valid conditions on columns in tables: 

```
catgroup like 'S%'
venueseats between 20000 and 50000
eventname in('Jersey Boys','Spamalot')
year=2008
length(catdesc)>25
date_part(month, caldate)=6
```

Conditions can be simple or complex; for complex conditions, you can use parentheses to isolate logical units. In the following example, the join condition is enclosed by parentheses. 

```
where (category.catid=event.catid) and category.catid in(6,7,8)
```

## Usage notes
<a name="r_WHERE_clause_usage_notes"></a>

You can use aliases in the WHERE clause to reference select list expressions. 

You can't restrict the results of aggregate functions in the WHERE clause; use the HAVING clause for this purpose. 

Columns that are restricted in the WHERE clause must derive from table references in the FROM clause. 
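
The first two usage notes can be sketched with the SALES table from the `TICKIT` sample data. The thresholds and the aliases `costperticket` and `total_paid` are arbitrary values chosen for illustration.

```sql
-- Reference the select list alias costperticket directly in the WHERE clause.
select salesid, pricepaid/qtysold as costperticket
from sales
where costperticket > 1000
order by costperticket desc
limit 5;

-- Restrict an aggregate result with HAVING, not WHERE.
select eventid, sum(pricepaid) as total_paid
from sales
group by eventid
having sum(pricepaid) > 50000
order by total_paid desc;
```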

## Example
<a name="r_SELECT_synopsis-example"></a>

The following query uses a combination of different WHERE clause restrictions, including a join condition for the SALES and EVENT tables, a predicate on the EVENTNAME column, and two predicates on the STARTTIME column. 

```
select eventname, starttime, pricepaid/qtysold as costperticket, qtysold
from sales, event
where sales.eventid = event.eventid
and eventname='Hannah Montana'
and date_part(quarter, starttime) in(1,2)
and date_part(year, starttime) = 2008
order by 3 desc, 4, 2, 1 limit 10;

eventname      |      starttime      |   costperticket   | qtysold
---------------+---------------------+-------------------+---------
Hannah Montana | 2008-06-07 14:00:00 |     1706.00000000 |       2
Hannah Montana | 2008-05-01 19:00:00 |     1658.00000000 |       2
Hannah Montana | 2008-06-07 14:00:00 |     1479.00000000 |       1
Hannah Montana | 2008-06-07 14:00:00 |     1479.00000000 |       3
Hannah Montana | 2008-06-07 14:00:00 |     1163.00000000 |       1
Hannah Montana | 2008-06-07 14:00:00 |     1163.00000000 |       2
Hannah Montana | 2008-06-07 14:00:00 |     1163.00000000 |       4
Hannah Montana | 2008-05-01 19:00:00 |      497.00000000 |       1
Hannah Montana | 2008-05-01 19:00:00 |      497.00000000 |       2
Hannah Montana | 2008-05-01 19:00:00 |      497.00000000 |       4
(10 rows)
```

# Oracle-Style outer joins in the WHERE clause
<a name="r_WHERE_oracle_outer"></a>

For Oracle compatibility, Amazon Redshift supports the Oracle outer-join operator (\$1) in WHERE clause join conditions. This operator is intended for use only in defining outer-join conditions; don't try to use it in other contexts. Other uses of this operator are silently ignored in most cases. 

An outer join returns all of the rows that the equivalent inner join would return, plus non-matching rows from one or both tables. In the FROM clause, you can specify left, right, and full outer joins. In the WHERE clause, you can specify left and right outer joins only. 

To outer join tables TABLE1 and TABLE2 and return non-matching rows from TABLE1 (a left outer join), specify `TABLE1 LEFT OUTER JOIN TABLE2` in the FROM clause or apply the (+) operator to all joining columns from TABLE2 in the WHERE clause. For all rows in TABLE1 that have no matching rows in TABLE2, the result of the query contains nulls for any select list expressions that contain columns from TABLE2. 

To produce the same behavior for all rows in TABLE2 that have no matching rows in TABLE1, specify `TABLE1 RIGHT OUTER JOIN TABLE2` in the FROM clause or apply the (+) operator to all joining columns from TABLE1 in the WHERE clause. 
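As a rough illustration of the left outer join behavior described above, the following Python sketch uses the standard library's `sqlite3` module. SQLite doesn't support the (+) operator, so the sketch uses the FROM clause `LEFT OUTER JOIN` form. The tables `t1` and `t2`, their columns, and their data are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, score INTEGER);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 10), (3, 30);
""")

# Left outer join: every row of t1 is kept. Columns from t2 are NULL
# (None in Python) where t2 has no matching row.
left_rows = conn.execute("""
    SELECT t1.id, t1.name, t2.score
    FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()

# Inner join for comparison: the non-matching row (id 2) is dropped.
inner_rows = conn.execute("""
    SELECT t1.id, t1.name, t2.score
    FROM t1 JOIN t2 ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()

print(left_rows)
print(inner_rows)
conn.close()
```

The left outer join returns the row for `id` 2 with a null score, while the inner join omits it.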

## Basic syntax
<a name="r_WHERE_oracle_outer-basic-syntax"></a>

```
[ WHERE {
[ table1.column1 = table2.column1(+) ]
[ table1.column1(+) = table2.column1 ]
}
```

The first condition is equivalent to: 

```
from table1 left outer join table2
on table1.column1=table2.column1
```

The second condition is equivalent to: 

```
from table1 right outer join table2
on table1.column1=table2.column1
```

**Note**  
The syntax shown here covers the simple case of an equijoin over one pair of joining columns. However, other types of comparison conditions and multiple pairs of joining columns are also valid. 

For example, the following WHERE clause defines an outer join over two pairs of columns. The (+) operator must be attached to the same table in both conditions: 

```
where table1.col1 > table2.col1(+)
and table1.col2 = table2.col2(+)
```

## Usage notes
<a name="r_WHERE_oracle_outer_usage_notes"></a>

Where possible, use the standard FROM clause OUTER JOIN syntax instead of the (+) operator in the WHERE clause. Queries that contain the (+) operator are subject to the following rules: 
+ You can only use the (+) operator in the WHERE clause, and only in reference to columns from tables or views. 
+ You can't apply the (+) operator to expressions. However, an expression can contain columns that use the (+) operator. For example, the following join condition returns a syntax error: 

  ```
  event.eventid*10(+)=category.catid
  ```

  However, the following join condition is valid: 

  ```
  event.eventid(+)*10=category.catid
  ```
+ You can't use the (+) operator in a query block that also contains FROM clause join syntax. 
+ If two tables are joined over multiple join conditions, you must use the (+) operator in all or none of these conditions. A join with mixed syntax styles runs as an inner join, without warning. 
+ The (+) operator doesn't produce an outer join if you join a table in the outer query with a table that results from an inner query. 
+ To use the (+) operator to outer-join a table to itself, you must define table aliases in the FROM clause and reference them in the join condition: 

  ```
  select count(*)
  from event a, event b
  where a.eventid(+)=b.catid;
  
  count
  -------
  8798
  (1 row)
  ```
+ You can't combine a join condition that contains the (+) operator with an OR condition or an IN condition. For example: 

  ```
  select count(*) from sales, listing
  where sales.listid(+)=listing.listid or sales.salesid=0;
  ERROR:  Outer join operator (+) not allowed in operand of OR or IN.
  ```
+ In a WHERE clause that outer-joins more than two tables, the (+) operator can be applied only once to a given table. In the following example, the SALES table can't be referenced with the (+) operator in two successive joins. 

  ```
  select count(*) from sales, listing, date
  where sales.listid(+)=listing.listid and sales.dateid(+)=date.dateid;
  ERROR:  A table may be outer joined to at most one other table.
  ```
+ If the WHERE clause outer-join condition compares a column from TABLE2 with a constant, apply the (+) operator to the column. If you don't include the operator, the outer-joined rows from TABLE1, which contain nulls for the restricted column, are eliminated. See the Examples section below. 

## Examples
<a name="r_WHERE_oracle_outer-examples"></a>

The following join query specifies a left outer join of the SALES and LISTING tables over their LISTID columns: 

```
select count(*)
from sales, listing
where sales.listid = listing.listid(+);

count
--------
172456
(1 row)
```

The following equivalent query produces the same result but uses FROM clause join syntax: 

```
select count(*)
from sales left outer join listing on sales.listid = listing.listid;

count
--------
172456
(1 row)
```

The SALES table doesn't contain records for all listings in the LISTING table because not all listings result in sales. The following query outer-joins SALES and LISTING and returns rows from LISTING even when the SALES table reports no sales for a given list ID. The PRICE and COMM columns, derived from the SALES table, contain nulls in the result set for those non-matching rows. 

```
select listing.listid, sum(pricepaid) as price,
sum(commission) as comm
from listing, sales
where sales.listid(+) = listing.listid and listing.listid between 1 and 5
group by 1 order by 1;

 listid | price  |  comm
--------+--------+--------
      1 | 728.00 | 109.20
      2 |        |
      3 |        |
      4 |  76.00 |  11.40
      5 | 525.00 |  78.75
(5 rows)
```

Note that when the WHERE clause join operator is used, the order of the tables in the FROM clause doesn't matter. 

An example of a more complex outer join condition in the WHERE clause is the case where the condition consists of a comparison between two table columns *and* a comparison with a constant: 

```
where category.catid=event.catid(+) and eventid(+)=796;
```

Note that the (+) operator is used in two places: first in the equality comparison between the tables and second in the comparison condition for the EVENTID column. The result of this syntax is the preservation of the outer-joined rows when the restriction on EVENTID is evaluated. If you remove the (+) operator from the EVENTID restriction, the query treats this restriction as a filter, not as part of the outer-join condition. In turn, the outer-joined rows that contain nulls for EVENTID are eliminated from the result set. 

Here is a complete query that illustrates this behavior: 

```
select catname, catgroup, eventid
from category, event
where category.catid=event.catid(+) and eventid(+)=796;

  catname  | catgroup | eventid
-----------+----------+---------
 Classical | Concerts |
 Jazz      | Concerts |
 MLB       | Sports   |
 MLS       | Sports   |
 Musicals  | Shows    |     796
 NBA       | Sports   |
 NFL       | Sports   |
 NHL       | Sports   |
 Opera     | Shows    |
 Plays     | Shows    |
 Pop       | Concerts |
(11 rows)
```

The equivalent query using FROM clause syntax is as follows: 

```
select catname, catgroup, eventid
from category left join event
on category.catid=event.catid and eventid=796;
```

If you remove the second (+) operator from the WHERE clause version of this query, it returns only 1 row (the row where `eventid=796`). 

```
select catname, catgroup, eventid
from category, event
where category.catid=event.catid(+) and eventid=796;

 catname  | catgroup | eventid
----------+----------+---------
 Musicals | Shows    |     796
(1 row)
```
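The difference between a restriction that is part of the outer-join condition and one that acts as a filter can be sketched with Python's `sqlite3` module. SQLite doesn't support the (+) operator, so the sketch uses the equivalent FROM clause forms: a restriction in the ON clause stands in for `eventid(+)=796`, and a restriction in the WHERE clause stands in for `eventid=796`. The table contents are a small hypothetical subset, not the sample database.

```python
import sqlite3

# Hypothetical, reduced versions of the CATEGORY and EVENT tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (catid INTEGER, catname TEXT);
    CREATE TABLE event (eventid INTEGER, catid INTEGER);
    INSERT INTO category VALUES (1, 'Jazz'), (2, 'Musicals'), (3, 'Opera');
    INSERT INTO event VALUES (796, 2), (800, 3);
""")

# Restriction inside the ON clause: it is part of the outer-join
# condition, so categories without a matching event keep a NULL eventid.
in_join = conn.execute("""
    SELECT catname, eventid
    FROM category LEFT JOIN event
      ON category.catid = event.catid AND eventid = 796
    ORDER BY catname
""").fetchall()

# Restriction in the WHERE clause: it is applied after the join as a
# filter, so rows with a NULL eventid are eliminated.
as_filter = conn.execute("""
    SELECT catname, eventid
    FROM category LEFT JOIN event
      ON category.catid = event.catid
    WHERE eventid = 796
    ORDER BY catname
""").fetchall()

print(in_join)
print(as_filter)
conn.close()
```

The first query keeps all three categories, and the second keeps only the row for event 796.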

# GROUP BY clause
<a name="r_GROUP_BY_clause"></a>

The GROUP BY clause identifies the grouping columns for the query. It is used to group together those rows in a table that have the same values in all the columns listed. The order in which the columns are listed does not matter. The outcome is to combine each set of rows having common values into one group row that represents all rows in the group. Use a GROUP BY to eliminate redundancy in the output and to compute aggregates that apply to the groups. Grouping columns must be declared when the query computes aggregates with standard functions such as SUM, AVG, and COUNT. For more information, see [Aggregate functions](c_Aggregate_Functions.md).

## Syntax
<a name="r_GROUP_BY_clause-syntax"></a>

```
[ GROUP BY  expression [, ...] | ALL | aggregation_extension  ]
```

where *aggregation\_extension* is one of the following:

```
GROUPING SETS ( () | aggregation_extension [, ...] ) |
ROLLUP ( expr [, ...] ) |
CUBE ( expr [, ...] )
```

## Parameters
<a name="r_GROUP_BY_clause-parameters"></a>

 *expression*  
The list of columns or expressions must match the list of non-aggregate expressions in the select list of the query. For example, consider the following simple query.  

```
select listid, eventid, sum(pricepaid) as revenue,
count(qtysold) as numtix
from sales
group by listid, eventid
order by 3, 4, 2, 1
limit 5;

listid | eventid | revenue | numtix
-------+---------+---------+--------
89397  |      47 |   20.00 |      1
106590 |      76 |   20.00 |      1
124683 |     393 |   20.00 |      1
103037 |     403 |   20.00 |      1
147685 |     429 |   20.00 |      1
(5 rows)
```
In this query, the select list consists of two aggregate expressions. The first uses the SUM function and the second uses the COUNT function. The remaining two columns, LISTID and EVENTID, must be declared as grouping columns.  
Expressions in the GROUP BY clause can also reference the select list by using ordinal numbers. For example, the previous example could be abbreviated as follows.  

```
select listid, eventid, sum(pricepaid) as revenue,
count(qtysold) as numtix
from sales
group by 1,2
order by 3, 4, 2, 1
limit 5;

listid | eventid | revenue | numtix
-------+---------+---------+--------
89397  |      47 |   20.00 |      1
106590 |      76 |   20.00 |      1
124683 |     393 |   20.00 |      1
103037 |     403 |   20.00 |      1
147685 |     429 |   20.00 |      1
(5 rows)
```

ALL  
ALL groups by all columns specified in the SELECT list except those that are aggregated. For example, the following query groups by `col1` and `col2` without specifying them individually in the GROUP BY clause. The column `col3` is the argument of the `SUM` function and thus not grouped.  

```
SELECT col1, col2, sum(col3) FROM testtable GROUP BY ALL
```
If you EXCLUDE a column in the SELECT list, the GROUP BY ALL clause does not group the results based on that specific column.  

```
SELECT * EXCLUDE col3 FROM testtable GROUP BY ALL
```

 *aggregation\_extension*  
You can use the aggregation extensions GROUPING SETS, ROLLUP, and CUBE to perform the work of multiple GROUP BY operations in a single statement. For more information on aggregation extensions and related functions, see [Aggregation extensions](r_GROUP_BY_aggregation-extensions.md). 

## Examples
<a name="r_GROUP_BY_clause-examples"></a>

The following examples use the SALES table that contains columns: salesid, listid, sellerid, buyerid, eventid, dateid, qtysold, pricepaid, commission, and saletime. For more information about the SALES table, see [Sample database](c_sampledb.md).

The following example query groups by `salesid` and `listid` without having to specify them individually in the GROUP BY clause. The column `qtysold` is the argument of the `SUM` function and thus not grouped.

```
SELECT salesid, listid, sum(qtysold) FROM sales GROUP BY ALL;

salesid | listid  | sum
--------+---------+------
33095   | 36572   | 2	
88268   | 100813  | 4	
110917  | 127048  | 1	
...
```

The following example query excludes several columns in the SELECT list, so GROUP BY ALL only groups salesid and listid.

```
SELECT * EXCLUDE sellerid, buyerid, eventid, dateid, qtysold, pricepaid, commission, saletime 
FROM sales GROUP BY ALL;

salesid | listid 
--------+---------
33095   | 36572   	
88268   | 100813 	
110917  | 127048 	
...
```

# Aggregation extensions
<a name="r_GROUP_BY_aggregation-extensions"></a>

Amazon Redshift supports aggregation extensions to do the work of multiple GROUP BY operations in a single statement.

 The examples for aggregation extensions use the `orders` table, which holds sales data for an electronics company. You can create `orders` with the following.

```
CREATE TABLE ORDERS (
    ID INT,
    PRODUCT CHAR(20),
    CATEGORY CHAR(20),
    PRE_OWNED CHAR(1),
    COST DECIMAL
);

INSERT INTO ORDERS VALUES
    (0, 'laptop',       'computers',    'T', 1000),
    (1, 'smartphone',   'cellphones',   'T', 800),
    (2, 'smartphone',   'cellphones',   'T', 810),
    (3, 'laptop',       'computers',    'F', 1050),
    (4, 'mouse',        'computers',    'F', 50);
```

## *GROUPING SETS*
<a name="r_GROUP_BY_aggregation-extensions-grouping-sets"></a>

 Computes one or more grouping sets in a single statement. A grouping set is a set of zero or more columns by which you can group a query's result set, as in a single GROUP BY clause. GROUP BY GROUPING SETS is equivalent to running a UNION ALL query over the same result set grouped by different columns. For example, GROUP BY GROUPING SETS((a), (b)) is equivalent to GROUP BY a UNION ALL GROUP BY b. 

 The following example returns the cost of the orders table's products grouped according to both the products' categories and the kind of products sold. 

```
SELECT category, product, sum(cost) as total
FROM orders
GROUP BY GROUPING SETS(category, product);

       category       |       product        | total
----------------------+----------------------+-------
 computers            |                      |  2100
 cellphones           |                      |  1610
                      | laptop               |  2050
                      | smartphone           |  1610
                      | mouse                |    50

(5 rows)
```
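The UNION ALL equivalence can be modeled outside the database. The following Python sketch is only an illustration of the semantics, not of how Amazon Redshift evaluates the clause. It groups the `orders` rows from this section once per grouping set and concatenates the results. The function name `grouping_sets_totals` is hypothetical.

```python
from collections import defaultdict

# Rows of the orders table from this section: (product, category, cost).
ORDERS = [
    ("laptop", "computers", 1000),
    ("smartphone", "cellphones", 800),
    ("smartphone", "cellphones", 810),
    ("laptop", "computers", 1050),
    ("mouse", "computers", 50),
]
COLUMNS = ("category", "product")


def grouping_sets_totals(rows, grouping_sets):
    """Group the rows once per grouping set and concatenate the results.

    Columns that are not part of the current grouping set are reported
    as None, which mirrors the NULLs in the SQL output.
    """
    result = []
    for gset in grouping_sets:
        totals = defaultdict(int)
        for product, category, cost in rows:
            values = {"category": category, "product": product}
            key = tuple(values[c] if c in gset else None for c in COLUMNS)
            totals[key] += cost
        result.extend((*key, total) for key, total in totals.items())
    return result


totals = grouping_sets_totals(ORDERS, [("category",), ("product",)])
for row in totals:
    print(row)
```

Each output tuple has the form (category, product, total), with the same five rows as the query result above.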

## *ROLLUP*
<a name="r_GROUP_BY_aggregation-extensions-rollup"></a>

 Assumes a hierarchy where preceding columns are considered the parents of subsequent columns. ROLLUP groups data by the provided columns, returning extra subtotal rows representing the totals throughout all levels of grouping columns, in addition to the grouped rows. For example, you can use GROUP BY ROLLUP((a), (b)) to return a result set grouped first by a, then by b while assuming that b is a subsection of a. ROLLUP also returns a row with the whole result set without grouping columns. 

GROUP BY ROLLUP((a), (b)) is equivalent to GROUP BY GROUPING SETS((a,b), (a), ()). 

The following example returns the cost of the orders table's products grouped first by category and then product, with product as a subdivision of category.

```
SELECT category, product, sum(cost) as total
FROM orders
GROUP BY ROLLUP(category, product) ORDER BY 1,2;

       category       |       product        | total
----------------------+----------------------+-------
 cellphones           | smartphone           |  1610
 cellphones           |                      |  1610
 computers            | laptop               |  2050
 computers            | mouse                |    50
 computers            |                      |  2100
                      |                      |  3710
(6 rows)
```
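The equivalence between ROLLUP and GROUPING SETS can be expressed as a small function. This Python sketch only models the expansion stated above; the function name `rollup_grouping_sets` is hypothetical.

```python
def rollup_grouping_sets(columns):
    """Return the grouping sets implied by ROLLUP(columns).

    ROLLUP(c1, ..., cn) yields every prefix of the column list, from
    the full list down to the empty grouping set ().
    """
    columns = tuple(columns)
    return [columns[:n] for n in range(len(columns), -1, -1)]


print(rollup_grouping_sets(["a", "b"]))
print(rollup_grouping_sets(["category", "product"]))
```

For two columns, the function returns three grouping sets, which correspond to the detail rows, the per-category subtotals, and the grand total in the preceding result.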

## *CUBE*
<a name="r_GROUP_BY_aggregation-extensions-cube"></a>

 Groups data by the provided columns, returning extra subtotal rows representing the totals throughout all levels of grouping columns, in addition to the grouped rows. CUBE returns the same rows as ROLLUP, and adds subtotal rows for every combination of grouping columns not covered by ROLLUP. For example, you can use GROUP BY CUBE((a), (b)) to return a result set grouped by a and b together, by a alone, and by b alone. CUBE also returns a row with the whole result set without grouping columns.

GROUP BY CUBE((a), (b)) is equivalent to GROUP BY GROUPING SETS((a, b), (a), (b), ()). 

The following example returns the cost of the orders table's products grouped by category and product. Unlike the preceding example for ROLLUP, the statement returns results for every combination of grouping columns. 

```
SELECT category, product, sum(cost) as total
FROM orders
GROUP BY CUBE(category, product) ORDER BY 1,2;

       category       |       product        | total
----------------------+----------------------+-------
 cellphones           | smartphone           |  1610
 cellphones           |                      |  1610
 computers            | laptop               |  2050
 computers            | mouse                |    50
 computers            |                      |  2100
                      | laptop               |  2050
                      | mouse                |    50
                      | smartphone           |  1610
                      |                      |  3710
(9 rows)
```
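The equivalence between CUBE and GROUPING SETS amounts to enumerating every subset of the grouping columns. The following Python sketch models that expansion; the function name `cube_grouping_sets` is hypothetical.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Return the grouping sets implied by CUBE(columns).

    CUBE yields every subset of the column list, from the full list
    down to the empty grouping set ().
    """
    columns = tuple(columns)
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets


print(cube_grouping_sets(["a", "b"]))
```

For n columns the function returns 2^n grouping sets, so the result grows quickly as columns are added.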

## *GROUPING/GROUPING\_ID functions*
<a name="r_GROUP_BY_aggregation-extentions-grouping"></a>

 ROLLUP and CUBE add NULL values to the result set to indicate subtotal rows. For example, GROUP BY ROLLUP((a), (b)) returns one or more rows that have a value of NULL in the b grouping column to indicate they are subtotals of fields in the a grouping column. These NULL values serve only to satisfy the format of returning tuples.

 When you run GROUP BY operations with ROLLUP and CUBE on relations that store NULL values themselves, this can produce result sets with rows that appear to have identical grouping columns. Returning to the previous example, if the b grouping column contains a stored NULL value, GROUP BY ROLLUP((a), (b)) returns a row with a value of NULL in the b grouping column that isn't a subtotal. 

 To distinguish between NULL values created by ROLLUP and CUBE, and the NULL values stored in the tables themselves, you can use the GROUPING function, or its alias GROUPING\_ID. GROUPING takes a single grouping set as its argument. For each row in the result set, it returns a 0 or 1 bit value for each grouping column in that set, by position, and then converts the resulting bit string into an integer. If the value in a given position is a NULL value created by an aggregation extension, the bit is 1. The bit is 0 for all other values, including stored NULL values.

 For example, GROUPING(category, product) can return the following values for a given row, depending on the grouping column values for that row. For the purposes of this example, all NULL values in the table are NULL values created by an aggregation extension.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/redshift/latest/dg/r_GROUP_BY_aggregation-extensions.html)

GROUPING functions appear in the SELECT list portion of the query in the following format.

```
SELECT ... [GROUPING( expr )...] ...
  GROUP BY ... {CUBE | ROLLUP| GROUPING SETS} ( expr ) ...
```

The following example is the same as the preceding example for CUBE, but with the addition of GROUPING functions for its grouping sets.

```
SELECT category, product,
       GROUPING(category) as grouping0,
       GROUPING(product) as grouping1,
       GROUPING(category, product) as grouping2,
       sum(cost) as total
FROM orders
GROUP BY CUBE(category, product) ORDER BY 3,1,2;

       category       |       product        | grouping0 | grouping1 | grouping2 | total
----------------------+----------------------+-----------+-----------+-----------+-------
 cellphones           | smartphone           |         0 |         0 |         0 |  1610
 cellphones           |                      |         0 |         1 |         1 |  1610
 computers            | laptop               |         0 |         0 |         0 |  2050
 computers            | mouse                |         0 |         0 |         0 |    50
 computers            |                      |         0 |         1 |         1 |  2100
                      | laptop               |         1 |         0 |         2 |  2050
                      | mouse                |         1 |         0 |         2 |    50
                      | smartphone           |         1 |         0 |         2 |  1610
                      |                      |         1 |         1 |         3 |  3710
(9 rows)
```
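The integer values in the `grouping2` column follow from reading the per-column bits as a binary number, with the leftmost argument as the most significant bit. The following Python sketch models that calculation. The function name `grouping_id` and its parameters are hypothetical, and the sketch assumes that a column absent from a row's grouping set is exactly a column whose NULL was created by the aggregation extension.

```python
def grouping_id(arguments, grouping_set):
    """Model the integer returned by GROUPING(arguments...) for one row.

    Each argument contributes one bit, leftmost argument first: 1 if
    the column is not part of the row's grouping set, 0 otherwise.
    """
    value = 0
    for column in arguments:
        bit = 0 if column in grouping_set else 1
        value = (value << 1) | bit
    return value


ARGS = ("category", "product")
for gset in [("category", "product"), ("category",), ("product",), ()]:
    print(gset, grouping_id(ARGS, gset))
```

The four grouping sets produced by CUBE(category, product) map to the values 0, 1, 2, and 3, matching the `grouping2` column in the preceding result.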

## *Partial ROLLUP and CUBE*
<a name="r_GROUP_BY_aggregation-extentions-partial"></a>

 You can run ROLLUP and CUBE operations with only a portion of the subtotals. 

 The syntax for partial ROLLUP and CUBE operations is as follows.

```
GROUP BY expr1, { ROLLUP | CUBE }(expr2 [, ...])
```

Here, the GROUP BY clause only creates subtotal rows at the level of *expr2* and onwards.

The following examples show partial ROLLUP and CUBE operations on the orders table, grouping first by whether a product is pre-owned and then running ROLLUP and CUBE on the category and product columns.

```
SELECT pre_owned, category, product,
       GROUPING(category, product, pre_owned) as group_id,
       sum(cost) as total
FROM orders
GROUP BY pre_owned, ROLLUP(category, product) ORDER BY 4,1,2,3;

 pre_owned |       category       |       product        | group_id | total
-----------+----------------------+----------------------+----------+-------
 F         | computers            | laptop               |        0 |  1050
 F         | computers            | mouse                |        0 |    50
 T         | cellphones           | smartphone           |        0 |  1610
 T         | computers            | laptop               |        0 |  1000
 F         | computers            |                      |        2 |  1100
 T         | cellphones           |                      |        2 |  1610
 T         | computers            |                      |        2 |  1000
 F         |                      |                      |        6 |  1100
 T         |                      |                      |        6 |  2610
(9 rows)

SELECT pre_owned, category, product,
       GROUPING(category, product, pre_owned) as group_id,
       sum(cost) as total
FROM orders
GROUP BY pre_owned, CUBE(category, product) ORDER BY 4,1,2,3;

 pre_owned |       category       |       product        | group_id | total
-----------+----------------------+----------------------+----------+-------
 F         | computers            | laptop               |        0 |  1050
 F         | computers            | mouse                |        0 |    50
 T         | cellphones           | smartphone           |        0 |  1610
 T         | computers            | laptop               |        0 |  1000
 F         | computers            |                      |        2 |  1100
 T         | cellphones           |                      |        2 |  1610
 T         | computers            |                      |        2 |  1000
 F         |                      | laptop               |        4 |  1050
 F         |                      | mouse                |        4 |    50
 T         |                      | laptop               |        4 |  1000
 T         |                      | smartphone           |        4 |  1610
 F         |                      |                      |        6 |  1100
 T         |                      |                      |        6 |  2610
(13 rows)
```

Since the pre-owned column isn't included in the ROLLUP and CUBE operations, there's no grand total row that includes all other rows. 
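To see why no grand total row appears, it can help to list the grouping sets that a partial ROLLUP implies. The following Python sketch models this under the assumption that the leading GROUP BY columns are added to every grouping set produced by the ROLLUP. The function name `partial_rollup_sets` is hypothetical.

```python
def partial_rollup_sets(prefix, rollup_columns):
    """Grouping sets for GROUP BY prefix..., ROLLUP(rollup_columns...).

    The prefix columns appear in every grouping set. Only the ROLLUP
    columns are dropped progressively, from right to left.
    """
    prefix = tuple(prefix)
    rollup_columns = tuple(rollup_columns)
    return [prefix + rollup_columns[:n]
            for n in range(len(rollup_columns), -1, -1)]


sets = partial_rollup_sets(["pre_owned"], ["category", "product"])
for gset in sets:
    print(gset)
```

Every grouping set contains `pre_owned`, and the empty grouping set () never occurs, which is why the result has one total per `pre_owned` value instead of a single grand total.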

## *Concatenated grouping*
<a name="r_GROUP_BY_aggregation-extentions-concat"></a>

 You can concatenate multiple GROUPING SETS/ROLLUP/CUBE clauses to calculate different levels of subtotals. Concatenated groupings return the Cartesian product of the provided grouping sets. 

 The syntax for concatenating GROUPING SETS/ROLLUP/CUBE clauses is as follows.

```
GROUP BY {ROLLUP|CUBE|GROUPING SETS}(expr1[, ...]),
         {ROLLUP|CUBE|GROUPING SETS}(expr2[, ...])[, ...]
```

Consider the following example to see how a small concatenated grouping can produce a large final result set.

```
SELECT pre_owned, category, product,
       GROUPING(category, product, pre_owned) as group_id,
       sum(cost) as total
FROM orders
GROUP BY CUBE(category, product), GROUPING SETS(pre_owned, ())
ORDER BY 4,1,2,3;

 pre_owned |       category       |       product        | group_id | total
-----------+----------------------+----------------------+----------+-------
 F         | computers            | laptop               |        0 |  1050
 F         | computers            | mouse                |        0 |    50
 T         | cellphones           | smartphone           |        0 |  1610
 T         | computers            | laptop               |        0 |  1000
           | cellphones           | smartphone           |        1 |  1610
           | computers            | laptop               |        1 |  2050
           | computers            | mouse                |        1 |    50
 F         | computers            |                      |        2 |  1100
 T         | cellphones           |                      |        2 |  1610
 T         | computers            |                      |        2 |  1000
           | cellphones           |                      |        3 |  1610
           | computers            |                      |        3 |  2100
 F         |                      | laptop               |        4 |  1050
 F         |                      | mouse                |        4 |    50
 T         |                      | laptop               |        4 |  1000
 T         |                      | smartphone           |        4 |  1610
           |                      | laptop               |        5 |  2050
           |                      | mouse                |        5 |    50
           |                      | smartphone           |        5 |  1610
 F         |                      |                      |        6 |  1100
 T         |                      |                      |        6 |  2610
           |                      |                      |        7 |  3710
(22 rows)
```
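The Cartesian product behind a concatenated grouping can be modeled in a few lines. The following Python sketch combines the grouping sets of CUBE(category, product) with those of GROUPING SETS(pre_owned, ()). The function name `concatenate_groupings` is hypothetical, and the sketch models only which grouping sets result, not how the query runs.

```python
from itertools import product


def concatenate_groupings(*groupings):
    """Combine lists of grouping sets as a Cartesian product.

    Each resulting grouping set is the concatenation of one grouping
    set taken from every input list.
    """
    combined = []
    for choice in product(*groupings):
        merged = tuple(col for gset in choice for col in gset)
        combined.append(merged)
    return combined


cube_sets = [("category", "product"), ("category",), ("product",), ()]
pre_owned_sets = [("pre_owned",), ()]
combined = concatenate_groupings(cube_sets, pre_owned_sets)

print(len(combined))
for gset in combined:
    print(gset)
```

Four grouping sets combined with two yield eight, which matches the eight distinct `group_id` values (0 through 7) in the preceding result.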

## *Nested grouping*
<a name="r_GROUP_BY_aggregation-extentions-nested"></a>

 You can use GROUPING SETS/ROLLUP/CUBE operations as your GROUPING SETS *expr* to form a nested grouping. The sub grouping inside nested GROUPING SETS is flattened. 

 The syntax for nested grouping is as follows.

```
GROUP BY GROUPING SETS({ROLLUP|CUBE|GROUPING SETS}(expr[, ...])[, ...])
```

Consider the following example.

```
SELECT category, product, pre_owned,
       GROUPING(category, product, pre_owned) as group_id,
       sum(cost) as total
FROM orders
GROUP BY GROUPING SETS(ROLLUP(category), CUBE(product, pre_owned))
ORDER BY 4,1,2,3;

       category       |       product        | pre_owned | group_id | total
----------------------+----------------------+-----------+----------+-------
 cellphones           |                      |           |        3 |  1610
 computers            |                      |           |        3 |  2100
                      | laptop               | F         |        4 |  1050
                      | laptop               | T         |        4 |  1000
                      | mouse                | F         |        4 |    50
                      | smartphone           | T         |        4 |  1610
                      | laptop               |           |        5 |  2050
                      | mouse                |           |        5 |    50
                      | smartphone           |           |        5 |  1610
                      |                      | F         |        6 |  1100
                      |                      | T         |        6 |  2610
                      |                      |           |        7 |  3710
                      |                      |           |        7 |  3710
(13 rows)
```

Note that because both ROLLUP(category) and CUBE(product, pre\_owned) contain the grouping set (), the row representing the grand total is duplicated.
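The flattening of nested groupings, including the duplicated grand total, can be modeled as a simple list concatenation. In the following Python sketch, the function name `flatten_nested` is hypothetical, and the two input lists are the grouping sets that ROLLUP(category) and CUBE(product, pre_owned) expand to.

```python
def flatten_nested(*groupings):
    """Flatten nested groupings into one list, keeping duplicates."""
    flat = []
    for grouping in groupings:
        flat.extend(grouping)
    return flat


rollup_category = [("category",), ()]
cube_product_pre_owned = [
    ("product", "pre_owned"), ("product",), ("pre_owned",), (),
]
flat = flatten_nested(rollup_category, cube_product_pre_owned)

print(len(flat))
print(flat.count(()))
```

The flattened list holds six grouping sets, and the empty grouping set () appears twice, once from each nested operation.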

## *Usage notes*
<a name="r_GROUP_BY_aggregation-extensions-usage-notes"></a>
+ The GROUP BY clause supports up to 64 grouping sets. In the case of ROLLUP and CUBE, or some combination of GROUPING SETS, ROLLUP, and CUBE, this limitation applies to the implied number of grouping sets. For example, GROUP BY CUBE((a), (b)) counts as 4 grouping sets, not 2.
+ You can't use constants as grouping columns when using aggregation extensions.
+ You can't make a grouping set that contains duplicate columns.
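As a rough aid for the first usage note, the following Python sketch estimates the implied number of grouping sets for a single ROLLUP or CUBE, using the equivalences given earlier in this topic. The function names are hypothetical, and the sketch doesn't reflect how Amazon Redshift performs the check internally.

```python
MAX_GROUPING_SETS = 64  # limit stated in the usage notes


def implied_grouping_sets(kind, n_columns):
    """Number of grouping sets implied by one ROLLUP or CUBE."""
    if kind == "ROLLUP":
        return n_columns + 1   # every prefix of the column list, including ()
    if kind == "CUBE":
        return 2 ** n_columns  # every subset of the columns
    raise ValueError(f"unsupported extension: {kind}")


def within_limit(kind, n_columns):
    """Return True if the implied count doesn't exceed the limit."""
    return implied_grouping_sets(kind, n_columns) <= MAX_GROUPING_SETS


print(implied_grouping_sets("CUBE", 2))
print(within_limit("CUBE", 6), within_limit("CUBE", 7))
```

Under this model, a CUBE over six columns implies exactly 64 grouping sets, and a seventh column exceeds the limit.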

# HAVING clause
<a name="r_HAVING_clause"></a>

The HAVING clause applies a condition to the intermediate grouped result set that a query returns.

## Syntax
<a name="r_HAVING_clause-synopsis"></a>

```
[ HAVING condition ]
```

For example, you can restrict the results of a SUM function:

```
having sum(pricepaid) >10000
```

The HAVING condition is applied after all WHERE clause conditions are applied and GROUP BY operations are completed.

The condition itself takes the same form as any WHERE clause condition.
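The order of evaluation (WHERE, then GROUP BY, then HAVING) can be observed with a small experiment. The following Python sketch uses the standard library's `sqlite3` module with a hypothetical, reduced `sales` table. The data is invented for this example and SQLite stands in for Amazon Redshift only to show the standard SQL behavior.

```python
import sqlite3

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (eventname TEXT, pricepaid INTEGER, qtysold INTEGER);
    INSERT INTO sales VALUES
        ('Show A', 6000, 2), ('Show A', 5000, 1),
        ('Show B', 9000, 3), ('Show B', 1500, 0),
        ('Show C', 12000, 4);
""")

# The WHERE clause removes rows first, so the HAVING condition sees
# sums computed only over the remaining rows.
with_where = conn.execute("""
    SELECT eventname, SUM(pricepaid) AS total
    FROM sales
    WHERE qtysold > 0
    GROUP BY eventname
    HAVING SUM(pricepaid) > 10000
    ORDER BY eventname
""").fetchall()

# Without the WHERE clause, 'Show B' sums to 10500 and passes HAVING.
without_where = conn.execute("""
    SELECT eventname, SUM(pricepaid) AS total
    FROM sales
    GROUP BY eventname
    HAVING SUM(pricepaid) > 10000
    ORDER BY eventname
""").fetchall()

print(with_where)
print(without_where)
conn.close()
```

In the first query, 'Show B' is eliminated because its 1500 row is filtered out before the sum is computed.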

## Usage notes
<a name="r_HAVING_clause_usage_notes"></a>
+ Any column that is referenced in a HAVING clause condition must be either a grouping column or a column that refers to the result of an aggregate function.
+ In a HAVING clause, you can't specify an ordinal number that refers to a select list item. Only the GROUP BY and ORDER BY clauses accept ordinal numbers.

## Examples
<a name="r_HAVING_clause-examples"></a>

The following query calculates total ticket sales for all events by name, then eliminates events where the total sales were \$800,000 or less. The HAVING condition is applied to the results of the aggregate function in the select list: `sum(pricepaid)`.

```
select eventname, sum(pricepaid)
from sales join event on sales.eventid = event.eventid
group by 1
having sum(pricepaid) > 800000
order by 2 desc, 1;

eventname        |    sum
-----------------+-----------
Mamma Mia!       | 1135454.00
Spring Awakening |  972855.00
The Country Girl |  910563.00
Macbeth          |  862580.00
Jersey Boys      |  811877.00
Legally Blonde   |  804583.00
```

The following query calculates a similar result set. In this case, however, the HAVING condition is applied to an aggregate that isn't specified in the select list: `sum(qtysold)`. Events that did not sell more than 2,000 tickets are eliminated from the final result.

```
select eventname, sum(pricepaid)
from sales join event on sales.eventid = event.eventid
group by 1
having sum(qtysold) >2000
order by 2 desc, 1;

eventname        |    sum
-----------------+-----------
Mamma Mia!       | 1135454.00
Spring Awakening |  972855.00
The Country Girl |  910563.00
Macbeth          |  862580.00
Jersey Boys      |  811877.00
Legally Blonde   |  804583.00
Chicago          |  790993.00
Spamalot         |  714307.00
```

The following query calculates total ticket sales for all events by name, then eliminates events where the total sales were \$800,000 or less. The HAVING condition is applied to the results of the aggregate function in the select list using the alias `pp` for `sum(pricepaid)`.

```
select eventname, sum(pricepaid) as pp
from sales join event on sales.eventid = event.eventid
group by 1
having pp > 800000
order by 2 desc, 1;

eventname        |    pp
-----------------+-----------
Mamma Mia!       | 1135454.00
Spring Awakening |  972855.00
The Country Girl |  910563.00
Macbeth          |  862580.00
Jersey Boys      |  811877.00
Legally Blonde   |  804583.00
```

# QUALIFY clause
<a name="r_QUALIFY_clause"></a>

The QUALIFY clause filters results of a previously computed window function according to user‑specified search conditions. You can use the clause to apply filtering conditions to the result of a window function without using a subquery.

It is similar to the [HAVING clause](https://docs.aws.amazon.com/redshift/latest/dg/r_HAVING_clause.html), which applies a condition to further filter rows from a WHERE clause. The difference between QUALIFY and HAVING is that filtered results from the QUALIFY clause can be based on the result of running window functions on the data. You can use both the QUALIFY and HAVING clauses in one query.

## Syntax
<a name="r_QUALIFY-synopsis"></a>

```
QUALIFY condition
```

**Note**  
If you're using the QUALIFY clause directly after the FROM clause, the FROM relation name must have an alias specified before the QUALIFY clause.

## Examples
<a name="r_QUALIFY-examples"></a>

The examples in this section use the sample data below.

```
create table store_sales (ss_sold_date date, ss_sold_time time, 
               ss_item text, ss_sales_price float);
insert into store_sales values ('2022-01-01', '09:00:00', 'Product 1', 100.0),
                               ('2022-01-01', '11:00:00', 'Product 2', 500.0),
                               ('2022-01-01', '15:00:00', 'Product 3', 20.0),
                               ('2022-01-01', '17:00:00', 'Product 4', 1000.0),
                               ('2022-01-01', '18:00:00', 'Product 5', 30.0),
                               ('2022-01-02', '10:00:00', 'Product 6', 5000.0),
                               ('2022-01-02', '16:00:00', 'Product 7', 5.0);
```

The following example demonstrates how to find the two most expensive items sold after 12:00 each day.

```
SELECT *
FROM store_sales ss
WHERE ss_sold_time > time '12:00:00'
QUALIFY row_number()
OVER (PARTITION BY ss_sold_date ORDER BY ss_sales_price DESC) <= 2;

 ss_sold_date | ss_sold_time |  ss_item  | ss_sales_price 
--------------+--------------+-----------+----------------
 2022-01-01   | 17:00:00     | Product 4 |           1000
 2022-01-01   | 18:00:00     | Product 5 |             30
 2022-01-02   | 16:00:00     | Product 7 |              5
```
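
The order in which the filters apply in the previous query can be modeled outside the database. The following is a minimal Python sketch, not Amazon Redshift code: it assumes the sample rows are held in a list of tuples, and the function name `top_two_after_noon` is hypothetical. It applies the WHERE condition first, numbers the rows in each date partition, and only then applies the QUALIFY condition.

```python
from collections import defaultdict
from datetime import time

# Sample rows copied from the store_sales table above:
# (ss_sold_date, ss_sold_time, ss_item, ss_sales_price)
STORE_SALES = [
    ("2022-01-01", time(9, 0), "Product 1", 100.0),
    ("2022-01-01", time(11, 0), "Product 2", 500.0),
    ("2022-01-01", time(15, 0), "Product 3", 20.0),
    ("2022-01-01", time(17, 0), "Product 4", 1000.0),
    ("2022-01-01", time(18, 0), "Product 5", 30.0),
    ("2022-01-02", time(10, 0), "Product 6", 5000.0),
    ("2022-01-02", time(16, 0), "Product 7", 5.0),
]


def top_two_after_noon(rows):
    # Step 1 (WHERE): keep only rows sold after 12:00.
    filtered = [r for r in rows if r[1] > time(12, 0)]

    # Step 2 (window function): group rows by date, then number them
    # by price descending, like row_number() OVER (PARTITION BY ... ORDER BY ...).
    partitions = defaultdict(list)
    for r in filtered:
        partitions[r[0]].append(r)

    result = []
    for sold_date in sorted(partitions):
        ranked = sorted(partitions[sold_date], key=lambda r: r[3], reverse=True)
        for row_number, r in enumerate(ranked, start=1):
            # Step 3 (QUALIFY): filter on the window function result.
            if row_number <= 2:
                result.append(r)
    return result


for row in top_two_after_noon(STORE_SALES):
    print(row[0], row[2], row[3])
```

Because the WHERE condition runs before the rows are numbered, Product 6 (sold at 10:00) never competes for a place in the top two on 2022-01-02.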

You can then find the last item sold each day.

```
SELECT *
FROM store_sales ss
QUALIFY last_value(ss_item)
OVER (PARTITION BY ss_sold_date ORDER BY ss_sold_time ASC
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) = ss_item;
               
ss_sold_date | ss_sold_time |  ss_item  | ss_sales_price 
--------------+--------------+-----------+----------------
 2022-01-01   | 18:00:00     | Product 5 |             30
 2022-01-02   | 16:00:00     | Product 7 |              5
```

The following example returns the same records as the previous query, the last item sold each day, but it doesn't use the QUALIFY clause.

```
SELECT * FROM (
  SELECT *,
  last_value(ss_item)
  OVER (PARTITION BY ss_sold_date ORDER BY ss_sold_time ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) ss_last_item
  FROM store_sales ss
)
WHERE ss_last_item = ss_item;
               
 ss_sold_date | ss_sold_time |  ss_item  | ss_sales_price | ss_last_item 
--------------+--------------+-----------+----------------+--------------
 2022-01-02   | 16:00:00     | Product 7 |              5 | Product 7
 2022-01-01   | 18:00:00     | Product 5 |             30 | Product 5
```

# UNION, INTERSECT, and EXCEPT
<a name="r_UNION"></a>

**Topics**
+ [Syntax](#r_UNION-synopsis)
+ [Parameters](#r_UNION-parameters)
+ [Order of evaluation for set operators](#r_UNION-order-of-evaluation-for-set-operators)
+ [Usage notes](#r_UNION-usage-notes)
+ [Example UNION queries](c_example_union_query.md)
+ [Example UNION ALL query](c_example_unionall_query.md)
+ [Example INTERSECT queries](c_example_intersect_query.md)
+ [Example EXCEPT query](c_Example_MINUS_query.md)

The UNION, INTERSECT, and EXCEPT *set operators* are used to compare and merge the results of two separate query expressions. For example, if you want to know which users of a website are both buyers and sellers but their user names are stored in separate columns or tables, you can find the *intersection* of these two types of users. If you want to know which website users are buyers but not sellers, you can use the EXCEPT operator to find the *difference* between the two lists of users. If you want to build a list of all users, regardless of role, you can use the UNION operator.

## Syntax
<a name="r_UNION-synopsis"></a>

```
query
{ UNION [ ALL ] | INTERSECT | EXCEPT | MINUS }
query
```

## Parameters
<a name="r_UNION-parameters"></a>

 *query*   
A query expression that corresponds, in the form of its select list, to a second query expression that follows the UNION, INTERSECT, or EXCEPT operator. The two expressions must contain the same number of output columns with compatible data types; otherwise, the two result sets can't be compared and merged. Set operations don't allow implicit conversion between different categories of data types; for more information, see [Type compatibility and conversion](c_Supported_data_types.md#r_Type_conversion).  
You can build queries that contain an unlimited number of query expressions and link them with UNION, INTERSECT, and EXCEPT operators in any combination. For example, the following query structure is valid, assuming that the tables T1, T2, and T3 contain compatible sets of columns:   

```
select * from t1
union
select * from t2
except
select * from t3
order by c1;
```

UNION   
Set operation that returns rows from two query expressions, regardless of whether the rows derive from one or both expressions.

INTERSECT   
Set operation that returns rows that derive from two query expressions. Rows that aren't returned by both expressions are discarded.

EXCEPT | MINUS   
Set operation that returns rows that derive from one of two query expressions. To qualify for the result, rows must exist in the first result table but not the second. MINUS and EXCEPT are exact synonyms. 

ALL   
The ALL keyword retains any duplicate rows that are produced by UNION. The default behavior when the ALL keyword isn't used is to discard these duplicates. INTERSECT ALL, EXCEPT ALL, and MINUS ALL aren't supported.

## Order of evaluation for set operators
<a name="r_UNION-order-of-evaluation-for-set-operators"></a>

The UNION and EXCEPT set operators are left-associative. If parentheses aren't specified to influence the order of precedence, a combination of these set operators is evaluated from left to right. For example, in the following query, the UNION of T1 and T2 is evaluated first, then the EXCEPT operation is performed on the UNION result: 

```
select * from t1
union
select * from t2
except
select * from t3
order by c1;
```

The INTERSECT operator takes precedence over the UNION and EXCEPT operators when a combination of operators is used in the same query. For example, the following query evaluates the intersection of T2 and T3, then unions the result with T1: 

```
select * from t1
union
select * from t2
intersect
select * from t3
order by c1;
```

By adding parentheses, you can enforce a different order of evaluation. In the following case, the result of the union of T1 and T2 is intersected with T3, and the query is likely to produce a different result. 

```
(select * from t1
union
select * from t2)
intersect
(select * from t3)
order by c1;
```
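
The evaluation order can be checked with a small model. The following Python sketch is an illustration only: the tables `t1`, `t2`, and `t3` are hypothetical single-column tables represented as Python sets, so duplicates are discarded as they are with UNION, INTERSECT, and EXCEPT. The parentheses are written out explicitly to mirror the grouping that the set operators apply.

```python
# Hypothetical single-column tables, modeled as Python sets of values.
t1 = {1, 2}
t2 = {2, 3}
t3 = {3, 4}

# UNION and EXCEPT are evaluated left to right:
# t1 UNION t2 EXCEPT t3  ->  (t1 UNION t2) EXCEPT t3
union_then_except = (t1 | t2) - t3

# INTERSECT takes precedence over UNION:
# t1 UNION t2 INTERSECT t3  ->  t1 UNION (t2 INTERSECT t3)
default_order = t1 | (t2 & t3)

# Parentheses force the union to run first:
# (t1 UNION t2) INTERSECT t3
parenthesized = (t1 | t2) & t3

print(sorted(union_then_except))  # [1, 2]
print(sorted(default_order))      # [1, 2, 3]
print(sorted(parenthesized))      # [3]
```

With this data, the default order and the parenthesized order return different rows, which is why parentheses matter when you mix INTERSECT with UNION or EXCEPT.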

## Usage notes
<a name="r_UNION-usage-notes"></a>
+ The column names returned in the result of a set operation query are the column names (or aliases) from the tables in the first query expression. Because these column names are potentially misleading, in that the values in the column derive from tables on either side of the set operator, you might want to provide meaningful aliases for the result set.
+ A query expression that precedes a set operator should not contain an ORDER BY clause. An ORDER BY clause produces meaningful sorted results only when it is used at the end of a query that contains set operators. In this case, the ORDER BY clause applies to the final results of all of the set operations. The outermost query can also contain standard LIMIT and OFFSET clauses. 
+ When set operator queries return decimal results, the corresponding result columns are promoted to return the same precision and scale. For example, in the following query, where T1.REVENUE is a DECIMAL(10,2) column and T2.REVENUE is a DECIMAL(8,4) column, the decimal result is promoted to DECIMAL(12,4): 

  ```
  select revenue from t1 union select revenue from t2;
  ```

  The scale is `4` because that is the maximum scale of the two columns. The precision is `12` because T1.REVENUE requires 8 digits to the left of the decimal point (12 - 4 = 8). This type promotion ensures that all values from both sides of the UNION fit in the result. For 64-bit values, the maximum result precision is 19 and the maximum result scale is 18. For 128-bit values, the maximum result precision is 38 and the maximum result scale is 37.

  If the resulting data type exceeds Amazon Redshift precision and scale limits, the query returns an error.
+ For set operations, two rows are treated as identical if, for each corresponding pair of columns, the two data values are either *equal* or *both NULL*. For example, if tables T1 and T2 both contain one column and one row, and that row is NULL in both tables, an INTERSECT operation over those tables returns that row.
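
Two of the usage notes above can be expressed as small functions. The following Python sketch is a model under stated assumptions, not the Amazon Redshift implementation: `promote_decimal` assumes the promotion rule implied by the DECIMAL(12,4) example (keep the larger scale and the larger number of integer digits) and doesn't model the maximum precision and scale limits, and `rows_match` represents NULL as `None`. Both function names are hypothetical.

```python
def promote_decimal(left, right):
    """Return a (precision, scale) pair wide enough for both DECIMAL types.

    Assumed rule, inferred from the DECIMAL(12,4) example in the usage notes:
    keep the larger scale and the larger count of integer digits.
    """
    (p1, s1), (p2, s2) = left, right
    scale = max(s1, s2)
    integer_digits = max(p1 - s1, p2 - s2)
    return integer_digits + scale, scale


def rows_match(row_a, row_b):
    """Set-operation matching: each pair of values is equal, or both are NULL."""
    if len(row_a) != len(row_b):
        return False
    for a, b in zip(row_a, row_b):
        if a is None or b is None:
            # A NULL matches only another NULL.
            if not (a is None and b is None):
                return False
        elif a != b:
            return False
    return True


print(promote_decimal((10, 2), (8, 4)))  # (12, 4)
print(rows_match((None,), (None,)))      # True
print(rows_match((None,), (1,)))         # False
```

The second function differs from an ordinary SQL equality comparison, where comparing NULL with NULL doesn't evaluate to true.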

# Example UNION queries
<a name="c_example_union_query"></a>

In the following UNION query, rows in the SALES table are merged with rows in the LISTING table. Three compatible columns are selected from each table; in this case, the corresponding columns have the same names and data types. 

The final result set is ordered by the first column in the LISTING table and limited to the 5 rows with the lowest LISTID values. 

```
select listid, sellerid, eventid from listing
union select listid, sellerid, eventid from sales
order by listid, sellerid, eventid desc limit 5;

 listid | sellerid | eventid
--------+----------+---------
      1 |    36861 |    7872
      2 |    16002 |    4806
      3 |    21461 |    4256
      4 |     8117 |    4337
      5 |     1616 |    8647
(5 rows)
```

The following example shows how you can add a literal value to the output of a UNION query so you can see which query expression produced each row in the result set. The query identifies rows from the first query expression as "S" (for sellers) and rows from the second query expression as "B" (for buyers). 

The query identifies buyers and sellers for ticket transactions that cost $10,000 or more. The only difference between the two query expressions on either side of the UNION operator is the joining column for the SALES table. 

```
select listid, lastname, firstname, username,
pricepaid as price, 'S' as buyorsell
from sales, users
where sales.sellerid=users.userid
and pricepaid >=10000
union
select listid, lastname, firstname, username, pricepaid,
'B' as buyorsell
from sales, users
where sales.buyerid=users.userid
and pricepaid >=10000
order by 1, 2, 3, 4, 5;

listid | lastname | firstname | username |   price   | buyorsell
--------+----------+-----------+----------+-----------+-----------
209658 | Lamb     | Colette   | VOR15LYI |  10000.00 | B
209658 | West     | Kato      | ELU81XAA |  10000.00 | S
212395 | Greer    | Harlan    | GXO71KOC |  12624.00 | S
212395 | Perry    | Cora      | YWR73YNZ |  12624.00 | B
215156 | Banks    | Patrick   | ZNQ69CLT |  10000.00 | S
215156 | Hayden   | Malachi   | BBG56AKU |  10000.00 | B
(6 rows)
```

# Example UNION ALL query
<a name="c_example_unionall_query"></a>

The following example uses a UNION ALL operator because duplicate rows, if found, need to be retained in the result. For a specific series of listing IDs, the query returns 0 or more rows for each sale associated with each listing, and 0 or 1 row for the listing itself. Listing IDs are unique to each row in the LISTING table, but there might be multiple sales for the same combination of event and listing IDs in the SALES table.

The third column in the result set identifies the source of the row. If it comes from the SALES table, it is marked "Yes" in the SALESROW column. (SALESROW is an alias for the literal value that each query expression selects.) If the row comes from the LISTING table, it is marked "No" in the SALESROW column.

In this case, the result set consists of three sales rows for listing 500, event 7787. In other words, three different transactions took place for this listing and event combination. The other two listings, 501 and 502, did not produce any sales, so the only row that the query produces for these list IDs comes from the LISTING table (SALESROW = 'No').

```
select eventid, listid, 'Yes' as salesrow
from sales
where listid in(500,501,502)
union all
select eventid, listid, 'No'
from listing
where listid in(500,501,502)
order by listid asc;

eventid | listid | salesrow
---------+--------+----------
7787 |    500 | No
7787 |    500 | Yes
7787 |    500 | Yes
7787 |    500 | Yes
6473 |    501 | No
5108 |    502 | No
(6 rows)
```

If you run the same query without the ALL keyword, the result retains only one of the sales transactions. 

```
select eventid, listid, 'Yes' as salesrow
from sales
where listid in(500,501,502)
union
select eventid, listid, 'No'
from listing
where listid in(500,501,502)
order by listid asc;

eventid | listid | salesrow
---------+--------+----------
7787 |    500 | No
7787 |    500 | Yes
6473 |    501 | No
5108 |    502 | No
(4 rows)
```
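
The difference between the two results can be reproduced with a short model. The following Python sketch is an illustration, not Amazon Redshift code: it represents the rows from the output above as tuples, and the variable names are hypothetical.

```python
# Rows taken from the example output above: (eventid, listid, salesrow).
sales_rows = [(7787, 500, "Yes"), (7787, 500, "Yes"), (7787, 500, "Yes")]
listing_rows = [(7787, 500, "No"), (6473, 501, "No"), (5108, 502, "No")]

# UNION ALL: append one input to the other and keep every duplicate.
union_all = sales_rows + listing_rows

# UNION: the same rows with duplicates discarded.
# dict.fromkeys keeps the first occurrence of each row.
union = list(dict.fromkeys(union_all))

print(len(union_all))  # 6
print(len(union))      # 4
```

The three identical sales rows collapse to one row under UNION, so the result no longer shows that three transactions took place.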

# Example INTERSECT queries
<a name="c_example_intersect_query"></a>

Compare the following example with the first UNION example. The only difference between the two examples is the set operator that is used, but the results are very different. Only one of the rows is the same: 

```
235494 |    23875 |    8771
```

 This is the only row in the limited result of 5 rows that was found in both tables.

```
select listid, sellerid, eventid from listing
intersect
select listid, sellerid, eventid from sales
order by listid desc, sellerid, eventid
limit 5;

listid | sellerid | eventid
--------+----------+---------
235494 |    23875 |    8771
235482 |     1067 |    2667
235479 |     1589 |    7303
235476 |    15550 |     793
235475 |    22306 |    7848
(5 rows)
```

The following query finds events (for which tickets were sold) that occurred at venues in both New York City and Los Angeles in March. The difference between the two query expressions is the constraint on the VENUECITY column.

```
select distinct eventname from event, sales, venue
where event.eventid=sales.eventid and event.venueid=venue.venueid
and date_part(month,starttime)=3 and venuecity='Los Angeles'
intersect
select distinct eventname from event, sales, venue
where event.eventid=sales.eventid and event.venueid=venue.venueid
and date_part(month,starttime)=3 and venuecity='New York City'
order by eventname asc;

eventname
----------------------------
A Streetcar Named Desire
Dirty Dancing
Electra
Running with Annalise
Hairspray
Mary Poppins
November
Oliver!
Return To Forever
Rhinoceros
South Pacific
The 39 Steps
The Bacchae
The Caucasian Chalk Circle
The Country Girl
Wicked
Woyzeck
(17 rows)
```

# Example EXCEPT query
<a name="c_Example_MINUS_query"></a>

The CATEGORY table in the TICKIT database contains the following 11 rows: 

```
 catid | catgroup |  catname  |                  catdesc
-------+----------+-----------+--------------------------------------------
   1   | Sports   | MLB       | Major League Baseball
   2   | Sports   | NHL       | National Hockey League
   3   | Sports   | NFL       | National Football League
   4   | Sports   | NBA       | National Basketball Association
   5   | Sports   | MLS       | Major League Soccer
   6   | Shows    | Musicals  | Musical theatre
   7   | Shows    | Plays     | All non-musical theatre
   8   | Shows    | Opera     | All opera and light opera
   9   | Concerts | Pop       | All rock and pop music concerts
  10   | Concerts | Jazz      | All jazz singers and bands
  11   | Concerts | Classical | All symphony, concerto, and choir concerts
(11 rows)
```

Assume that a CATEGORY_STAGE table (a staging table) contains one additional row: 

```
 catid | catgroup |  catname  |                  catdesc
-------+----------+-----------+--------------------------------------------
   1   | Sports   | MLB       | Major League Baseball
   2   | Sports   | NHL       | National Hockey League
   3   | Sports   | NFL       | National Football League
   4   | Sports   | NBA       | National Basketball Association
   5   | Sports   | MLS       | Major League Soccer
   6   | Shows    | Musicals  | Musical theatre
   7   | Shows    | Plays     | All non-musical theatre
   8   | Shows    | Opera     | All opera and light opera
   9   | Concerts | Pop       | All rock and pop music concerts
  10   | Concerts | Jazz      | All jazz singers and bands
  11   | Concerts | Classical | All symphony, concerto, and choir concerts
  12   | Concerts | Comedy    | All stand up comedy performances
(12 rows)
```

Return the difference between the two tables. In other words, return rows that are in the CATEGORY_STAGE table but not in the CATEGORY table: 

```
select * from category_stage
except
select * from category;

catid | catgroup | catname |             catdesc
-------+----------+---------+----------------------------------
12 | Concerts | Comedy  | All stand up comedy performances
(1 row)
```

The following equivalent query uses the synonym MINUS. 

```
select * from category_stage
minus
select * from category;

catid | catgroup | catname |             catdesc
-------+----------+---------+----------------------------------
12 | Concerts | Comedy  | All stand up comedy performances
(1 row)
```

If you reverse the order of the SELECT expressions, the query returns no rows. 
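
The effect of operand order can be seen in a small model. The following Python sketch is an illustration, not Amazon Redshift code: it keeps only the CATID and CATNAME values from the tables above and represents each table as a set of tuples.

```python
# (catid, catname) pairs from the CATEGORY table above.
category = {
    (1, "MLB"), (2, "NHL"), (3, "NFL"), (4, "NBA"), (5, "MLS"),
    (6, "Musicals"), (7, "Plays"), (8, "Opera"),
    (9, "Pop"), (10, "Jazz"), (11, "Classical"),
}

# The staging table holds the same rows plus one additional row.
category_stage = category | {(12, "Comedy")}

# category_stage EXCEPT category: rows only in the staging table.
print(category_stage - category)  # {(12, 'Comedy')}

# category EXCEPT category_stage: every CATEGORY row also exists
# in the staging table, so nothing is left.
print(category - category_stage)  # set()
```

EXCEPT isn't symmetric: the result contains only rows from the first query expression.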

# ORDER BY clause
<a name="r_ORDER_BY_clause"></a>

**Topics**
+ [Syntax](#r_ORDER_BY_clause-synopsis)
+ [Parameters](#r_ORDER_BY_clause-parameters)
+ [Usage notes](#r_ORDER_BY_usage_notes)
+ [Examples with ORDER BY](r_Examples_with_ORDER_BY.md)

The ORDER BY clause sorts the result set of a query.

## Syntax
<a name="r_ORDER_BY_clause-synopsis"></a>

```
[ ORDER BY expression [ ASC | DESC ] ]
[ NULLS FIRST | NULLS LAST ]
[ LIMIT { number | ALL } ]
[ OFFSET start ]
```

## Parameters
<a name="r_ORDER_BY_clause-parameters"></a>

 *expression*   
Expression that defines the sort order of the query result set, typically by specifying one or more columns in the select list. Results are returned based on binary UTF-8 ordering. You can also specify the following:  
+ Columns that aren't in the select list
+ Expressions formed from one or more columns that exist in the tables referenced by the query
+ Ordinal numbers that represent the position of select list entries (or the position of columns in the table if no select list exists)
+ Aliases that define select list entries
When the ORDER BY clause contains multiple expressions, the result set is sorted according to the first expression, then the second expression is applied to rows that have matching values from the first expression, and so on.

ASC | DESC   
Option that defines the sort order for the expression, as follows:   
+ ASC: ascending (for example, low to high for numeric values and 'A' to 'Z' for character strings). If no option is specified, data is sorted in ascending order by default. 
+ DESC: descending (high to low for numeric values; 'Z' to 'A' for strings). 

NULLS FIRST | NULLS LAST  
Option that specifies whether NULL values should be ordered first, before non-null values, or last, after non-null values. By default, NULL values are sorted and ranked last in ASC ordering, and sorted and ranked first in DESC ordering.

LIMIT *number* | ALL   <a name="order-by-clause-limit"></a>
Option that controls the number of sorted rows that the query returns. The LIMIT number must be a non-negative integer; the maximum value is `2147483647`.   
LIMIT 0 returns no rows. You can use this syntax for testing purposes: to check that a query runs (without displaying any rows) or to return a column list from a table. An ORDER BY clause is redundant if you are using LIMIT 0 to return a column list. The default is LIMIT ALL. 

OFFSET *start*   <a name="order-by-clause-offset"></a>
Option that specifies the number of rows to skip before beginning to return rows. The OFFSET number must be a positive integer; the maximum value is `2147483647`. When used with the LIMIT option, OFFSET rows are skipped before starting to count the LIMIT rows that are returned. If the LIMIT option isn't used, the number of rows in the result set is reduced by the number of rows that are skipped. The rows skipped by an OFFSET clause still have to be scanned, so it might be inefficient to use a large OFFSET value.
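
The way LIMIT and OFFSET combine can be modeled with list slicing. The following Python sketch is an illustration, not Amazon Redshift code: the function name `limit_offset` is hypothetical, the input is assumed to be an already sorted result set, and `None` stands in for LIMIT ALL.

```python
def limit_offset(sorted_rows, limit=None, offset=0):
    """Skip `offset` rows, then return at most `limit` rows.

    limit=None models LIMIT ALL (the default).
    """
    remaining = sorted_rows[offset:]
    return remaining if limit is None else remaining[:limit]


rows = list(range(1, 11))  # hypothetical sorted result set: 1 through 10

print(limit_offset(rows, limit=3, offset=2))  # [3, 4, 5]
print(limit_offset(rows, limit=0))            # []
print(limit_offset(rows, offset=8))           # [9, 10]
```

The skipped rows are still produced before they are discarded, which matches the note that a large OFFSET value might be inefficient.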

## Usage notes
<a name="r_ORDER_BY_usage_notes"></a>

 Note the following expected behavior with ORDER BY clauses: 
+ NULL values are considered "higher" than all other values. With the default ascending sort order, NULL values sort at the end. To change this behavior, use the NULLS FIRST option.
+ When a query doesn't contain an ORDER BY clause, the system returns result sets with no predictable ordering of the rows. The same query run twice might return the result set in a different order. 
+ The LIMIT and OFFSET options can be used without an ORDER BY clause; however, to return a consistent set of rows, use these options in conjunction with ORDER BY. 
+ In any parallel system like Amazon Redshift, when ORDER BY doesn't produce a unique ordering, the order of the rows is nondeterministic. That is, if the ORDER BY expression produces duplicate values, the return order of those rows might vary from other systems or from one run of Amazon Redshift to the next. 
+ Amazon Redshift doesn't support string literals in ORDER BY clauses.
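
The default placement of NULL values described above can be modeled as follows. This Python sketch is an illustration, not Amazon Redshift code: the function name `order_by` and its parameters are hypothetical, and NULL is represented as `None`.

```python
def order_by(values, descending=False, nulls_first=None):
    """Sort values, treating None (NULL) as higher than every other value.

    nulls_first=None keeps the default placement:
    NULLs last for ASC, NULLs first for DESC.
    """
    if nulls_first is None:
        nulls_first = descending
    non_null = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [v for v in values if v is None]
    return nulls + non_null if nulls_first else non_null + nulls


data = [3, None, 1, 2]  # hypothetical column values

print(order_by(data))                    # [1, 2, 3, None]
print(order_by(data, descending=True))   # [None, 3, 2, 1]
print(order_by(data, nulls_first=True))  # [None, 1, 2, 3]
```

The last call corresponds to an ascending sort with the NULLS FIRST option.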

# Examples with ORDER BY
<a name="r_Examples_with_ORDER_BY"></a>

Return all 11 rows from the CATEGORY table, ordered by the second column, CATGROUP. For results that have the same CATGROUP value, order the CATDESC column values by the length of the character string. Then order by columns CATID and CATNAME. 

```
select * from category order by 2, length(catdesc), 1, 3;

catid | catgroup |  catname  |                  catdesc
------+----------+-----------+----------------------------------------
10    | Concerts | Jazz      | All jazz singers and bands
9     | Concerts | Pop       | All rock and pop music concerts
11    | Concerts | Classical | All symphony, concerto, and choir conce
6     | Shows    | Musicals  | Musical theatre
7     | Shows    | Plays     | All non-musical theatre
8     | Shows    | Opera     | All opera and light opera
5     | Sports   | MLS       | Major League Soccer
1     | Sports   | MLB       | Major League Baseball
2     | Sports   | NHL       | National Hockey League
3     | Sports   | NFL       | National Football League
4     | Sports   | NBA       | National Basketball Association
(11 rows)
```

Return selected columns from the SALES table, ordered by the highest QTYSOLD values. Limit the result to the top 10 rows: 

```
select salesid, qtysold, pricepaid, commission, saletime from sales
order by qtysold desc, pricepaid, commission, salesid, saletime desc
limit 10;

salesid | qtysold | pricepaid | commission |      saletime
--------+---------+-----------+------------+---------------------
15401   |       8 |    272.00 |      40.80 | 2008-03-18 06:54:56
61683   |       8 |    296.00 |      44.40 | 2008-11-26 04:00:23
90528   |       8 |    328.00 |      49.20 | 2008-06-11 02:38:09
74549   |       8 |    336.00 |      50.40 | 2008-01-19 12:01:21
130232  |       8 |    352.00 |      52.80 | 2008-05-02 05:52:31
55243   |       8 |    384.00 |      57.60 | 2008-07-12 02:19:53
16004   |       8 |    440.00 |      66.00 | 2008-11-04 07:22:31
489     |       8 |    496.00 |      74.40 | 2008-08-03 05:48:55
4197    |       8 |    512.00 |      76.80 | 2008-03-23 11:35:33
16929   |       8 |    568.00 |      85.20 | 2008-12-19 02:59:33
(10 rows)
```

Return a column list and no rows by using LIMIT 0 syntax: 

```
select * from venue limit 0;
venueid | venuename | venuecity | venuestate | venueseats
---------+-----------+-----------+------------+------------
(0 rows)
```

# CONNECT BY clause
<a name="r_CONNECT_BY_clause"></a>

The CONNECT BY clause specifies the relationship between rows in a hierarchy. You can use CONNECT BY to select rows in a hierarchical order by joining the table to itself and processing the hierarchical data. For example, you can use it to recursively loop through an organization chart and list data.

Hierarchical queries process in the following order:

1. If the FROM clause has a join, it is processed first.

1. The CONNECT BY clause is evaluated.

1. The WHERE clause is evaluated.

## Syntax
<a name="r_CONNECT_BY_clause-synopsis"></a>

```
[START WITH start_with_conditions]
CONNECT BY connect_by_conditions
```

**Note**  
While START and CONNECT are not reserved words, use delimited identifiers (double quotation marks) or AS if you're using START and CONNECT as table aliases in your query to avoid failure at runtime.

```
SELECT COUNT(*)
FROM Employee "start"
CONNECT BY PRIOR id = manager_id
START WITH name = 'John'
```

```
SELECT COUNT(*)
FROM Employee AS start
CONNECT BY PRIOR id = manager_id
START WITH name = 'John'
```

## Parameters
<a name="r_CONNECT_BY_parameters"></a>

 *start\_with\_conditions*   
Conditions that specify the root row(s) of the hierarchy.

 *connect\_by\_conditions*   
Conditions that specify the relationship between parent rows and child rows of the hierarchy. At least one condition must be qualified with the `PRIOR` unary operator used to refer to the parent row.  

```
PRIOR column = expression
-- or
expression > PRIOR column
```

## Operators
<a name="r_CONNECT_BY_operators"></a>

You can use the following operators in a CONNECT BY query.

 *LEVEL*   
Pseudocolumn that returns the current row level in the hierarchy. Returns 1 for the root row, 2 for the child of the root row, and so on.

 *PRIOR*   
Unary operator that evaluates the expression for the parent row of the current row in the hierarchy.

## Examples
<a name="r_CONNECT_BY_example"></a>

The following example is a CONNECT BY query that returns John and the employees who report to him directly or indirectly, no deeper than 4 levels in the hierarchy. 

```
SELECT id, name, manager_id
FROM employee
WHERE LEVEL <= 4
START WITH name = 'John'
CONNECT BY PRIOR id = manager_id;
```

Following is the result of the query.

```
id      name      manager_id
------+----------+--------------
  101     John        100
  102     Jorge       101
  103     Kwaku       101
  110     Liu         101
  201     Sofía       102
  106     Mateo       102
  110     Nikki       103
  104     Paulo       103
  105     Richard     103
  120     Saanvi      104
  200     Shirley     104
  205     Zhang       104
```

 Table definition for this example: 

```
CREATE TABLE employee (
   id INT,
   name VARCHAR(20),
   manager_id INT
   );
```

 Following are the rows inserted into the table. 

```
INSERT INTO employee(id, name, manager_id)  VALUES
(100, 'Carlos', null),
(101, 'John', 100),
(102, 'Jorge', 101),
(103, 'Kwaku', 101),
(110, 'Liu', 101),
(106, 'Mateo', 102),
(110, 'Nikki', 103),
(104, 'Paulo', 103),
(105, 'Richard', 103),
(120, 'Saanvi', 104),
(200, 'Shirley', 104),
(201, 'Sofía', 102),
(205, 'Zhang', 104);
```
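
The traversal that the example performs can be modeled outside the database. The following Python sketch is an illustration, not the Amazon Redshift implementation: the function name `connect_by` is hypothetical, and it walks the rows above breadth-first. It starts from the root row (LEVEL 1), follows the `PRIOR id = manager_id` relationship, and applies the level filter last, because the WHERE clause is evaluated after the CONNECT BY clause.

```python
from collections import deque

# Rows from the INSERT statement above: (id, name, manager_id).
EMPLOYEES = [
    (100, "Carlos", None),
    (101, "John", 100),
    (102, "Jorge", 101),
    (103, "Kwaku", 101),
    (110, "Liu", 101),
    (106, "Mateo", 102),
    (110, "Nikki", 103),
    (104, "Paulo", 103),
    (105, "Richard", 103),
    (120, "Saanvi", 104),
    (200, "Shirley", 104),
    (201, "Sofía", 102),
    (205, "Zhang", 104),
]


def connect_by(rows, root_name, max_level):
    """Return (level, id, name, manager_id) for the hierarchy under root_name."""
    # START WITH: the root rows are assigned LEVEL 1.
    queue = deque((1, r) for r in rows if r[1] == root_name)
    result = []
    while queue:
        level, parent = queue.popleft()
        result.append((level, *parent))
        # CONNECT BY PRIOR id = manager_id:
        # a child row's manager_id equals the parent row's id.
        for child in rows:
            if child[2] == parent[0]:
                queue.append((level + 1, child))
    # WHERE: the level filter is applied after the hierarchy is built.
    return [r for r in result if r[0] <= max_level]


for level, emp_id, name, manager_id in connect_by(EMPLOYEES, "John", 4):
    print(level, emp_id, name, manager_id)
```

In this model, Saanvi, Shirley, and Zhang are at level 4 because they report to Paulo, who is at level 3.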

Following is an organization chart for John's department.

![\[A diagram of an organization chart for John's department.\]](http://docs.aws.amazon.com/redshift/latest/dg/images/org-chart.png)


# Subquery examples
<a name="r_Subquery_examples"></a>

The following examples show different ways in which subqueries fit into SELECT queries. See [JOIN examples](r_Join_examples.md) for another example of the use of subqueries. 

## SELECT list subquery
<a name="r_Subquery_examples-select-list-subquery"></a>

The following example contains a subquery in the SELECT list. This subquery is *scalar*: it returns only one column and one value, which is repeated in the result for each row that is returned from the outer query. The query compares the Q1SALES value that the subquery computes with sales values for two other quarters (2 and 3) in 2008, as defined by the outer query. 

```
select qtr, sum(pricepaid) as qtrsales,
(select sum(pricepaid)
from sales join date on sales.dateid=date.dateid
where qtr='1' and year=2008) as q1sales
from sales join date on sales.dateid=date.dateid
where qtr in('2','3') and year=2008
group by qtr
order by qtr;

qtr  |  qtrsales   |   q1sales
-------+-------------+-------------
2     | 30560050.00 | 24742065.00
3     | 31170237.00 | 24742065.00
(2 rows)
```

## WHERE clause subquery
<a name="r_Subquery_examples-where-clause-subquery"></a>

The following example contains a table subquery in the WHERE clause. This subquery produces multiple rows. In this case, the rows contain only one column, but table subqueries can contain multiple columns and rows, just like any other table. 

The query finds the top 10 sellers in terms of maximum tickets sold. The top 10 list is restricted by the subquery, which removes users who live in cities where there are ticket venues. This query can be written in different ways; for example, the subquery could be rewritten as a join within the main query. 

```
select firstname, lastname, city, max(qtysold) as maxsold
from users join sales on users.userid=sales.sellerid
where users.city not in(select venuecity from venue)
group by firstname, lastname, city
order by maxsold desc, city desc
limit 10;

firstname | lastname  |      city      | maxsold
-----------+-----------+----------------+---------
Noah       | Guerrero | Worcester      |       8
Isadora    | Moss     | Winooski       |       8
Kieran     | Harrison | Westminster    |       8
Heidi      | Davis    | Warwick        |       8
Sara       | Anthony  | Waco           |       8
Bree       | Buck     | Valdez         |       8
Evangeline | Sampson  | Trenton        |       8
Kendall    | Keith    | Stillwater     |       8
Bertha     | Bishop   | Stevens Point  |       8
Patricia   | Anderson | South Portland |       8
(10 rows)
```

## WITH clause subqueries
<a name="r_Subquery_examples-with-clause-subqueries"></a>

See [WITH clause](r_WITH_clause.md). 

# Correlated subqueries
<a name="r_correlated_subqueries"></a>

The following example contains a *correlated subquery* in the WHERE clause; this kind of subquery contains one or more correlations between its columns and the columns produced by the outer query. In this case, the correlation is `where s.listid=l.listid`. For each row that the outer query produces, the subquery is run to qualify or disqualify the row. 

```
select salesid, listid, sum(pricepaid) from sales s
where qtysold=
(select max(numtickets) from listing l
where s.listid=l.listid)
group by 1,2
order by 1,2
limit 5;

salesid | listid |   sum
--------+--------+----------
 27     |     28 | 111.00
 81     |    103 | 181.00
 142    |    149 | 240.00
 146    |    152 | 231.00
 194    |    210 | 144.00
(5 rows)
```
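
The per-row evaluation described above can be compared with an equivalent rewrite that aggregates once and then joins. The following Python sketch illustrates the general idea only and doesn't describe how Amazon Redshift runs the query: the sample rows and the function names `correlated` and `decorrelated` are hypothetical.

```python
# Hypothetical rows; the column names follow the query above.
sales = [  # (salesid, listid, qtysold)
    (1, 10, 4),
    (2, 10, 2),
    (3, 20, 1),
    (4, 30, 3),
]
listing = [  # (listid, numtickets)
    (10, 4),
    (20, 6),
    (30, 3),
]


def correlated(sales_rows, listing_rows):
    """Run the subquery once for each outer row, as the query is written."""
    kept = []
    for salesid, listid, qtysold in sales_rows:
        matching = [n for lid, n in listing_rows if lid == listid]
        # max() over no rows is NULL in SQL, so the row doesn't qualify.
        if matching and qtysold == max(matching):
            kept.append(salesid)
    return kept


def decorrelated(sales_rows, listing_rows):
    """Equivalent rewrite: aggregate LISTING once, then join on listid."""
    max_tickets = {}
    for lid, n in listing_rows:
        max_tickets[lid] = max(n, max_tickets.get(lid, n))
    return [sid for sid, lid, qty in sales_rows if max_tickets.get(lid) == qty]


print(correlated(sales, listing))    # [1, 4]
print(decorrelated(sales, listing))  # [1, 4]
```

Both functions return the same sales IDs, but the second one scans the listing rows only once instead of once for each sales row.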

## Correlated subquery patterns that are not supported
<a name="r_correlated_subqueries-correlated-subquery-patterns-that-are-not-supported"></a>

The query planner uses a query rewrite method called subquery decorrelation to optimize several patterns of correlated subqueries for execution in an MPP environment. A few types of correlated subqueries follow patterns that Amazon Redshift can't decorrelate and doesn't support. Queries that contain the following correlation references return errors: 
+ Correlation references that skip a query block, also known as "skip-level correlation references." For example, in the following query, the block containing the correlation reference and the skipped block are connected by a NOT EXISTS predicate: 

  ```
  select event.eventname from event
  where not exists
  (select * from listing
  where not exists
  (select * from sales where event.eventid=sales.eventid));
  ```

  The skipped block in this case is the subquery against the LISTING table. The correlation reference correlates the EVENT and SALES tables. 
+ Correlation references from a subquery that is part of an ON clause in an outer query: 

  ```
  select * from category
  left join event
  on category.catid=event.catid and eventid =
  (select max(eventid) from sales where sales.eventid=event.eventid);
  ```

  The ON clause contains a correlation reference from SALES in the subquery to EVENT in the outer query. 
+ Null-sensitive correlation references to an Amazon Redshift system table. For example: 

  ```
  select attrelid
  from stv_locks sl, pg_attribute
  where sl.table_id=pg_attribute.attrelid and 1 not in
  (select 1 from pg_opclass where sl.lock_owner = opcowner);
  ```
+ Correlation references from within a subquery that contains a window function. For example: 

  ```
  select listid, qtysold
  from sales s
  where qtysold not in
  (select sum(numtickets) over() from listing l where s.listid=l.listid);
  ```
+ References in a GROUP BY column to the results of a correlated subquery. For example: 

  ```
  select listing.listid,
  (select count (sales.listid) from sales where sales.listid=listing.listid) as list
  from listing
  group by list, listing.listid;
  ```
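+ Correlation references from a subquery with an aggregate function and a GROUP BY clause, connected to the outer query by an IN predicate. (This restriction doesn't apply to MIN and MAX aggregate functions.) In the following example, NUMTICKETS is a column of LISTING rather than SALES, so the reference inside the subquery correlates with the outer query: 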

  ```
  select * from listing where listid in
  (select sum(qtysold)
  from sales
  where numtickets>4
  group by salesid);
  ```
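
When a query matches one of these patterns, it can sometimes be restated so that the correlation reference is no longer needed. As a minimal sketch, the ON clause example above might be rewritten by moving the aggregate into a derived table and joining on it. The aliases `e`, `s`, `ev`, and `max_eventid` are assumptions introduced here for illustration, and whether a restatement like this suits a particular workload depends on the data and the query. 

```sql
-- Sketch: the ON-clause correlation replaced by a join to a derived table.
-- Aliases e, s, ev, and max_eventid are illustrative only.
select *
from category
left join (select e.*
           from event e
           join (select eventid, max(eventid) as max_eventid
                 from sales
                 group by eventid) s
           on s.eventid = e.eventid and e.eventid = s.max_eventid) ev
on category.catid = ev.catid;
```

Both forms keep every CATEGORY row and match it only to events that have at least one row in SALES. In the original query, the scalar subquery returns NULL for an event with no sales, so the equality in the ON clause isn't satisfied for that event. 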