CREATE TABLE - Amazon Redshift

CREATE TABLE

Creates a new table in the current database. You define a list of columns, which each hold data of a distinct type. The owner of the table is the issuer of the CREATE TABLE command.

Required privileges

Following are required privileges for CREATE TABLE:

  • Superuser

  • Users with the CREATE TABLE privilege

Syntax

CREATE [ [LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] table_name ( { column_name data_type [column_attributes] [ column_constraints ] | table_constraints | LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ] } [, ... ] ) [ BACKUP { YES | NO } ] [table_attributes] where column_attributes are: [ DEFAULT default_expr ] [ IDENTITY ( seed, step ) ] [ GENERATED BY DEFAULT AS IDENTITY ( seed, step ) ] [ ENCODE encoding ] [ DISTKEY ] [ SORTKEY ] [ COLLATE CASE_SENSITIVE | COLLATE CASE_INSENSITIVE ] and column_constraints are: [ { NOT NULL | NULL } ] [ { UNIQUE | PRIMARY KEY } ] [ REFERENCES reftable [ ( refcolumn ) ] ] and table_constraints are: [ UNIQUE ( column_name [, ... ] ) ] [ PRIMARY KEY ( column_name [, ... ] ) ] [ FOREIGN KEY (column_name [, ... ] ) REFERENCES reftable [ ( refcolumn ) ] and table_attributes are: [ DISTSTYLE { AUTO | EVEN | KEY | ALL } ] [ DISTKEY ( column_name ) ] [ [COMPOUND | INTERLEAVED ] SORTKEY ( column_name [,...]) | [ SORTKEY AUTO ] ] [ ENCODE AUTO ]

Parameters

LOCAL

Optional. Although this keyword is accepted in the statement, it has no effect in Amazon Redshift.

TEMPORARY | TEMP

Keyword that creates a temporary table that is visible only within the current session. The table is automatically dropped at the end of the session in which it is created. The temporary table can have the same name as a permanent table. The temporary table is created in a separate, session-specific schema. (You can't specify a name for this schema.) This temporary schema becomes the first schema in the search path, so the temporary table takes precedence over the permanent table unless you qualify the table name with the schema name to access the permanent table. For more information about schemas and precedence, see search_path.

Note

By default, database users have permission to create temporary tables by their automatic membership in the PUBLIC group. To deny this privilege to a user, revoke the TEMP privilege from the PUBLIC group, and then explicitly grant the TEMP privilege only to specific users or groups of users.

IF NOT EXISTS

Clause that indicates that if the specified table already exists, the command should make no changes and return a message that the table exists, rather than stopping with an error. Note that the existing table might be nothing like the one that would have been created; only the table name is compared.

This clause is useful when scripting, so the script doesn’t fail if CREATE TABLE tries to create a table that already exists.

table_name

Name of the table to be created.

Important

If you specify a table name that begins with '# ', the table is created as a temporary table. The following is an example:

create table #newtable (id int);

You also reference the table with the '# '. For example:

select * from #newtable;

The maximum length for the table name is 127 bytes; longer names are truncated to 127 bytes. You can use UTF-8 multibyte characters up to a maximum of four bytes. Amazon Redshift enforces a quota of the number of tables per cluster by node type, including user-defined temporary tables and temporary tables created by Amazon Redshift during query processing or system maintenance. Optionally, the table name can be qualified with the database and schema name. In the following example, the database name is tickit, the schema name is public, and the table name is test.

create table tickit.public.test (c1 int);

If the database or schema doesn't exist, the table isn't created, and the statement returns an error. You can't create tables or views in the system databases template0, template1, padb_harvest , or sys:internal.

If a schema name is given, the new table is created in that schema (assuming the creator has access to the schema). The table name must be a unique name for that schema. If no schema is specified, the table is created by using the current database schema. If you are creating a temporary table, you can't specify a schema name, because temporary tables exist in a special schema.

Multiple temporary tables with the same name can exist at the same time in the same database if they are created in separate sessions because the tables are assigned to different schemas. For more information about valid names, see Names and identifiers.

column_name

Name of a column to be created in the new table. The maximum length for the column name is 127 bytes; longer names are truncated to 127 bytes. You can use UTF-8 multibyte characters up to a maximum of four bytes. The maximum number of columns you can define in a single table is 1,600. For more information about valid names, see Names and identifiers.

Note

If you are creating a "wide table," take care that your list of columns doesn't exceed row-width boundaries for intermediate results during loads and query processing. For more information, see Usage notes.

data_type

Data type of the column being created. For CHAR and VARCHAR columns, you can use the MAX keyword instead of declaring a maximum length. MAX sets the maximum length to 4,096 bytes for CHAR or 65535 bytes for VARCHAR. The maximum size of a GEOMETRY object is 1,048,447 bytes.

For information about the data types that Amazon Redshift supports, see Data types.

DEFAULT default_expr

Clause that assigns a default data value for the column. The data type of default_expr must match the data type of the column. The DEFAULT value must be a variable-free expression. Subqueries, cross-references to other columns in the current table, and user-defined functions aren't allowed.

The default_expr expression is used in any INSERT operation that doesn't specify a value for the column. If no default value is specified, the default value for the column is null.

If a COPY operation with a defined column list omits a column that has a DEFAULT value, the COPY command inserts the value of default_expr.

IDENTITY(seed, step)

Clause that specifies that the column is an IDENTITY column. An IDENTITY column contains unique autogenerated values. The data type for an IDENTITY column must be either INT or BIGINT.

When you add rows using an INSERT or INSERT INTO [tablename] VALUES() statement, these values start with the value specified as seed and increment by the number specified as step.

When you load the table using an INSERT INTO [tablename] SELECT * FROM or COPY statement, the data is loaded in parallel and distributed to the node slices. To be sure that the identity values are unique, Amazon Redshift skips a number of values when creating the identity values. Identity values are unique, but the order might not match the order in the source files.

GENERATED BY DEFAULT AS IDENTITY(seed, step)

Clause that specifies that the column is a default IDENTITY column and enables you to automatically assign a unique value to the column. The data type for an IDENTITY column must be either INT or BIGINT. When you add rows without values, these values start with the value specified as seed and increment by the number specified as step. For information about how values are generated, see IDENTITY .

Also, during INSERT, UPDATE, or COPY you can provide a value without EXPLICIT_IDS. Amazon Redshift uses that value to insert into the identity column instead of using the system-generated value. The value can be a duplicate, a value less than the seed, or a value between step values. Amazon Redshift doesn't check the uniqueness of values in the column. Providing a value doesn't affect the next system-generated value.

Note

If you require uniqueness in the column, don't add a duplicate value. Instead, add a unique value that is less than the seed or between step values.

Keep in mind the following about default identity columns:

  • Default identity columns are NOT NULL. NULL can't be inserted.

  • To insert a generated value into a default identity column, use the keyword DEFAULT.

    INSERT INTO tablename (identity-column-name) VALUES (DEFAULT);
  • Overriding values of a default identity column doesn't affect the next generated value.

  • You can't add a default identity column with the ALTER TABLE ADD COLUMN statement.

  • You can append a default identity column with the ALTER TABLE APPEND statement.

ENCODE encoding

The compression encoding for a column. ENCODE AUTO is the default for tables. Amazon Redshift automatically manages compression encoding for all columns in the table. If you specify compression encoding for any column in the table, the table is no longer set to ENCODE AUTO. Amazon Redshift no longer automatically manages compression encoding for all columns in the table. You can specify the ENCODE AUTO option for the table to enable Amazon Redshift to automatically manage compression encoding for all columns in the table.

Amazon Redshift automatically assigns an initial compression encoding to columns for which you don't specify compression encoding as follows:

  • All columns in temporary tables are assigned RAW compression by default.

  • Columns that are defined as sort keys are assigned RAW compression.

  • Columns that are defined as BOOLEAN, REAL, DOUBLE PRECISION, GEOMETRY, or GEOGRAPHY data type are assigned RAW compression.

  • Columns that are defined as SMALLINT, INTEGER, BIGINT, DECIMAL, DATE, TIME, TIMETZ, TIMESTAMP, or TIMESTAMPTZ are assigned AZ64 compression.

  • Columns that are defined as CHAR, VARCHAR, or VARBYTE are assigned LZO compression.

Note

If you don't want a column to be compressed, explicitly specify RAW encoding.

The following compression encodings are supported:

  • AZ64

  • BYTEDICT

  • DELTA

  • DELTA32K

  • LZO

  • MOSTLY8

  • MOSTLY16

  • MOSTLY32

  • RAW (no compression)

  • RUNLENGTH

  • TEXT255

  • TEXT32K

  • ZSTD

DISTKEY

Keyword that specifies that the column is the distribution key for the table. Only one column in a table can be the distribution key. You can use the DISTKEY keyword after a column name or as part of the table definition by using the DISTKEY (column_name) syntax. Either method has the same effect. For more information, see the DISTSTYLE parameter later in this topic.

The data type of a distribution key column can be: BOOLEAN, REAL, DOUBLE PRECISION, SMALLINT, INTEGER, BIGINT, DECIMAL, DATE, TIME, TIMETZ, TIMESTAMP, or TIMESTAMPTZ, CHAR, or VARCHAR.

SORTKEY

Keyword that specifies that the column is the sort key for the table. When data is loaded into the table, the data is sorted by one or more columns that are designated as sort keys. You can use the SORTKEY keyword after a column name to specify a single-column sort key, or you can specify one or more columns as sort key columns for the table by using the SORTKEY (column_name [, ...]) syntax. Only compound sort keys are created with this syntax.

You can define a maximum of 400 SORTKEY columns per table.

The data type of a sort key column can be: BOOLEAN, REAL, DOUBLE PRECISION, SMALLINT, INTEGER, BIGINT, DECIMAL, DATE, TIME, TIMETZ, TIMESTAMP, or TIMESTAMPTZ, CHAR, or VARCHAR.

COLLATE CASE_SENSITIVE | COLLATE CASE_INSENSITIVE

A clause that specifies whether string search or comparison on the column is CASE_SENSITIVE or CASE_INSENSITIVE. The default value is the same as the current case sensitivity configuration of the database.

To find the database collation information, use the following command:

SELECT db_collation(); db_collation ---------------- case_sensitive (1 row)
NOT NULL | NULL

NOT NULL specifies that the column isn't allowed to contain null values. NULL, the default, specifies that the column accepts null values. IDENTITY columns are declared NOT NULL by default.

UNIQUE

Keyword that specifies that the column can contain only unique values. The behavior of the unique table constraint is the same as that for column constraints, with the additional capability to span multiple columns. To define a unique table constraint, use the UNIQUE ( column_name [, ... ] ) syntax.

Important

Unique constraints are informational and aren't enforced by the system.

PRIMARY KEY

Keyword that specifies that the column is the primary key for the table. Only one column can be defined as the primary key by using a column definition. To define a table constraint with a multiple-column primary key, use the PRIMARY KEY ( column_name [, ... ] ) syntax.

Identifying a column as the primary key provides metadata about the design of the schema. A primary key implies that other tables can rely on this set of columns as a unique identifier for rows. One primary key can be specified for a table, whether as a column constraint or a table constraint. The primary key constraint should name a set of columns that is different from other sets of columns named by any unique constraint defined for the same table.

PRIMARY KEY columns are also defined as NOT NULL.

Important

Primary key constraints are informational only. They aren't enforced by the system, but they are used by the planner.

References reftable [ ( refcolumn ) ]

Clause that specifies a foreign key constraint, which implies that the column must contain only values that match values in the referenced column of some row of the referenced table. The referenced columns should be the columns of a unique or primary key constraint in the referenced table.

Important

Foreign key constraints are informational only. They aren't enforced by the system, but they are used by the planner.

LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ]

A clause that specifies an existing table from which the new table automatically copies column names, data types, and NOT NULL constraints. The new table and the parent table are decoupled, and any changes made to the parent table aren't applied to the new table. Default expressions for the copied column definitions are copied only if INCLUDING DEFAULTS is specified. The default behavior is to exclude default expressions, so that all columns of the new table have null defaults.

Tables created with the LIKE option don't inherit primary and foreign key constraints. Distribution style, sort keys, BACKUP, and NULL properties are inherited by LIKE tables, but you can't explicitly set them in the CREATE TABLE ... LIKE statement.

BACKUP { YES | NO }

A clause that specifies whether the table should be included in automated and manual cluster snapshots. For tables, such as staging tables, that don't contain critical data, specify BACKUP NO to save processing time when creating snapshots and restoring from snapshots and to reduce storage space on Amazon Simple Storage Service. The BACKUP NO setting has no affect on automatic replication of data to other nodes within the cluster, so tables with BACKUP NO specified are restored in a node failure. The default is BACKUP YES.

DISTSTYLE { AUTO | EVEN | KEY | ALL }

Keyword that defines the data distribution style for the whole table. Amazon Redshift distributes the rows of a table to the compute nodes according to the distribution style specified for the table. The default is AUTO.

The distribution style that you select for tables affects the overall performance of your database. For more information, see Working with data distribution styles. Possible distribution styles are as follows:

  • AUTO: Amazon Redshift assigns an optimal distribution style based on the table data. For example, if AUTO distribution style is specified, Amazon Redshift initially assigns the ALL distribution style to a small table. When the table grows larger, Amazon Redshift might change the distribution style to KEY, choosing the primary key (or a column of the composite primary key) as the DISTKEY. If the table grows larger and none of the columns are suitable to be the DISTKEY, Amazon Redshift changes the distribution style to EVEN. The change in distribution style occurs in the background with minimal impact to user queries.

    To view the distribution style applied to a table, query the PG_CLASS system catalog table. For more information, see Viewing distribution styles.

  • EVEN: The data in the table is spread evenly across the nodes in a cluster in a round-robin distribution. Row IDs are used to determine the distribution, and roughly the same number of rows are distributed to each node.

  • KEY: The data is distributed by the values in the DISTKEY column. When you set the joining columns of joining tables as distribution keys, the joining rows from both tables are collocated on the compute nodes. When data is collocated, the optimizer can perform joins more efficiently. If you specify DISTSTYLE KEY, you must name a DISTKEY column, either for the table or as part of the column definition. For more information, see the DISTKEY parameter earlier in this topic.

  • ALL: A copy of the entire table is distributed to every node. This distribution style ensures that all the rows required for any join are available on every node, but it multiplies storage requirements and increases the load and maintenance times for the table. ALL distribution can improve execution time when used with certain dimension tables where KEY distribution isn't appropriate, but performance improvements must be weighed against maintenance costs.

DISTKEY ( column_name )

Constraint that specifies the column to be used as the distribution key for the table. You can use the DISTKEY keyword after a column name or as part of the table definition, by using the DISTKEY (column_name) syntax. Either method has the same effect. For more information, see the DISTSTYLE parameter earlier in this topic.

[COMPOUND | INTERLEAVED ] SORTKEY ( column_name [,...]) | [ SORTKEY AUTO ]

Specifies one or more sort keys for the table. When data is loaded into the table, the data is sorted by the columns that are designated as sort keys. You can use the SORTKEY keyword after a column name to specify a single-column sort key, or you can specify one or more columns as sort key columns for the table by using the SORTKEY (column_name [ , ... ] ) syntax.

You can optionally specify COMPOUND or INTERLEAVED sort style. If you specify SORTKEY with columns the default is COMPOUND. For more information, see Working with sort keys.

If you don't specify any sort keys options, the default is AUTO.

You can define a maximum of 400 COMPOUND SORTKEY columns or 8 INTERLEAVED SORTKEY columns per table.

AUTO

Specifies that Amazon Redshift assigns an optimal sort key based on the table data. For example, if AUTO sort key is specified, Amazon Redshift initially assigns no sort key to a table. If Amazon Redshift determines that a sort key will improve the performance of queries, then Amazon Redshift might change the sort key of your table. The actual sorting of the table is done by automatic table sort. For more information, see Automatic table sort.

Amazon Redshift doesn't modify tables that have existing sort or distribution keys. With one exception, if a table has a distribution key that has never been used in a JOIN, then the key might be changed if Amazon Redshift determines there is a better key.

To view the sort key of a table, query the SVV_TABLE_INFO system catalog view. For more information, see SVV_TABLE_INFO. To view the Amazon Redshift Advisor recommendations for tables, query the SVV_ALTER_TABLE_RECOMMENDATIONS system catalog view. For more information, see SVV_ALTER_TABLE_RECOMMENDATIONS. To view the actions taken by Amazon Redshift, query the SVL_AUTO_WORKER_ACTION system catalog view. For more information, see SVL_AUTO_WORKER_ACTION.

COMPOUND

Specifies that the data is sorted using a compound key made up of all of the listed columns, in the order they are listed. A compound sort key is most useful when a query scans rows according to the order of the sort columns. The performance benefits of sorting with a compound key decrease when queries rely on secondary sort columns. You can define a maximum of 400 COMPOUND SORTKEY columns per table.

INTERLEAVED

Specifies that the data is sorted using an interleaved sort key. A maximum of eight columns can be specified for an interleaved sort key.

An interleaved sort gives equal weight to each column, or subset of columns, in the sort key, so queries don't depend on the order of the columns in the sort key. When a query uses one or more secondary sort columns, interleaved sorting significantly improves query performance. Interleaved sorting carries a small overhead cost for data loading and vacuuming operations.

Important

Don’t use an interleaved sort key on columns with monotonically increasing attributes, such as identity columns, dates, or timestamps.

ENCODE AUTO

Enables Amazon Redshift to automatically adjust the encoding type for all columns in the table to optimize query performance. ENCODE AUTO preserves the initial encode types that you specify in creating the table. Then, if Amazon Redshift determines that a new encoding type can improve query performance, Amazon Redshift can change the encoding type of the table columns. ENCODE AUTO is the default if you don't specify an encoding type on any column in the table.

UNIQUE ( column_name [,...] )

Constraint that specifies that a group of one or more columns of a table can contain only unique values. The behavior of the unique table constraint is the same as that for column constraints, with the additional capability to span multiple columns. In the context of unique constraints, null values aren't considered equal. Each unique table constraint must name a set of columns that is different from the set of columns named by any other unique or primary key constraint defined for the table.

Important

Unique constraints are informational and aren't enforced by the system.

PRIMARY KEY ( column_name [,...] )

Constraint that specifies that a column or a number of columns of a table can contain only unique (nonduplicate) non-null values. Identifying a set of columns as the primary key also provides metadata about the design of the schema. A primary key implies that other tables can rely on this set of columns as a unique identifier for rows. One primary key can be specified for a table, whether as a single column constraint or a table constraint. The primary key constraint should name a set of columns that is different from other sets of columns named by any unique constraint defined for the same table.

Important

Primary key constraints are informational only. They aren't enforced by the system, but they are used by the planner.

FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn ) ]

Constraint that specifies a foreign key constraint, which requires that a group of one or more columns of the new table must only contain values that match values in the referenced column or columns of some row of the referenced table. If refcolumn is omitted, the primary key of reftable is used. The referenced columns must be the columns of a unique or primary key constraint in the referenced table.

Important

Foreign key constraints are informational only. They aren't enforced by the system, but they are used by the planner.

Usage notes

Uniqueness, primary key, and foreign key constraints are informational only; they are not enforced by Amazon Redshift when you populate a table. For example, if you insert data into a table with dependencies, the insert can succeed even if it violates the constraint. Nonetheless, primary keys and foreign keys are used as planning hints and they should be declared if your ETL process or some other process in your application enforces their integrity. For information about how to drop a table with dependencies, see DROP TABLE.

Limits and quotas

Consider the following limits when you create a table.

  • There is a limit for the maximum number of tables in a cluster by node type. For more information, see Limits in the Amazon Redshift Management Guide.

  • The maximum number of characters for a table name is 127.

  • The maximum number of columns you can define in a single table is 1,600.

  • The maximum number of SORTKEY columns you can define in a single table is 400.

Summary of column-level settings and table-level settings

Several attributes and settings can be set at the column level or at the table level. In some cases, setting an attribute or constraint at the column level or at the table level has the same effect. In other cases, they produce different results.

The following list summarizes column-level and table-level settings:

DISTKEY

There is no difference in effect whether set at the column level or at the table level.

If DISTKEY is set, either at the column level or at the table level, DISTSTYLE must be set to KEY or not set at all. DISTSTYLE can be set only at the table level.

SORTKEY

If set at the column level, SORTKEY must be a single column. If SORTKEY is set at the table level, one or more columns can make up a compound or interleaved composite sort key.

COLLATE CASE_SENSITIVE | COLLATE CASE_INSENSITIVE

Amazon Redshift doesn't support altering case-sensitivity configuration for a column. When you append a new column to the table, Amazon Redshift uses the default value for case-sensitivity. Amazon Redshift doesn't support the COLLATE key word when appending a new column.

For information on how to create databases using database collation, see CREATE DATABASE.

For information on the COLLATE function, see COLLATE function.

UNIQUE

At the column level, one or more keys can be set to UNIQUE; the UNIQUE constraint applies to each column individually. If UNIQUE is set at the table level, one or more columns can make up a composite UNIQUE constraint.

PRIMARY KEY

If set at the column level, PRIMARY KEY must be a single column. If PRIMARY KEY is set at the table level, one or more columns can make up a composite primary key .

FOREIGN KEY

There is no difference in effect whether FOREIGN KEY is set at the column level or at the table level. At the column level, the syntax is simply REFERENCES reftable [ ( refcolumn )].

Distribution of incoming data

When the hash distribution scheme of the incoming data matches that of the target table, no physical distribution of the data is actually necessary when the data is loaded. For example, if a distribution key is set for the new table and the data is being inserted from another table that is distributed on the same key column, the data is loaded in place, using the same nodes and slices. However, if the source and target tables are both set to EVEN distribution, data is redistributed into the target table.

Wide tables

You might be able to create a very wide table but be unable to perform query processing, such as INSERT or SELECT statements, on the table. The maximum width of a table with fixed width columns, such as CHAR, is 64KB - 1 (or 65535 bytes). If a table includes VARCHAR columns, the table can have a larger declared width without returning an error because VARCHARS columns don't contribute their full declared width to the calculated query-processing limit. The effective query-processing limit with VARCHAR columns will vary based on a number of factors.

If a table is too wide for inserting or selecting, you receive the following error.

ERROR: 8001 DETAIL: The combined length of columns processed in the SQL statement exceeded the query-processing limit of 65535 characters (pid:7627)

Examples

For examples that show how to use the CREATE TABLE command, see the Examples topic.