Amazon Redshift
Database Developer Guide (API Version 2012-12-01)
« PreviousNext »
View the PDF for this guide.Go to the AWS Discussion Forum for this product.Did this page help you?  Yes | No |  Tell us about it...

CREATE TABLE

Creates a new table in the current database. The owner of this table is the issuer of the CREATE TABLE command.

Synopsis

CREATE [ [LOCAL ] { TEMPORARY | TEMP } ] TABLE table_name
( { column_name data_type [column_attributes] [ column_constraints ] 
  | table_constraints
  | LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ] } 
  [, ... ]  )
[table_attribute]

where column_attributes are:
  [ DEFAULT default_expr ]
  [ IDENTITY ( seed, step ) ] 
  [ ENCODE encoding ] 
  [ DISTKEY ]
  [ SORTKEY ]

and column_constraints are:
  [ { NOT NULL | NULL } ]
  [ { UNIQUE  |  PRIMARY KEY } ]
  [ REFERENCES reftable [ ( refcolumn ) ] ] 

and table_constraints  are:
  [ UNIQUE ( column_name [, ... ] ) ]
  [ PRIMARY KEY ( column_name [, ... ] )  ]
  [ FOREIGN KEY (column_name [, ... ] ) REFERENCES reftable [ ( refcolumn ) ] 

and table_attributes are:
  [ DISTSTYLE { EVEN | KEY | ALL } ] 
  [ DISTKEY ( column_name ) ]
  [ SORTKEY ( column_name [, ...] ) ]
         

Parameters

LOCAL

Although this optional keyword is accepted in the statement, it has no effect in Amazon Redshift.

TEMPORARY | TEMP

Creates a temporary table that is visible only within the current session. The table is automatically dropped at the end of the session in which it is created. The temporary table can have the same name as a permanent table. The temporary table is created in a separate, session-specific schema. (You cannot specify a schema name.) This temporary schema becomes the first schema in the search path, so the temporary table will take precedence over the permanent table unless you qualify the table name with the schema name to access the permanent table. For more information about schemas and precedence, see See search_path.

Note

By default, users have permission to create temporary tables by their automatic membership in the PUBLIC group. To deny this privilege to a user, revoke the TEMP privilege from the PUBLIC group, and then explicitly grant the TEMP privilege only to specific users or groups of users.

table_name

The name of the table to be created.

Important

If you specify a table name that begins with '# ', the table will be created as a temporary table. For example:

create table #newtable (id int);

The maximum length for the table name is 127 characters; longer names are truncated to 127 characters. Amazon Redshift enforces a maximum limit of 9,900 permanent tables per cluster. The table name may be qualified with the database and or schema name. For example:

create table tickit.public.test (c1 int);

In this example, tickit is the database name, public is the schema name, and test is the table name. If the database or schema does not exist, table is not created, and the statement returns an error. You cannot create tables or views in the system databases template0, template1, and padb_harvest.

If a schema name is given, the new table is created in that schema (assuming the creator has access to the schema). The table name must be a unique name for that schema. If no schema is specified, the table is created by using the current database schema. If you are creating a temporary table, you cannot specify a schema name, because temporary tables exist in a special schema.

Multiple temporary tables with the same name can exist at the same time in the same database if they are created in separate sessions because the tables are assigned to different schemas. For more information about valid names, see Names and identifiers.

column_name

The name of a column to be created in the new table. The maximum length for the column name is 127 characters; longer names are truncated to 127 characters. The maximum number of columns you can define in a single table is 1,600. For more information about valid names, see Names and identifiers.

Note

If you are creating a "wide table," take care that your list of columns does not exceed row-width boundaries for intermediate results during loads and query processing. For more information, see CREATE TABLE usage notes.

data_type

The data type of the column being created. For CHAR and VARCHAR columns, you can use the MAX keyword instead of declaring a maximum length. MAX sets the maximum length to 4096 bytes for CHAR or 65535 bytes for VARCHAR. The following Data types are supported:

  • SMALLINT (INT2)

  • INTEGER (INT, INT4)

  • BIGINT (INT8)

  • DECIMAL (NUMERIC)

  • REAL (FLOAT4)

  • DOUBLE PRECISION (FLOAT8)

  • BOOLEAN (BOOL)

  • CHAR (CHARACTER)

  • VARCHAR (CHARACTER VARYING)

  • DATE

  • TIMESTAMP

DEFAULT default_expr

Assigns a default data value for the column. The data type of default_expr must match the data type of the column. The DEFAULT value must be a variable-free expression. Subqueries, cross-references to other columns in the current table, and non-system defined functions are not allowed.

The default_expr is used in any INSERT operation that does not specify a value for the column. If no default value is specified, the default value for the column is null.

If a COPY operation with a defined column list omits a column that has a DEFAULT value and a NOT NULL constraint, the COPY command inserts the value of the default_expr.

If a COPY operation with a defined column list omits a column that has a DEFAULT value and is nullable, the COPY command inserts the value of the default_expr, not the NULL value.

IDENTITY(seed, step)

Specifies that the column is an IDENTITY column. An IDENTITY column contains unique auto-generated values. These values start with the value specified as the seed and increment by the number specified as the step. The data type for an IDENTITY column must be either INT or BIGINT. IDENTITY columns are declared NOT NULL by default and do not accept NULLs.

ENCODE encoding

Compression encoding for a column. RAW is the default, if no compression is selected. The following Compression encodings are supported:

  • BYTEDICT

  • DELTA

  • DELTA32K

  • LZO

  • MOSTLY8

  • MOSTLY16

  • MOSTLY32

  • RAW (no compression, the default setting)

  • RUNLENGTH

  • TEXT255

  • TEXT32K

DISTKEY

Specifies that the column is the distribution key for the table. Only one column in a table can be the distribution key. You can use the DISTKEY keyword after a column name or as part of the table definition by using the DISTKEY (column_name) syntax. Either method has the same effect. For more information, see the DISTSTYLE keyword definition.

SORTKEY

Specifies that the column is the sort key for the table. When data is loaded into the table, the data is sorted by one or more columns that are designated as sort keys. You can use the SORTKEY keyword after a column name to specify a single-column sort key, or you can specify one or more columns as sort key columns for the table by using the SORTKEY (column_name [, ...]) syntax.

If you do not specify any sort keys, the table is not sorted . You can define a maximum of 400 SORTKEY columns per table.

NOT NULL | NULL

NOT NULL specifies that the column is not allowed to contain null values. NULL, the default, specifies that the column accepts null values. IDENTITY columns are declared NOT NULL by default.

UNIQUE

Specifies that the column may contain only unique values. The behavior of the unique table constraint is the same as that for column constraints, with the additional capability to span multiple columns . To define a unique table constraint, use the UNIQUE ( column_name [, ... ] ) syntax.

Important

Unique constraints are informational and are not enforced by the system.

PRIMARY KEY

Specifies that the column is the primary key for the table. Only one column can be defined as the primary key by using a column definition. To define a table constraint with a multiple-column primary key, use the PRIMARY KEY ( column_name [, ... ] ) syntax.

Identifying a column as the primary key provides metadata about the design of the schema. A primary key implies that other tables can rely on this set of columns as a unique identifier for rows. One primary key can be specified for a table, whether as a column constraint or a table constraint. The primary key constraint should name a set of columns that is different from other sets of columns named by any unique constraint defined for the same table.

Important

Primary key constraints are informational only. They are not enforced by the system, but they are used by the planner.

References reftable [ ( refcolumn ) ]

Specifies a foreign key constraint, which implies that the column must contain only values that match values in the referenced column of some row of the referenced table. The referenced columns should be the columns of a unique or primary key constraint in the referenced table.

Important

Foreign key constraints are informational only. They are not enforced by the system, but they are used by the planner.

LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ]

Specifies an existing table from which the new table automatically copies column names, data types, and NOT NULL constraints. The new table and the parent table are decoupled, and any changes made to the parent table are not applied to the new table. Default expressions for the copied column definitions are copied only if INCLUDING DEFAULTS is specified. The default behavior is to exclude default expressions, so that all columns of the new table have null defaults.

Tables created with the LIKE option do not inherit primary and foreign key constraints. Distribution style, sort keys, and NULL properties are inherited by LIKE tables but they cannot be explicitly set in the CREATE TABLE statement.

DISTSTYLE { EVEN | KEY | ALL }

Defines the data distribution style for the whole table. Amazon Redshift distributes the rows of a table to the compute nodes according the distribution style specified for the table.

The distribution style that you select for tables affects the overall performance of your database. For more information, see Choosing a data distribution style.

  • EVEN: The data in the table is spread evenly across the nodes in a cluster in a round-robin distribution. Row IDs are used to determine the distribution, and roughly the same number of rows are distributed to each node. This is the default distribution method.

  • KEY: The data is distributed by the values in the DISTKEY column. When you set the joining columns of joining tables as distribution keys, the joining rows from both tables are collocated on the compute nodes. When data is collocated, the optimizer can perform joins more efficiently. If you specify DISTSTYLE KEY, you must name a DISTKEY column, either for the table or as part of the column definition. For more information, see the DISTKEY keyword definition.

  • ALL: A copy of the entire table is distributed to every node. This distribution style ensures that all the rows required for any join are available on every node, but it multiplies storage requirements and increases the load and maintenance times for the table. ALL distribution can improve execution time when used with certain dimension tables where KEY distribution is not appropriate, but performance improvements must be weighed against maintenance costs.

DISTKEY ( column_name )

Specifies the column to be used as the distribution key for the table. You can use the DISTKEY keyword after a column name or as part of the table definition, by using the DISTKEY (column_name) syntax. Either method has the same effect. For more information, see the DISTSTYLE keyword definition.

SORTKEY ( column_name [,...] )

Specifies one or more sort keys for the table. When data is loaded into the table, the data is sorted by one or more columns that are designated as sort keys. You can use the SORTKEY keyword after a column name to specify a single-column sort key, or you can specify one or more columns as sort key columns for the table by using the SORTKEY (column_name [, ...]) syntax.

If you do not specify any sort keys, the table is not sorted by default. You can define a maximum of 400 SORTKEY columns per table.

UNIQUE ( column_name [,...] )

The UNIQUE constraint specifies that a group of one or more columns of a table may contain only unique values. The behavior of the unique table constraint is the same as that for column constraints, with the additional capability to span multiple columns. In the context of unique constraints, null values are not considered equal. Each unique table constraint must name a set of columns that is different from the set of columns named by any other unique or primary key constraint defined for the table.

Important

Unique constraints are informational and are not enforced by the system.

PRIMARY KEY ( column_name [,...] )

The primary key constraint specifies that a column or a number of columns of a table may contain only unique (non-duplicate) non-null values. Identifying a set of columns as the primary key also provides metadata about the design of the schema. A primary key implies that other tables may rely on this set of columns as a unique identifier for rows. One primary key can be specified for a table, whether as a single column constraint or a table constraint. The primary key constraint should name a set of columns that is different from other sets of columns named by any unique constraint defined for the same table.

Important

Primary key constraints are informational only. They are not enforced by the system, but they are used by the planner.

FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn ) ]

Specifies a foreign key constraint, which requires that a group of one or more columns of the new table must only contain values that match values in the referenced column(s) of some row of the referenced table. If refcolumn is omitted, the primary key of the reftable is used. The referenced columns must be the columns of a unique or primary key constraint in the referenced table.

Important

Foreign key constraints are informational only. They are not enforced by the system, but they are used by the planner.