Menu
Amazon Redshift
Database Developer Guide (API Version 2012-12-01)

CREATE EXTERNAL TABLE

Creates a new external table in the specified schema. All external tables must be created in an external schema. Search path is not supported for external schemas and external tables. For more information, see CREATE EXTERNAL SCHEMA.

Access to external tables is controlled by access to the external schema. You can't GRANT or REVOKE permissions on an external table. Instead, grant or revoke the permissions on the external schema.

In addition to external tables created using the CREATE EXTERNAL TABLE command, Amazon Redshift can reference external tables defined in an Amazon Athena data catalog or a Hive metastore. Use the see CREATE EXTERNAL SCHEMA command to register an external database defined in an Athena data catalog or Hive metastore and make the external tables available for use in Amazon Redshift. If the external table exists in an Athena data catalog or Hive metastore, you don't need to create the table using CREATE EXTERNAL TABLE. To view external tables, query the SVV_EXTERNAL_TABLES system view.

You can query an external table using the same SELECT syntax you use with other Amazon Redshift tables. External tables are read-only. You can't write to an external table.

You can't execute CREATE EXTERNAL TABLE inside a transaction (BEGIN … END).

Syntax

CREATE EXTERNAL TABLE
external_schema.table_name  
(column_name data_type [, …] )
[ PARTITIONED BY (col_name data_type [, … ] )] 
[ ROW FORMAT DELIMITED row_format ]
STORED AS format
LOCATION 's3://bucket/folder/'

Parameters

external_schema.table_name

The name of the table to be created, qualified by an external schema name. External tables must be created in an external schema. For more information, see CREATE EXTERNAL SCHEMA.

The maximum length for the table name is 127 bytes; longer names are truncated to 127 bytes. You can use UTF-8 multibyte characters up to a maximum of four bytes. Amazon Redshift enforces a limit of 9,900 tables per cluster, including user-defined temporary tables and temporary tables created by Amazon Redshift during query processing or system maintenance. Optionally, you can qualify the table name with the database name. In the following example, the database name is spectrum_db , the external schema name is spectrum_schema, and the table name is test.

Copy
create external table spectrum_db.spectrum_schema.test (c1 int) stored as textfile location 's3://mybucket/myfolder/';

If the database or schema specified doesn't exist, the table is not created, and the statement returns an error. You can't create tables or views in the system databases template0, template1, and padb_harvest.

The table name must be a unique name for the specified schema.

For more information about valid names, see Names and Identifiers.

( column_name data_type )

The name and data type of each column being created.

The maximum length for the column name is 127 bytes; longer names are truncated to 127 bytes. You can use UTF-8 multibyte characters up to a maximum of four bytes. The maximum number of columns you can define in a single table is 1,600. For more information about valid names, see Names and Identifiers.

Note

If you are creating a "wide table," take care that your list of columns doesn't exceed row-width boundaries for intermediate results during loads and query processing. For more information, see Usage Notes.

data_type

The following data types are supported:

  • SMALLINT (INT2)

  • INTEGER (INT, INT4)

  • BIGINT (INT8)

  • DECIMAL (NUMERIC)

  • REAL (FLOAT4)

  • DOUBLE PRECISION (FLOAT8)

  • BOOLEAN (BOOL)

  • CHAR (CHARACTER)

  • VARCHAR (CHARACTER VARYING)

  • DATE (DATE data type can be used only as a partition column)

  • TIMESTAMP

Timestamp values in text files must be in the format yyyy-MM-dd HH:mm:ss.SSSSSS, as the following timestamp value shows: 2017-05-01 11:30:59.000000 .

PARTITIONED BY (col_name data_type [, … ] )

A clause that defines a partitioned table with one or more partition columns. A separate data directory is used for each specified combination, which can improve query performance in some circumstances. Partitioned columns don't exist within the table data itself. If you use a value for col_name that is the same as a table column, you get an error.

After creating a partitioned table, alter the table using an ALTER TABLE … ADD PARTITION statement to add partitions. When you add a partition, you define the location of the subfolder on Amazon S3 that contains the partition data. You can add only one partition in each ALTER TABLE statement.

For example, if the table spectrum.lineitem_part is defined with PARTITIONED BY (l_shipdate date), execute the following ALTER TABLE command to add a partition.

Copy
ALTER TABLE spectrum.lineitem_part ADD PARTITION (l_shipdate='1992-01-29') LOCATION 's3://spectrum-public/lineitem_partition/l_shipdate=1992-01-29';

To view partitions, query the SVV_EXTERNAL_PARTITIONS system view.

ROW FORMAT DELIMITED rowformat

A clause that specifies the format of the underlying data. Possible values for rowformat are as follows:

  • FIELDS TERMINATED BY 'delimiter'

  • LINES TERMINATED BY 'delimiter'

If ROW FORMAT DELIMITED is omitted, the default format is fields terminated by '\A' and lines terminated by '\n'.

STORED AS format

The file format for data files.

Valid formats are as follows:

  • PARQUET

  • TEXTFILE

  • SEQUENCEFILE

  • RCFILE (for data using ColumnarSerDe only, not LazyBinaryColumnarSerDe)

LOCATION 's3://bucket/folder/'

The path to the Amazon S3 folder that contains the data files—for example, 's3://mybucket/custdata/'. The bucket must be located in the same region as the cluster. Supported regions are us-east-1, us-east-2, and us-west-2.

Redshift Spectrum scans the files in the specified folder and any subfolders. Redshift Spectrum ignores hidden files and files that begin with a period, underscore, or hash mark ( . , _, or #) or end with a tilde (~).

Usage Notes

You can't view details for Amazon Redshift Spectrum tables using the same resources you use for standard Amazon Redshift tables, such as PG_TABLE_DEF, STV_TBL_PERM, PG_CLASS, or information_schema. If your business intelligence or analytics tool doesn't recognize Redshift Spectrum external tables, configure your application to query SVV_EXTERNAL_TABLES and SVV_EXTERNAL_COLUMNS.

Examples

The following example creates a table named SALES in the Amazon Redshift external schema named spectrum. The data is in tab-delimited text files.

Copy
create external table spectrum.sales( salesid integer, listid integer, sellerid integer, buyerid integer, eventid integer, dateid smallint, qtysold smallint, pricepaid decimal(8,2), commission decimal(8,2), saletime timestamp) row format delimited fields terminated by '\t' stored as textfile location 's3://awssampledbuswest2/tickit/spectrum/sales/';

To create an external table partitioned by date, run the following command.

Copy
create external table spectrum.sales_part( salesid integer, listid integer, sellerid integer, buyerid integer, eventid integer, dateid smallint, qtysold smallint, pricepaid decimal(8,2), commission decimal(8,2), saletime timestamp) partitioned by (saledate date) row format delimited fields terminated by '|' stored as textfile location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/';

To add the partitions, run the following ALTER TABLE commands.

Copy
alter table spectrum.sales_part add partition(saledate='2008-01-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-01/'; alter table spectrum.sales_part add partition(saledate='2008-02-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02/'; alter table spectrum.sales_part add partition(saledate='2008-03-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-03/'; alter table spectrum.sales_part add partition(saledate='2008-04-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-04/'; alter table spectrum.sales_part add partition(saledate='2008-05-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-05/'; alter table spectrum.sales_part add partition(saledate='2008-06-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-06/'; alter table spectrum.sales_part add partition(saledate='2008-07-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-07/'; alter table spectrum.sales_part add partition(saledate='2008-08-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-08/'; alter table spectrum.sales_part add partition(saledate='2008-09-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-09/'; alter table spectrum.sales_part add partition(saledate='2008-10-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-10/'; alter table spectrum.sales_part add partition(saledate='2008-11-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-11/'; alter table spectrum.sales_part add partition(saledate='2008-12-01') location 's3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-12/';

Run the following query to select data from the partitioned table.

Copy
select top 10 spectrum.sales_part.eventid, sum(spectrum.sales_part.pricepaid) from spectrum.sales_part, event where spectrum.sales_part.eventid = event.eventid and spectrum.sales_part.pricepaid > 30 and saledate = '2008-12-01' group by spectrum.sales_part.eventid order by 2 desc;
eventid | sum     
--------+---------
    914 | 36173.00
   5478 | 27303.00
   5061 | 26383.00
   4406 | 26252.00
   5324 | 24015.00
   1829 | 23911.00
   3601 | 23616.00
   3665 | 23214.00
   6069 | 22869.00
   5638 | 22551.00

To view external table partitions, query the SVV_EXTERNAL_PARTITIONS system view.

Copy
select schemaname, tablename, values, location from svv_external_partitions where tablename = 'sales_part';
Copy
schemaname | tablename | values | location -----------+------------+----------------+-------------------------------------------------- spectrum | sales_part | ["2008-01-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-01 spectrum | sales_part | ["2008-02-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02 spectrum | sales_part | ["2008-03-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-03 spectrum | sales_part | ["2008-04-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-04 spectrum | sales_part | ["2008-05-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-05 spectrum | sales_part | ["2008-06-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-06 spectrum | sales_part | ["2008-07-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-07 spectrum | sales_part | ["2008-08-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-08 spectrum | sales_part | ["2008-09-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-09 spectrum | sales_part | ["2008-10-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-10 spectrum | sales_part | ["2008-11-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-11 spectrum | sales_part | ["2008-12-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-12

Query the SVV_EXTERNAL_DATABASES system view for a list of existing databases in the external data catalog

Copy
select eskind,databasename,esoptions from svv_external_databases order by databasename;
eskind | databasename | esoptions                                                                        
-------+--------------+----------------------------------------------------------------------------------
     1 | default      | {"REGION":"us-west-2","IAM_ROLE":"arn:aws:iam::123456789012:role/mySpectrumRole"}
     1 | sampledb     | {"REGION":"us-west-2","IAM_ROLE":"arn:aws:iam::123456789012:role/mySpectrumRole"}
     1 | spectrumdb   | {"REGION":"us-west-2","IAM_ROLE":"arn:aws:iam::123456789012:role/mySpectrumRole"}

To view details of external tables, query the SVV_EXTERNAL_TABLES and SVV_EXTERNAL_COLUMNS system views.

The following example queries the SVV_EXTERNAL_TABLES view.

Copy
select schemaname, tablename, location from svv_external_tables;
schemaname | tablename            | location                                               
-----------+----------------------+--------------------------------------------------------
spectrum   | sales                | s3://awssampledbuswest2/tickit/spectrum/sales          
spectrum   | sales_part           | s3://awssampledbuswest2/tickit/spectrum/sales_partition
spectrum   | sales_part_timestamp | s3://rs-spectrum/sales_partition

The following example queries the SVV_EXTERNAL_COLUMNS view.

Copy
select * from svv_external_columns where schemaname like 'rsspectrum%' and tablename ='listing';
	 schemaname | tablename |   columnname   | external_type | part_key
	------------+-----------+----------------+---------------+----------
	 rsspectrum | listing   | dateid         | smallint      |        0
	 rsspectrum | listing   | eventid        | int           |        0
	 rsspectrum | listing   | listid         | int           |        0
	 rsspectrum | listing   | listtime       | timestamp     |        0
	 rsspectrum | listing   | numtickets     | smallint      |        0
	 rsspectrum | listing   | priceperticket | decimal(8,2)  |        0
	 rsspectrum | listing   | sellerid       | int           |        0
	 rsspectrum | listing   | totalprice     | decimal(8,2)  |        0
	(8 rows)

Use the following query to view table partitions.

Copy
select schemaname, tablename, values, location from svv_external_partitions where tablename = 'sales_part';
Copy
schemaname | tablename | values | location -----------+------------+----------------+------------------------------------------------------------------------- spectrum | sales_part | ["2008-01-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-01 spectrum | sales_part | ["2008-02-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-02 spectrum | sales_part | ["2008-03-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-03 spectrum | sales_part | ["2008-04-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-04 spectrum | sales_part | ["2008-05-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-05 spectrum | sales_part | ["2008-06-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-06 spectrum | sales_part | ["2008-07-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-07 spectrum | sales_part | ["2008-08-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-08 spectrum | sales_part | ["2008-09-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-09 spectrum | sales_part | ["2008-10-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-10 spectrum | sales_part | ["2008-11-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-11 spectrum | sales_part | ["2008-12-01"] | s3://awssampledbuswest2/tickit/spectrum/sales_partition/saledate=2008-12