CREATE TABLE

Creates a table with the name and the parameters that you specify.

Synopsis#

CREATE [EXTERNAL] TABLE [IF NOT EXISTS]
 [db_name.]table_name [(col_name data_type [COMMENT col_comment] [, ...] )]
 [COMMENT table_comment]
 [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
 [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
 [SKEWED BY ( col_name1 [, col_name2 , ... ] ) ON ( ( "col_name1_valueX" [, "col_name2_valueA" , ... ] ) [, ("col_name1_valueY" [, "col_name2_valueB") , ... ] ) ] [STORED AS DIRECTORIES]]
 [ROW FORMAT row_format]
 [STORED AS file_format] [WITH SERDEPROPERTIES (...)] ]
 [LOCATION 's3_loc']
 [TBLPROPERTIES ( ['has_encrypted_data'='true | false',] property_name=property_value [, ...] ) ]

Parameters#

[EXTERNAL]
Specifies that the table is based on an underlying data file that exists in Amazon S3, in the LOCATION that you specify. When you create an external table, the data referenced must comply with the default format or the format that you specify with the ROW FORMAT, STORED AS, and WITH SERDEPROPERTIES clauses.
[IF NOT EXISTS]
Causes the error message to be suppressed if a table named table_name already exists.
[db_name.]table_name
Specifies a name for the table to be created. The optional db_name parameter specifies the database where the table exists. If omitted, the current database is assumed. If the table name includes numbers, enclose table_name in quotation marks, for example "table123". If table_name begins with an underscore, use backticks, for example, `_mytable`.
[ ( col_name data_type [COMMENT col_comment] [, ...] ) ]

Specifies the name for each column to be created, along with the column's data type. If col_name begins with an underscore, enclose the column name in backticks, for example _mycolumn. The data_type value can be any of the following:

  • primitive_type
    • TINYINT
    • SMALLINT
    • INT
    • BIGINT
    • BOOLEAN
    • FLOAT
    • DOUBLE
    • STRING
    • BINARY
    • TIMESTAMP
    • DECIMAL [ (precision,scale) ]
    • DATE
    • VARCHAR
    • CHAR
  • array_type
    • ARRAY < data_type >
  • map_type
    • MAP < primitive_type, data_type >
  • struct_type
    • STRUCT < col_name : data_type [COMMENT col_comment] [, ...] >
  • union_type
    • UNIONTYPE < data_type, data_type [, ...] >
[COMMENT table_comment]
Creates the comment table property and populates it with the table_comment you specify.
[PARTITIONED BY (col_name data_type [ COMMENT col_comment ], ... ) ]

Creates a partitioned table with one or more partition columns that have the col_name, data_type and col_comment specified. A table can have one or more partitions, which consist of a distinct column name and value combination. A separate data directory is created for each specified combination, which can improve query performance in some circumstances. Partitioned columns don't exist within the table data itself. If you use a value for col_name that is the same as a table column, you get an error. For more information, see partitions.

Note

After you create a table with partitions, run a subsequent query that consists of the MSCK REPAIR TABLE clause to refresh partition metadata, for example, MSCK REPAIR TABLE cloudfront_logs;.
[CLUSTERED BY (col_name, col_name2 [, ...] ) [SORTED BY (sort_col_name [ASC | DESC], ... ) ] INTO num_buckets BUCKETS ]
Creates clusters and corresponding buckets, which allows for more efficient sampling in some query cases. Records in col_name are hashed into buckets by num_buckets. This ensures that records in col_name are stored in the same bucket. The optional SORTED BY clause specifies that the cluster and buckets should be created based on the values in the columns you list as sort_col_name in either ascending (ASC) or descending (DESC) order. The column or columns in this clause are part of the table definition as opposed to partitioned columns.
SKEWED BY ( col_name1 [ ,col_name2, ... ] ) ON ( ("col_name1_valueX" [, "col_name2_valueA" , ... ] ) [, ("col_name1_valueY" [, "col_name2_valueB") , ... ] ) ] [ STORED AS DIRECTORIES ]
Specifies skewing parameters for the table. Skewing can help accelerate queries when a few values occur often in a particular column. For example, if the column "country" were predominantly filled with the values "India" and "US", the table and column would be a good candidate for skewing on the "country" column. You can use skewing with or without partitioning. The column or columns specified by col_name1, col_name2, and so on are the columns on which to base the skew, followed by the values that should be skewed from each column respectively. Values must be in quoted strings. Each column can have one or more corresponding values, which must be specified in the same order as the column names. In other words, column_name1 corresponds to col_name1_valueX, col_name1_valueY, and so on. Similarly, column_name2 corresponds to col_name2_valueA, col_name2_valueB and so on. For each column and value specified, records are split into separate files so that queries can skip or include records based on the input values. The optional STORED AS DIRECTORIES clause specifies that list bucketing should be used on skewed tables. List bucketing can't be used with normal bucketing operations (for example, CLUSTERED BY), external tables, or tables created with LOAD DATA.
[ROW FORMAT row_format]

Specifies the row format of the table and its underlying source data if applicable. For row_format, you can specify one or more delimiters with the DELIMITED clause or, alternatively, use the SERDE clause as described below. If ROW FORMAT is omitted or ROW FORMAT DELIMITED is specified, a native SerDe is used.

  • [DELIMITED FIELDS TERMINATED BY char [ESCAPED BY char]]
  • [DELIMITED COLLECTION ITEMS TERMINATED BY char]
  • [MAP KEYS TERMINATED BY char]
  • [LINES TERMINATED BY char]
  • [NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)

--OR--

  • SERDE 'serde_name' [WITH SERDEPROPERTIES ("property_name" = "property_value", "property_name" = "property_value" [, ...] )]

    The serde_name indicates the SerDe to use. The WITH SERDEPROPERTIES clause allows you to provide one or more custom properties allowed by the SerDe.

[STORED AS file_format]

Specifies the file format for table data. If omitted, TEXTFILE is the default. Options for file_format are:

  • SEQUENCEFILE
  • TEXTFILE
  • RCFILE
  • ORC
  • PARQUET
  • AVRO
  • INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
[LOCATION 'S3_loc']
Specifies the location of the underlying data in Amazon S3 from which the table is created, for example, `s3://mystorage`. For more information about considerations such as data format and permissions, see Create Tables From Underlying Data in Amazon S3.
[TBLPROPERTIES ( ['has_encrypted_data'='true | false',] property_name=property_value [, ...] ) ]
Specifies custom metadata key-value pairs for the table definition in addition to predefined table properties, such as "comment". Athena has a built-in property, has_encrypted_data. Set this property to true to indicate that the underlying data set specified by LOCATION is encrypted. If omitted, false is assumed. If omitted or set to false when underlying data is encrypted, the query results in an error. For more information, see encryption.

Examples#

CREATE EXTERNAL TABLE IF NOT EXISTS mydatabase.cloudfront_logs (
  Date DATE,
  Time STRING,
  Location STRING,
  Bytes INT,
  RequestIP STRING,
  Method STRING,
  Host STRING,
  Uri STRING,
  Status INT,
  Referrer STRING,
  os STRING,
  Browser STRING,
  BrowserVersion STRING
      ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
      WITH SERDEPROPERTIES (
      "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$"
      ) LOCATION 's3://athena-examples/cloudfront/plaintext/';