CREATE EXTERNAL SCHEMA - Amazon Redshift

CREATE EXTERNAL SCHEMA

Creates a new external schema in the current database. You can use this external schema to connect to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition databases. You can also create an external schema that references a database in an external data catalog such as AWS Glue, Athena, or a database in an Apache Hive metastore, such as Amazon EMR.

The owner of this schema is the issuer of the CREATE EXTERNAL SCHEMA command. To transfer ownership of an external schema, use ALTER SCHEMA to change the owner. To grant access to the schema to other users or user groups, use the GRANT command.

You can't use the GRANT or REVOKE commands for permissions on an external table. Instead, grant or revoke the permissions on the external schema.

Note

If you currently have Redshift Spectrum external tables in the Amazon Athena data catalog, you can migrate your Athena data catalog to an AWS Glue Data Catalog. To use the AWS Glue Data Catalog with Redshift Spectrum, you might need to change your AWS Identity and Access Management (IAM) policies. For more information, see Upgrading to the AWS Glue Data Catalog in the Athena User Guide.

To view details for external schemas, query the SVV_EXTERNAL_SCHEMAS system view.

Syntax

The following syntax describes the CREATE EXTERNAL SCHEMA command used to reference data using an external data catalog. For more information, see Querying external data using Amazon Redshift Spectrum.

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] local_schema_name
FROM { [ DATA CATALOG ] | HIVE METASTORE | POSTGRES | MYSQL | KINESIS | MSK | REDSHIFT }
[ DATABASE 'database_name' ]
[ SCHEMA 'schema_name' ]
[ REGION 'aws-region' ]
[ URI 'hive_metastore_uri' [ PORT port_number ] ]
IAM_ROLE { default | 'SESSION' | 'arn:aws:iam::<AWS account-id>:role/<role-name>' }
[ SECRET_ARN 'ssm-secret-arn' ]
[ AUTHENTICATION { none | iam } ]
[ CLUSTER_ARN 'arn:aws:kafka:<region>:<AWS account-id>:cluster/msk/<cluster uuid>' ]
[ CATALOG_ROLE { 'SESSION' | 'catalog-role-arn-string' } ]
[ CREATE EXTERNAL DATABASE IF NOT EXISTS ]
[ CATALOG_ID 'Amazon Web Services account ID containing Glue or Lake Formation database' ]

The following syntax describes the CREATE EXTERNAL SCHEMA command used to reference data using a federated query to RDS POSTGRES or Aurora PostgreSQL. You can also create an external schema that references streaming sources, such as Kinesis Data Streams. For more information, see Querying data with federated queries in Amazon Redshift.

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] local_schema_name
FROM POSTGRES
DATABASE 'federated_database_name' [SCHEMA 'schema_name']
URI 'hostname' [ PORT port_number ]
IAM_ROLE { default | 'arn:aws:iam::<AWS account-id>:role/<role-name>' }
SECRET_ARN 'ssm-secret-arn'

The following syntax describes the CREATE EXTERNAL SCHEMA command used to reference data using a federated query to RDS MySQL or Aurora MySQL. For more information, see Querying data with federated queries in Amazon Redshift.

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] local_schema_name
FROM MYSQL
DATABASE 'federated_database_name'
URI 'hostname' [ PORT port_number ]
IAM_ROLE { default | 'arn:aws:iam::<AWS account-id>:role/<role-name>' }
SECRET_ARN 'ssm-secret-arn'

The following syntax describes the CREATE EXTERNAL SCHEMA command used to reference data in a Kinesis stream. For more information, see Streaming ingestion.

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] schema_name
FROM KINESIS
IAM_ROLE { default | 'arn:aws:iam::<AWS account-id>:role/<role-name>' }

The following syntax describes the CREATE EXTERNAL SCHEMA command used to reference the Amazon Managed Streaming for Apache Kafka cluster and its topics to ingest from. CLUSTER_ARN specifies the Amazon MSK cluster that you’re reading data from. For more information, see Streaming ingestion.

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] schema_name
FROM MSK
IAM_ROLE { default | 'arn:aws:iam::<AWS account-id>:role/<role-name>' }
AUTHENTICATION { none | iam }
CLUSTER_ARN 'msk-cluster-arn';

The following syntax describes the CREATE EXTERNAL SCHEMA command used to reference data using a cross-database query.

CREATE EXTERNAL SCHEMA local_schema_name
FROM  REDSHIFT
DATABASE 'redshift_database_name' SCHEMA 'redshift_schema_name'

Parameters

IF NOT EXISTS

A clause that indicates that if the specified schema already exists, the command should make no changes and return a message that the schema exists, rather than terminating with an error. This clause is useful when scripting, so the script doesn't fail if CREATE EXTERNAL SCHEMA tries to create a schema that already exists.

local_schema_name

The name of the new external schema. For more information about valid names, see Names and identifiers.

FROM [ DATA CATALOG ] | HIVE METASTORE | POSTGRES | MYSQL | KINESIS | MSK | REDSHIFT

A keyword that indicates where the external database is located.

DATA CATALOG indicates that the external database is defined in the Athena data catalog or the AWS Glue Data Catalog.

If the external database is defined in an external Data Catalog in a different AWS Region, the REGION parameter is required. DATA CATALOG is the default.

HIVE METASTORE indicates that the external database is defined in an Apache Hive metastore. If HIVE METASTORE, is specified, URI is required.

POSTGRES indicates that the external database is defined in RDS PostgreSQL or Aurora PostgreSQL.

MYSQL indicates that the external database is defined in RDS MySQL or Aurora MySQL.

KINESIS indicates that the data source is a stream from Kinesis Data Streams.

MSK indicates that the data source is a topic from Amazon MSK.

FROM REDSHIFT

A keyword that indicates that the database is located in Amazon Redshift.

DATABASE 'redshift_database_name' SCHEMA 'redshift_schema_name'

The name of the Amazon Redshift database.

The redshift_schema_name indicates the schema in Amazon Redshift. The default redshift_schema_name is public.

DATABASE 'federated_database_name'

A keyword that indicates the name of the external database in a supported PostgreSQL or MySQL database engine.

[SCHEMA 'schema_name']

The schema_name indicates the schema in a supported PostgreSQL database engine. The default schema_name is public.

You can't specify a SCHEMA when you set up a federated query to a supported MySQL database engine.

REGION 'aws-region'

If the external database is defined in an Athena data catalog or the AWS Glue Data Catalog, the AWS Region in which the database is located. This parameter is required if the database is defined in an external Data Catalog.

URI 'hive_metastore_uri' [ PORT port_number ]

The hostname URI and port_number of a supported PostgreSQL or MySQL database engine. The hostname is the head node of the replica set. The endpoint must be reachable (routable) from the Amazon Redshift cluster. The default PostgreSQL port_number is 5432. The default MySQL port_number is 3306.

If the database is in a Hive metastore, specify the URI and optionally the port number for the metastore. The default port number is 9083.

A URI doesn't contain a protocol specification ("http://"). An example valid URI: uri '172.10.10.10'.

Note

The supported PostgreSQL or MySQL database engine must be in the same VPC as your Amazon Redshift cluster. Create a security group linking Amazon Redshift and RDS PostgreSQL or Aurora PostgreSQL.

IAM_ROLE { default | 'SESSION' | 'arn:aws:iam::<AWS account-id>:role/<role-name>' }

Use the default keyword to have Amazon Redshift use the IAM role that is set as default and associated with the cluster when the CREATE EXTERNAL SCHEMA command runs.

Use 'SESSION' if you connect to your Amazon Redshift cluster using a federated identity and access the tables from the external schema created using this command. For more information, see Using a federated identity to manage Amazon Redshift access to local resources and Amazon Redshift Spectrum external tables, which explains how to configure federated identity. Note that this configuration, using 'SESSION' in place of the ARN, can be used only if the schema is created using DATA CATALOG.

Use the Amazon Resource Name (ARN) for an IAM role that your cluster uses for authentication and authorization. As a minimum, the IAM role must have permission to perform a LIST operation on the Amazon S3 bucket to be accessed and a GET operation on the Amazon S3 objects the bucket contains.

The following shows the syntax for the IAM_ROLE parameter string for a single ARN.

IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'

You can chain roles so that your cluster can assume another IAM role, possibly belonging to another account. You can chain up to 10 roles. For an example of chaining roles, see Chaining IAM roles in Amazon Redshift Spectrum.

To this IAM role, attach an IAM permissions policy similar to the following.

{ "Version": "2012-10-17", "Statement": [ { "Sid": "AccessSecret", "Effect": "Allow", "Action": [ "secretsmanager:GetResourcePolicy", "secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret", "secretsmanager:ListSecretVersionIds" ], "Resource": "arn:aws:secretsmanager:us-west-2:123456789012:secret:my-rds-secret-VNenFy" }, { "Sid": "VisualEditor1", "Effect": "Allow", "Action": [ "secretsmanager:GetRandomPassword", "secretsmanager:ListSecrets" ], "Resource": "*" } ] }

For the steps to create an IAM role to use with federated query, see Creating a secret and an IAM role to use federated queries.

Note

Don't include spaces in the list of chained roles.

The following shows the syntax for chaining three roles.

IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-1-name>,arn:aws:iam::<aws-account-id>:role/<role-2-name>,arn:aws:iam::<aws-account-id>:role/<role-3-name>'
SECRET_ARN 'ssm-secret-arn'

The Amazon Resource Name (ARN) of a supported PostgreSQL or MySQL database engine secret created using AWS Secrets Manager. For information about how to create and retrieve an ARN for a secret, see Creating a Basic Secret and Retrieving the Secret Value Secret in the AWS Secrets Manager User Guide.

CATALOG_ROLE { 'SESSION' | catalog-role-arn-string}

Use 'SESSION' to connect to your Amazon Redshift cluster using a federated identity for authentication and authorization to the data catalog. For more information about completing the steps for federated identity, see Using a federated identity to manage Amazon Redshift access to local resources and Amazon Redshift Spectrum external tables. Note that the 'SESSION' role can be used only if the schema is created in DATA CATALOG.

Use the Amazon Resource Name ARN for an IAM role that your cluster uses for authentication and authorization for the data catalog.

If CATALOG_ROLE isn't specified, Amazon Redshift uses the specified IAM_ROLE. The catalog role must have permission to access the Data Catalog in AWS Glue or Athena. For more information, see IAM policies for Amazon Redshift Spectrum.

The following shows the syntax for the CATALOG_ROLE parameter string for a single ARN.

CATALOG_ROLE 'arn:aws:iam::<aws-account-id>:role/<catalog-role>'

You can chain roles so that your cluster can assume another IAM role, possibly belonging to another account. You can chain up to 10 roles. For more information, see Chaining IAM roles in Amazon Redshift Spectrum.

Note

The list of chained roles must not include spaces.

The following shows the syntax for chaining three roles.

CATALOG_ROLE 'arn:aws:iam::<aws-account-id>:role/<catalog-role-1-name>,arn:aws:iam::<aws-account-id>:role/<catalog-role-2-name>,arn:aws:iam::<aws-account-id>:role/<catalog-role-3-name>'

CREATE EXTERNAL DATABASE IF NOT EXISTS

A clause that creates an external database with the name specified by the DATABASE argument, if the specified external database doesn't exist. If the specified external database exists, the command makes no changes. In this case, the command returns a message that the external database exists, rather than terminating with an error.

Note

You can't use CREATE EXTERNAL DATABASE IF NOT EXISTS with HIVE METASTORE.

To use CREATE EXTERNAL DATABASE IF NOT EXISTS with a Data Catalog enabled for AWS Lake Formation, you need CREATE_DATABASE permission on the Data Catalog.

CATALOG_ID 'Amazon Web Services account ID containing Glue or Lake Formation database'

The account id where the data catalog database is stored.

CATALOG_ID can be specified only if you plan to connect to your Amazon Redshift cluster or to Amazon Redshift Serverless using a federated identity for authentication and authorization to the data catalog by setting either of the following:

  • CATALOG_ROLE to 'SESSION'

  • IAM_ROLE to 'SESSION' and 'CATALOG_ROLE' set to its default

For more information about completing the steps for federated identity, see Using a federated identity to manage Amazon Redshift access to local resources and Amazon Redshift Spectrum external tables.

AUTHENTICATION

The authentication type defined for streaming ingestion. Streaming ingestion with authentication types works with Amazon Managed Streaming for Apache Kafka. The AUTHENTICATION types are the following:

CLUSTER_ARN

For streaming ingestion, the cluster identifier for the Amazon Managed Streaming for Apache Kafka cluster you're streaming from. For more information, see Streaming ingestion.

Usage notes

For limits when using the Athena data catalog, see Athena Limits in the AWS General Reference.

For limits when using the AWS Glue Data Catalog, see AWS Glue Limits in the AWS General Reference.

These limits don’t apply to a Hive metastore.

There is a maximum of 9,900 schemas per database. For more information, see Quotas and limits in the Amazon Redshift Management Guide.

To unregister the schema, use the DROP SCHEMA command.

To view details for external schemas, query the following system views:

Examples

The following example creates an external schema using a database in a data catalog named sampledb in the US West (Oregon) Region. Use this example with an Athena or AWS Glue data catalog.

create external schema spectrum_schema from data catalog database 'sampledb' region 'us-west-2' iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole';

The following example creates an external schema and creates a new external database named spectrum_db.

create external schema spectrum_schema from data catalog database 'spectrum_db' iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole' create external database if not exists;

The following example creates an external schema using a Hive metastore database named hive_db.

create external schema hive_schema from hive metastore database 'hive_db' uri '172.10.10.10' port 99 iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole';

The following example chains roles to use the role myS3Role for accessing Amazon S3 and uses myAthenaRole for data catalog access. For more information, see Chaining IAM roles in Amazon Redshift Spectrum.

create external schema spectrum_schema from data catalog database 'spectrum_db' iam_role 'arn:aws:iam::123456789012:role/myRedshiftRole,arn:aws:iam::123456789012:role/myS3Role' catalog_role 'arn:aws:iam::123456789012:role/myAthenaRole' create external database if not exists;

The following example creates an external schema that references an Aurora PostgreSQL database.

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] myRedshiftSchema FROM POSTGRES DATABASE 'my_aurora_db' SCHEMA 'my_aurora_schema' URI 'endpoint to aurora hostname' PORT 5432 IAM_ROLE 'arn:aws:iam::123456789012:role/MyAuroraRole' SECRET_ARN 'arn:aws:secretsmanager:us-east-2:123456789012:secret:development/MyTestDatabase-AbCdEf'

The following example creates an external schema to refer to the sales_db imported on the consumer cluster.

CREATE EXTERNAL SCHEMA sales_schema FROM REDSHIFT DATABASE 'sales_db' SCHEMA 'public';

The following example creates an external schema that references an Aurora MySQL database.

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] myRedshiftSchema FROM MYSQL DATABASE 'my_aurora_db' URI 'endpoint to aurora hostname' IAM_ROLE 'arn:aws:iam::123456789012:role/MyAuroraRole' SECRET_ARN 'arn:aws:secretsmanager:us-east-2:123456789012:secret:development/MyTestDatabase-AbCdEf'