CREATE EXTERNAL SCHEMA - Amazon Redshift

CREATE EXTERNAL SCHEMA

Creates a new external schema in the current database. You can use this external schema to connect to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition databases. You can also create an external schema that references a database in an external data catalog such as AWS Glue, Athena, or a database in an Apache Hive metastore, such as Amazon EMR.

The owner of this schema is the issuer of the CREATE EXTERNAL SCHEMA command. To transfer ownership of an external schema, use ALTER SCHEMA to change the owner. To grant access to the schema to other users or user groups, use the GRANT command.

You can't use the GRANT or REVOKE commands for permissions on an external table. Instead, grant or revoke the permissions on the external schema.

Note

If you currently have Redshift Spectrum external tables in the Amazon Athena data catalog, you can migrate your Athena data catalog to an AWS Glue Data Catalog. To use the AWS Glue Data Catalog with Redshift Spectrum, you might need to change your AWS Identity and Access Management (IAM) policies. For more information, see Upgrading to the AWS Glue Data Catalog in the Athena User Guide.

To view details for external schemas, query the SVV_EXTERNAL_SCHEMAS system view.

Syntax

The following syntax describes the CREATE EXTERNAL SCHEMA command used to reference data using an external data catalog. For more information, see Querying external data using Amazon Redshift Spectrum.

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] local_schema_name
FROM { [ DATA CATALOG ] | HIVE METASTORE | POSTGRES | MYSQL }
DATABASE 'database_name'
[ REGION 'aws-region' ]
[ URI 'hive_metastore_uri' [ PORT port_number ] ]
IAM_ROLE 'iam-role-arn-string'
SECRET_ARN 'ssm-secret-arn'             
[ CATALOG_ROLE 'catalog-role-arn-string' ] 
[ CREATE EXTERNAL DATABASE IF NOT EXISTS ]

The following syntax describes the CREATE EXTERNAL SCHEMA command used to reference data using a federated query to RDS POSTGRES or Aurora PostgreSQL. For more information, see Querying data with federated queries in Amazon Redshift.

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] local_schema_name
FROM POSTGRES
DATABASE 'federated_database_name' [SCHEMA 'schema_name']
URI 'hostname' [ PORT port_number ] 
IAM_ROLE 'iam-role-arn-string'
SECRET_ARN 'ssm-secret-arn'             
The following is prerelease documentation for the federated query to MySQL feature for Amazon Redshift, which is in preview release. The documentation and the feature are both subject to change. We recommend that you use this feature only with test clusters, and not in production environments. For preview terms and conditions, see Beta Service Participation in AWS Service Terms.

The following syntax describes the CREATE EXTERNAL SCHEMA command used to reference data using a federated query to RDS MySQL or Aurora MySQL. For more information, see Querying data with federated queries in Amazon Redshift.

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] local_schema_name
FROM MYSQL
DATABASE 'federated_database_name' 
URI 'hostname' [ PORT port_number ]
IAM_ROLE 'iam-role-arn-string'
SECRET_ARN 'ssm-secret-arn'             

The following syntax describes the CREATE EXTERNAL SCHEMA command used to reference data using a cross-database query.

CREATE EXTERNAL SCHEMA local_schema_name
FROM  REDSHIFT
DATABASE 'redshift_database_name' SCHEMA 'redshift_schema_name'

Parameters

IF NOT EXISTS

A clause that indicates that if the specified schema already exists, the command should make no changes and return a message that the schema exists, rather than terminating with an error. This clause is useful when scripting, so the script doesn't fail if CREATE EXTERNAL SCHEMA tries to create a schema that already exists.

local_schema_name

The name of the new external schema. For more information about valid names, see Names and identifiers.

FROM [ DATA CATALOG ] | HIVE METASTORE

A keyword that indicates where the external database is located.

DATA CATALOG indicates that the external database is defined in the Athena data catalog or the AWS Glue Data Catalog.

If the external database is defined in an external Data Catalog in a different AWS Region, the REGION parameter is required. DATA CATALOG is the default.

HIVE METASTORE indicates that the external database is defined in an Apache Hive metastore. If HIVE METASTORE, is specified, URI is required.

POSTGRES indicates that the external database is defined in RDS PostgreSQL or Aurora PostgreSQL.

(preview) MYSQL indicates that the external database is defined in RDS MySQL or Aurora MySQL.

FROM REDSHIFT

A keyword that indicates that the database is located in Amazon Redshift.

DATABASE 'redshift_database_name' SCHEMA 'redshift_schema_name'

The name of the Amazon Redshift database.

The redshift_schema_name indicates the schema in Amazon Redshift. The default redshift_schema_name is public.

DATABASE 'federated_database_name'

A keyword that indicates the name of the external database in a supported PostgreSQL or MySQL database engine.

[SCHEMA 'schema_name']

The schema_name indicates the schema in a supported PostgreSQL database engine. The default schema_name is public.

You can't specify a SCHEMA when you set up a federated query to a supported MySQL database engine.

REGION 'aws-region'

If the external database is defined in an Athena data catalog or the AWS Glue Data Catalog, the AWS Region in which the database is located. This parameter is required if the database is defined in an external Data Catalog.

URI 'hive_metastore_uri' [ PORT port_number ]

The hostname URI and port_number of a supported PostgreSQL or MySQL database engine. The hostname is the head node of the replica set. The endpoint must be reachable (routable) from the Amazon Redshift cluster. The default port_number is 5432.

If the database is in a Hive metastore, specify the URI and optionally the port number for the metastore. The default port number is 9083.

A URI doesn't contain a protocol specification ("http://"). An example valid URI: uri '172.10.10.10'.

Note

The supported PostgreSQL or MySQL database engine must be in the same VPC as your Amazon Redshift cluster. Create a security group linking Amazon Redshift and RDS PostgreSQL or Aurora PostgreSQL.

IAM_ROLE 'iam-role-arn-string'

The Amazon Resource Name (ARN) for an IAM role that your cluster uses for authentication and authorization. As a minimum, the IAM role must have permission to perform a LIST operation on the Amazon S3 bucket to be accessed and a GET operation on the Amazon S3 objects the bucket contains. If the external database is defined in an Amazon Athena data catalog or the AWS Glue Data Catalog, the IAM role must have permission to access Athena unless CATALOG_ROLE is specified. For more information, see IAM policies for Amazon Redshift Spectrum. The following shows the syntax for the IAM_ROLE parameter string for a single ARN.

IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'

You can chain roles so that your cluster can assume another IAM role, possibly belonging to another account. You can chain up to 10 roles. For more information, see Chaining IAM roles in Amazon Redshift Spectrum.

To this IAM role, attach an IAM permissions policy similar to the following.

{ "Version": "2012-10-17", "Statement": [ { "Sid": "AccessSecret", "Effect": "Allow", "Action": [ "secretsmanager:GetResourcePolicy", "secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret", "secretsmanager:ListSecretVersionIds" ], "Resource": "arn:aws:secretsmanager:us-west-2:123456789012:secret:my-rds-secret-VNenFy" }, { "Sid": "VisualEditor1", "Effect": "Allow", "Action": [ "secretsmanager:GetRandomPassword", "secretsmanager:ListSecrets" ], "Resource": "*" } ] }

For the steps to create an IAM role to use with federated query, see Creating a secret and an IAM role to use federated queries.

Note

Don't include spaces in the list of chained roles.

The following shows the syntax for chaining three roles.

IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-1-name>,arn:aws:iam::<aws-account-id>:role/<role-2-name>,arn:aws:iam::<aws-account-id>:role/<role-3-name>'
SECRET_ARN 'ssm-secret-arn'

The Amazon Resource Name (ARN) of a supported PostgreSQL or MySQL database engine secret created using AWS Secrets Manager. For information about how to create and retrieve an ARN for a secret, see Creating a Basic Secret and Retrieving the Secret Value Secret in the AWS Secrets Manager User Guide.

CATALOG_ROLE 'catalog-role-arn-string'

The ARN for an IAM role that your cluster uses for authentication and authorization for the data catalog. If CATALOG_ROLE isn't specified, Amazon Redshift uses the specified IAM_ROLE. The catalog role must have permission to access the Data Catalog in AWS Glue or Athena. For more information, see IAM policies for Amazon Redshift Spectrum. The following shows the syntax for the CATALOG_ROLE parameter string for a single ARN.

CATALOG_ROLE 'arn:aws:iam::<aws-account-id>:role/<catalog-role>'

You can chain roles so that your cluster can assume another IAM role, possibly belonging to another account. You can chain up to 10 roles. For more information, see Chaining IAM roles in Amazon Redshift Spectrum.

Note

The list of chained roles must not include spaces.

The following shows the syntax for chaining three roles.

CATALOG_ROLE 'arn:aws:iam::<aws-account-id>:role/<catalog-role-1-name>,arn:aws:iam::<aws-account-id>:role/<catalog-role-2-name>,arn:aws:iam::<aws-account-id>:role/<catalog-role-3-name>'
CREATE EXTERNAL DATABASE IF NOT EXISTS

A clause that creates an external database with the name specified by the DATABASE argument, if the specified external database doesn't exist. If the specified external database exists, the command makes no changes. In this case, the command returns a message that the external database exists, rather than terminating with an error.

Note

You can't use CREATE EXTERNAL DATABASE IF NOT EXISTS with HIVE METASTORE.

To use CREATE EXTERNAL DATABASE IF NOT EXISTS with a Data Catalog enabled for AWS Lake Formation, you need CREATE_DATABASE permission on the Data Catalog.

Usage notes

For limits when using the Athena data catalog, see Athena Limits in the AWS General Reference.

For limits when using the AWS Glue Data Catalog, see AWS Glue Limits in the AWS General Reference.

These limits don’t apply to a Hive metastore.

To unregister the schema, use the DROP SCHEMA command.

To view details for external schemas, query the following system views:

Examples

The following example creates an external schema using a database in an Athena data catalog named sampledb in the US West (Oregon) Region.

create external schema spectrum_schema from data catalog database 'sampledb' region 'us-west-2' iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole';

The following example creates an external schema and creates a new external database named spectrum_db.

create external schema spectrum_schema from data catalog database 'spectrum_db' iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole' create external database if not exists;

The following example creates an external schema using a Hive metastore database named hive_db.

create external schema hive_schema from hive metastore database 'hive_db' uri '172.10.10.10' port 99 iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole';

The following example chains roles to use the role myS3Role for accessing Amazon S3 and uses myAthenaRole for data catalog access. For more information, see Chaining IAM roles in Amazon Redshift Spectrum.

create external schema spectrum_schema from data catalog database 'spectrum_db' iam_role 'arn:aws:iam::123456789012:role/myRedshiftRole,arn:aws:iam::123456789012:role/myS3Role' catalog_role 'arn:aws:iam::123456789012:role/myAthenaRole' create external database if not exists;

The following example creates an external schema that references an Aurora PostgreSQL database.

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] myRedshiftSchema FROM POSTGRES DATABASE 'my_aurora_db' SCHEMA 'my_aurora_schema' URI 'endpoint to aurora hostname' PORT 5432 IAM_ROLE 'arn:aws:iam::123456789012:role/MyAuroraRole' SECRET_ARN 'arn:aws:secretsmanager:us-east-2:123456789012:secret:development/MyTestDatabase-AbCdEf'

The following example creates an external schema to refer to the Sales_db imported on the consumer cluster.

CREATE EXTERNAL SCHEMA Sales_schema FROM REDSHIFT DATABASE 'Sales_db' SCHEMA 'public';

The following example creates an external schema that references an Aurora MySQL database.

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] myRedshiftSchema FROM MYSQL DATABASE 'my_aurora_db' URI 'endpoint to aurora hostname' IAM_ROLE 'arn:aws:iam::123456789012:role/MyAuroraRole' SECRET_ARN 'arn:aws:secretsmanager:us-east-2:123456789012:secret:development/MyTestDatabase-AbCdEf'