Amazon Athena
User Guide  | API Reference

FAQ: Upgrading to the AWS Glue Data Catalog

If you created databases and tables using Athena in a region before AWS Glue was available in that region, metadata is stored in an Athena-managed data catalog, which only Athena and Amazon Redshift Spectrum can access. To use AWS Glue features together with Athena and Redshift Spectrum, you must upgrade to the AWS Glue Data Catalog. Athena can only be used together with the AWS Glue Data Catalog in regions where AWS Glue is available. For a list of regions, see Regions and Endpoints in the AWS General Reference.

Why should I upgrade to the AWS Glue Data Catalog?#

AWS Glue is a completely-managed extract, transform, and load (ETL) service. It has three main components:

  • An AWS Glue crawler can automatically scan your data sources, identify data formats, and infer schema.
  • A fully managed ETL service allows you to transform and move data to various destinations.
  • The AWS Glue Data Catalog stores metadata information about databases and tables, pointing to a data store in Amazon S3 or a JDBC-compliant data store.

For more information, see AWS Glue Concepts.

Upgrading to the AWS Glue Data Catalog has the following benefits.

Unified metadata repository#

The AWS Glue Data Catalog provides a unified metadata repository across a variety of data sources and data formats. It provides out-of-the-box integration with Amazon Simple Storage Service (Amazon S3), Amazon Relational Database Service (Amazon RDS), Amazon Redshift, Amazon Redshift Spectrum, Athena, Amazon EMR, and any application compatible with the Apache Hive metastore. You can create your table definitions one time and query across engines.

For more information, see Populating the AWS Glue Data Catalog.

Automatic schema and partition recognition#

AWS Glue crawlers automatically crawl your data sources, identify data formats, and suggest schema and transformations. Crawlers can help automate table creation and automatic loading of partitions that you can query using Athena, Amazon EMR, and Redshift Spectrum. You can also create tables and partitions directly using the AWS Glue API, SDKs, and the AWS CLI.

For more information, see Cataloging Tables with a Crawler.

Easy-to-build pipelines#

The AWS Glue ETL engine generates Python code that is entirely customizable, reusable, and portable. You can edit the code using your favorite IDE or notebook and share it with others using GitHub. After your ETL job is ready, you can schedule it to run on the fully managed, scale-out Spark infrastructure of AWS Glue. AWS Glue handles provisioning, configuration, and scaling of the resources required to run your ETL jobs, allowing you to tightly integrate ETL with your workflow.

For more information, see Authoring AWS Glue Jobs in the AWS Glue Developer Guide.

Are there separate charges for AWS Glue?#

Yes. With AWS Glue, you pay a monthly rate for storing and accessing the metadata stored in the AWS Glue Data Catalog, an hourly rate billed per second for AWS Glue ETL jobs and crawler runtime, and an hourly rate billed per second for each provisioned development endpoint. The AWS Glue Data Catalog allows you to store up to a million objects at no charge. If you store more than a million objects, you are charged USD$1 for each 100,000 objects over a million. An object in the AWS Glue Data Catalog is a table, a partition, or a database. For more information, see AWS Glue Pricing.

Upgrade process FAQ#

Who can perform the upgrade?#

You need to attach a customer-managed IAM policy with a policy statement that allows the upgrade action to the user who will perform the migration. This extra check prevents someone from accidentally migrating the catalog for the entire account. For more information, see Step 1 - Allow a User to Perform the Upgrade.

My users use a managed policy with Athena and Redshift Spectrum. What steps do I need to take to upgrade?#

The Athena managed policy has been automatically updated with new policy actions that allow Athena users to access AWS Glue. However, you still must explicitly allow the upgrade action for the user who performs the upgrade. To prevent accidental upgrade, the managed policy does not allow this action.

What happens if I don’t upgrade?#

If you don’t upgrade, you are not able to use AWS Glue features together with the databases and tables you create in Athena or vice versa. You can use these services independently. During this time, Athena and AWS Glue both prevent you from creating databases and tables that have the same names in the other data catalog. This prevents name collisions when you do upgrade.

Why do I need to add AWS Glue policies to Athena users?#

Before you upgrade, Athena manages the data catalog, so Athena actions must be allowed for your users to perform queries. After you upgrade to the AWS Glue Data Catalog, Athena actions no longer apply to accessing the AWS Glue Data Catalog, so AWS Glue actions must be allowed for your users. Remember, the managed policy for Athena has already been updated to allow the required AWS Glue actions, so no action is required if you use the managed policy.

What happens if I don’t allow AWS Glue policies for Athena users?#

If you upgrade to the AWS Glue Data Catalog and don't update a user's customer-managed or inline IAM policies, Athena queries fail because the user won't be allowed to perform actions in AWS Glue. For the specific actions to allow, see Step 2 - Update Customer-Managed/Inline Policies Associated with Athena Users.

Is there risk of data loss during the upgrade?#

No.

Is my data also moved during this upgrade?#

No. The migration only affects metadata.