Data Protection in Athena - Amazon Athena

Data Protection in Athena

Multiple types of data are involved when you use Athena to create databases and tables. These data types include source data stored in Amazon S3, metadata for databases and tables that you create when you run queries or use the AWS Glue crawler to discover data, query results data, and query history. This section discusses each type of data and provides guidance about protecting it.

  • Source data – You store the data for databases and tables in Amazon S3, and Athena does not modify it. For more information, see Data Protection in Amazon S3 in the Amazon Simple Storage Service Developer Guide. You control access to your source data and can encrypt it in Amazon S3. You can use Athena to create tables based on encrypted datasets in Amazon S3.

  • Database and table metadata (schema) – Athena uses schema-on-read technology, which means that your table definitions are applied to your data in Amazon S3 when Athena runs queries. Any schemas you define are automatically saved unless you explicitly delete them. In Athena, you can modify the Data Catalog metadata using DDL statements. You can also delete table definitions and schema without impacting the underlying data stored in Amazon S3.

    Note

    The metadata for databases and tables you use in Athena is stored in the AWS Glue Data Catalog. We highly recommend that you upgrade to using the AWS Glue Data Catalog with Athena. For more information about the benefits of using the AWS Glue Data Catalog, see FAQ: Upgrading to the AWS Glue Data Catalog.

    You can define fine-grained access policies to databases and tables registered in the AWS Glue Data Catalog using AWS Identity and Access Management (IAM). You can also encrypt metadata in the AWS Glue Data Catalog. If you encrypt the metadata, grant users the permissions to the encrypted metadata that they need for access.

  • Query results and query history, including saved queries – Query results are stored in a location in Amazon S3 that you can choose to specify globally, or for each workgroup. If not specified, Athena uses the default location in each case. You control access to Amazon S3 buckets where you store query results and saved queries. Additionally, you can choose to encrypt query results that you store in Amazon S3. Your users must have the appropriate permissions to access the Amazon S3 locations and decrypt files. For more information, see Encrypting Query Results Stored in Amazon S3 in this document.

    Athena retains query history for 45 days. You can view query history using Athena APIs, in the console, and with the AWS CLI. To keep queries for longer than 45 days, save them. To protect saved queries, use workgroups in Athena to restrict access to users who are authorized to view them.
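As an illustration of encrypting query results, the following minimal Python sketch builds the request parameters for the Athena StartQueryExecution API, including a result location and an encryption configuration. The helper names build_query_request and run_query, the bucket name, and the KMS key ARN are hypothetical and exist only for this example. The run_query helper assumes that boto3 is installed and that AWS credentials are configured. Note that a workgroup can be set up to enforce its own result settings, in which case the values sent by the client might not be the ones that apply.

```python
from typing import Optional

# Encryption options accepted in Athena's EncryptionConfiguration.
VALID_ENCRYPTION_OPTIONS = {"SSE_S3", "SSE_KMS", "CSE_KMS"}


def build_query_request(
    query: str,
    output_location: str,
    encryption_option: str = "SSE_S3",
    kms_key: Optional[str] = None,
    workgroup: Optional[str] = None,
) -> dict:
    """Build keyword arguments for Athena's StartQueryExecution call.

    Hypothetical helper for illustration; it only assembles a dictionary
    and does not contact AWS.
    """
    if encryption_option not in VALID_ENCRYPTION_OPTIONS:
        raise ValueError(f"unsupported encryption option: {encryption_option}")

    encryption = {"EncryptionOption": encryption_option}
    if encryption_option != "SSE_S3":
        # The KMS-based options need a key to encrypt with.
        if not kms_key:
            raise ValueError(f"{encryption_option} requires a KMS key")
        encryption["KmsKey"] = kms_key

    request = {
        "QueryString": query,
        "ResultConfiguration": {
            "OutputLocation": output_location,
            "EncryptionConfiguration": encryption,
        },
    }
    if workgroup:
        request["WorkGroup"] = workgroup
    return request


def run_query(request: dict) -> str:
    """Submit the request with boto3 and return the query execution ID."""
    import boto3  # imported lazily so the builder works without boto3

    athena = boto3.client("athena")
    return athena.start_query_execution(**request)["QueryExecutionId"]


request = build_query_request(
    query="SELECT 1",
    output_location="s3://amzn-s3-demo-bucket/athena-results/",  # hypothetical bucket
    encryption_option="SSE_KMS",
    kms_key="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE",  # hypothetical key ARN
)
```

Keeping the request builder separate from the boto3 call makes it possible to review or log the encryption settings before a query is submitted. Users who read the results still need permissions to the Amazon S3 location and, for the KMS-based options, to the key.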