Using AWS Glue to connect to data sources in Amazon S3
Athena can connect to your data stored in Amazon S3 using the AWS Glue Data Catalog to store metadata such as table and column names. After the connection is made, your databases, tables, and views appear in Athena's query editor.
To define schema information for AWS Glue to use, you can create an AWS Glue crawler to retrieve the information automatically, or you can manually add a table and enter the schema information.
Creating an AWS Glue crawler
You can create a crawler by starting in the Athena console and then using the AWS Glue console in an integrated way. When you create the crawler, you specify a data location in Amazon S3 to crawl.
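Outside the console, a crawler definition with the same essentials (a name, an IAM role, a target database, and an Amazon S3 location to crawl) can also be expressed as a request to the AWS Glue CreateCrawler API. The following is a minimal sketch, not the console's own behavior. The helper `build_crawler_request`, the crawler name, role ARN, database name, and bucket path are all hypothetical placeholders.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for the AWS Glue CreateCrawler API.

    All argument values are supplied by the caller; nothing here
    contacts AWS.
    """
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must be an s3:// URI")
    return {
        "Name": name,                  # crawler name
        "Role": role_arn,              # IAM role that the crawler assumes
        "DatabaseName": database,      # Data Catalog database for new tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},  # location to crawl
    }


# Hypothetical values, for illustration only.
request = build_crawler_request(
    name="example-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="example_db",
    s3_path="s3://amzn-s3-demo-bucket/csv-data/",
)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("glue").create_crawler(**request)
```

Building the request as a plain dictionary keeps the sketch runnable without AWS access; the commented call shows where the dictionary would be passed.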
To create a crawler in AWS Glue starting from the Athena console
- Open the Athena console at https://console.aws.amazon.com/athena/.
- In the query editor, next to Tables and views, choose Create, and then choose AWS Glue crawler.
- On the AWS Glue console Add crawler page, follow the steps to create a crawler. For more information, see Using AWS Glue Crawlers in this guide and Populating the AWS Glue Data Catalog in the AWS Glue Developer Guide.
Athena does not recognize exclude patterns that you specify for an AWS Glue crawler. For example, if you have an Amazon S3 bucket that contains both .csv and .json files and you exclude the .json files from the crawler, Athena queries both groups of files. To avoid this, place the files that you want to exclude in a different location.
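As an illustration of the advice above, the following sketch plans where the excluded files would go so that they no longer sit under the table's location. It only computes a mapping of old keys to new keys and does not touch Amazon S3. The function name `plan_relocation`, the object keys, and the prefixes are hypothetical.

```python
def plan_relocation(keys, excluded_suffix, table_prefix, destination_prefix):
    """Map each excluded object key under table_prefix to a key under
    destination_prefix, preserving the path below the table prefix."""
    if destination_prefix.startswith(table_prefix):
        # A destination inside the table location would still be queried.
        raise ValueError("destination must be outside the table location")
    moves = {}
    for key in keys:
        if key.startswith(table_prefix) and key.lower().endswith(excluded_suffix):
            relative = key[len(table_prefix):]
            moves[key] = destination_prefix + relative
    return moves


# Hypothetical object keys in one bucket.
keys = ["data/a.csv", "data/b.json", "data/2024/c.json", "data/d.csv"]
moves = plan_relocation(keys, ".json", "data/", "excluded-json/")

for source, target in moves.items():
    print(f"{source} -> {target}")
```

Each planned move could then be carried out with an S3 copy followed by a delete of the original object, for example with the boto3 S3 client's `copy_object` and `delete_object` calls.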
Adding a table using a form
The following procedure shows you how to use the Athena console to add a table using the Create Table From S3 bucket data form.
To add a table and enter schema information using a form
- Open the Athena console at https://console.aws.amazon.com/athena/.
- In the query editor, next to Tables and views, choose Create, and then choose S3 bucket data.
- On the Create Table From S3 bucket data form, for Table name, enter a name for the table.
- For Database configuration, choose an existing database, or create a new one.
- For Location of Input Data Set, specify the path in Amazon S3 to the folder that contains the dataset that you want to process.
- For Data Format, choose a data format (Apache Web Logs, CSV, TSV, Text File with Custom Delimiters, JSON, Parquet, or ORC).
  - For the Apache Web Logs option, you must also enter a regular expression in the Regex box.
  - For the Text File with Custom Delimiters option, specify a Field terminator (that is, a column delimiter). Optionally, you can specify a Collection terminator for array types or a Map key terminator.
- For Column details, specify a column name and the column data type.
  - To add more columns one at a time, choose Add a column.
  - To quickly add more columns, choose Bulk add columns. In the text box, enter a comma-separated list of columns in the format column_name data_type, column_name data_type[, ...], and then choose Add.
- (Optional) For Partition details, add one or more column names and data types.
- The Preview table query box shows the CREATE TABLE statement generated by the information that you entered into the form. The preview statement cannot be edited directly. To change the statement, modify the fields in the form, or create the statement directly in the query editor instead of using the form.
- Choose Create table to run the generated statement in the query editor and create the table.
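To make the relationship between the form fields and the resulting DDL more concrete, the following sketch parses a bulk column list in the column_name data_type format and assembles a CREATE EXTERNAL TABLE statement for delimited text data. It is an approximation under stated assumptions, not the exact statement that the form generates. The helpers `parse_bulk_columns` and `build_create_table`, and the database, table, and bucket names, are hypothetical.

```python
def parse_bulk_columns(text):
    """Parse 'column_name data_type, column_name data_type' into pairs.

    Assumption: simple types only. Types that contain commas or spaces,
    such as decimal(10,2) or map<string, int>, are not handled.
    """
    columns = []
    for entry in text.split(","):
        parts = entry.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'column_name data_type', got {entry!r}")
        columns.append((parts[0], parts[1]))
    return columns


def build_create_table(database, table, columns, location,
                       delimiter=",", partitions=()):
    """Assemble a CREATE EXTERNAL TABLE statement for delimited text files."""
    column_sql = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    lines = [
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (",
        column_sql,
        ")",
    ]
    if partitions:
        partition_sql = ", ".join(f"`{name}` {dtype}" for name, dtype in partitions)
        lines.append(f"PARTITIONED BY ({partition_sql})")
    lines.append(f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'")
    lines.append(f"LOCATION '{location}'")
    return "\n".join(lines)


# Hypothetical form input, for illustration only.
columns = parse_bulk_columns("id int, name string, created_at timestamp")
statement = build_create_table(
    database="example_db",
    table="example_table",
    columns=columns,
    location="s3://amzn-s3-demo-bucket/csv-data/",
    partitions=[("year", "int")],
)
print(statement)
```

A statement built this way could be pasted into the query editor, which is the alternative to the form that the procedure mentions for cases where the preview needs to be edited.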