Using the Ion format in AWS Glue
AWS Glue retrieves data from sources and writes data to targets stored and transported in various data formats. If your data is stored or transported in the Ion data format, this document introduces you to the available features for using your data in AWS Glue.
AWS Glue supports using the Ion format. This format represents data structures (that aren't row or column based) in interchangeable binary and plaintext representations. For an introduction to the format by the authors, see Amazon Ion.
You can use AWS Glue to read Ion files from Amazon S3. You can read bzip and gzip archives containing Ion files from S3. You configure compression behavior on the S3 connection parameters instead of in the configuration discussed on this page.
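As a minimal sketch of where compression is configured, the helper below builds an S3 connection_options dictionary. It assumes the compressionType key from the Amazon S3 connection option reference, with the values "gzip" and "bzip2". The helper name and the example bucket path are hypothetical and used only for illustration.

```python
def s3_connection_options(paths, compression_type=None):
    """Build S3 connection_options for a Glue read (illustrative helper)."""
    options = {"paths": list(paths)}
    if compression_type is not None:
        # Assumption: the S3 connection accepts "gzip" or "bzip2" here.
        if compression_type not in ("gzip", "bzip2"):
            raise ValueError("unsupported compression type: " + compression_type)
        # Compression is set on the S3 connection, not in format_options.
        options["compressionType"] = compression_type
    return options


# Hypothetical path to a gzip archive containing Ion data.
gzip_options = s3_connection_options(["s3://example-bucket/data/"], "gzip")
```

The resulting dictionary is what you would pass as connection_options when reading; the format option itself stays unchanged.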
The following table shows which common AWS Glue operations support the Ion format option.
Read | Write | Streaming read | Group small files | Job bookmarks |
---|---|---|---|---|
Supported | Unsupported | Unsupported | Supported | Unsupported |
Example: Read Ion files and folders from S3
Prerequisites: You will need the S3 paths (s3path) to the Ion files or folders that you want to read.
Configuration: In your function options, specify format="ion". In your connection_options, use the paths key to specify your s3path. You can configure how the reader interacts with S3 in the connection_options. For details, see Connection types and options for ETL in AWS Glue: Amazon S3 connection option reference.
The following AWS Glue ETL script shows the process of reading Ion files or folders from S3:
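A minimal sketch of such a script, assuming a Python AWS Glue job where the awsglue and pyspark modules are available. The function names build_ion_read_args and read_ion are hypothetical helpers introduced for illustration, and "s3://s3path" stands in for your own path.

```python
def build_ion_read_args(s3_paths):
    """Assemble the keyword arguments for reading Ion data from S3."""
    return {
        "connection_type": "s3",
        "connection_options": {"paths": list(s3_paths)},
        "format": "ion",  # Ion has no format_options, so none are passed
    }


def read_ion(s3_paths):
    """Read Ion files or folders from S3 into a DynamicFrame."""
    # Imported here because these modules exist only in a Glue job environment.
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    return glue_context.create_dynamic_frame.from_options(
        **build_ion_read_args(s3_paths)
    )


# Placeholder path; replace with the Ion files or folders you want to read.
read_args = build_ion_read_args(["s3://s3path"])
```

Inside a Glue job, calling read_ion(["s3://s3path"]) would return the DynamicFrame. Separating the argument construction from the Glue call keeps the configuration easy to inspect outside the job environment.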
Ion configuration reference
There are no format_options values for format="ion".