AWS Glue Construct Library

---

![Stability: Experimental](https://img.shields.io/badge/stability-Experimental-important.svg?style=for-the-badge)

> **This is a *developer preview* (public beta) module.**
>
> All classes with the `Cfn` prefix in this module ([CFN Resources](https://docs.aws.amazon.com/cdk/latest/guide/constructs.html#constructs_lib))
> are auto-generated from CloudFormation. They are stable and safe to use.
>
> However, all other classes, i.e., higher level constructs, are under active development and subject to non-backward
> compatible changes or removal in any future version. These are not subject to the [Semantic Versioning](https://semver.org/) model.
> This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package.

---

This module is part of the AWS Cloud Development Kit project.

Database

A Database is a logical grouping of Tables in the Glue Catalog.

# Example automatically generated without compilation. See https://github.com/aws/jsii/issues/826
glue.Database(stack, "MyDatabase",
    database_name="my_database"
)

Table

A Glue table describes a table of data in S3: its structure (column names and types), the location of the data (S3 objects with a common prefix in an S3 bucket), and the format of the files (JSON, Avro, Parquet, etc.):

glue.Table(stack, "MyTable",
    database=my_database,
    table_name="my_table",
    columns=[{
        "name": "col1",
        "type": glue.Schema.string
    }, {
        "name": "col2",
        "type": glue.Schema.array(glue.Schema.string),
        "comment": "col2 is an array of strings"
    }],
    data_format=glue.DataFormat.Json
)

By default, an S3 bucket is created to store the table’s data, but you can pass the bucket and s3Prefix explicitly:

glue.Table(stack, "MyTable",
    bucket=my_bucket,
    s3_prefix="my-table/", ...
)
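
As a rough illustration of how a bucket and prefix identify a table's data, the sketch below builds the S3 URI under which the table's objects would share a common prefix. The function name `table_location` and the bucket name are hypothetical and not part of this module; the exact location string the construct emits is an assumption here.

```python
def table_location(bucket_name: str, s3_prefix: str = "") -> str:
    """Return an S3 URI for the common prefix of a table's objects.

    Hypothetical helper for illustration only; not part of the glue module.
    """
    # Drop a leading slash so the URI has no empty path segment.
    return f"s3://{bucket_name}/{s3_prefix.lstrip('/')}"


print(table_location("my-bucket", "my-table/"))
```

Every object whose key starts with `my-table/` in that bucket would then be treated as part of the table's data.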

Partitions

To improve query performance, a table can specify partitionKeys on which data is stored and queried separately. For example, you might partition a table by year and month to optimize queries based on a time window:

glue.Table(stack, "MyTable",
    database=my_database,
    table_name="my_table",
    columns=[{
        "name": "col1",
        "type": glue.Schema.string
    }],
    partition_keys=[{
        "name": "year",
        "type": glue.Schema.smallint
    }, {
        "name": "month",
        "type": glue.Schema.smallint
    }],
    data_format=glue.DataFormat.Json
)
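
To make "stored separately" concrete, here is a minimal sketch of one common convention, Hive-style `key=value` prefixes, for laying out partitioned data in S3. This layout is an assumption for illustration: the construct declares the partition keys, but how objects are written is up to whatever produces the data. The helper name `partition_prefix` is hypothetical.

```python
from typing import Mapping


def partition_prefix(table_prefix: str, partition_values: Mapping[str, int]) -> str:
    """Build a Hive-style S3 key prefix such as 'my-table/year=2020/month=3/'.

    Hypothetical helper; partition order follows the mapping's insertion order.
    """
    base = table_prefix.rstrip("/")
    segments = [base] if base else []
    segments += [f"{key}={value}" for key, value in partition_values.items()]
    return "/".join(segments) + "/" if segments else ""


print(partition_prefix("my-table/", {"year": 2020, "month": 3}))
```

A query restricted to one year and month then only needs to read objects under the matching prefix rather than the whole table.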

Encryption

You can enable encryption on a Table’s data:

  • Unencrypted - files are not encrypted. This is the default encryption setting.

  • S3Managed - Server side encryption (SSE-S3) with an Amazon S3-managed key.

glue.Table(stack, "MyTable",
    encryption=glue.TableEncryption.S3Managed, ...
)

  • Kms - Server-side encryption (SSE-KMS) with an AWS KMS Key managed by the account owner.

# KMS key is created automatically
glue.Table(stack, "MyTable",
    encryption=glue.TableEncryption.Kms, ...
)

# with an explicit KMS key
glue.Table(stack, "MyTable",
    encryption=glue.TableEncryption.Kms,
    encryption_key=kms.Key(stack, "MyKey"), ...
)

  • KmsManaged - Server-side encryption (SSE-KMS), like Kms, except with an AWS KMS Key managed by the AWS Key Management Service.

glue.Table(stack, "MyTable",
    encryption=glue.TableEncryption.KmsManaged, ...
)

  • ClientSideKms - Client-side encryption (CSE-KMS) with an AWS KMS Key managed by the account owner.

# KMS key is created automatically
glue.Table(stack, "MyTable",
    encryption=glue.TableEncryption.ClientSideKms, ...
)

# with an explicit KMS key
glue.Table(stack, "MyTable",
    encryption=glue.TableEncryption.ClientSideKms,
    encryption_key=kms.Key(stack, "MyKey"), ...
)

Note: You cannot provide a Bucket when creating the Table if you wish to use server-side encryption (Kms, KmsManaged or S3Managed).
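
The restriction in the note can be sketched as a small validation step. This is a standalone model of the rule as stated above, not the construct's actual implementation; `EncryptionMode` and `validate_table_props` are hypothetical names chosen for illustration.

```python
from enum import Enum
from typing import Optional


class EncryptionMode(Enum):
    """Hypothetical stand-in for the encryption settings listed above."""
    UNENCRYPTED = "Unencrypted"
    S3_MANAGED = "S3Managed"
    KMS = "Kms"
    KMS_MANAGED = "KmsManaged"
    CLIENT_SIDE_KMS = "ClientSideKms"


# The three server-side modes named in the note.
SERVER_SIDE_MODES = {
    EncryptionMode.S3_MANAGED,
    EncryptionMode.KMS,
    EncryptionMode.KMS_MANAGED,
}


def validate_table_props(encryption: EncryptionMode,
                         bucket_name: Optional[str] = None) -> None:
    """Raise ValueError if a bucket is combined with server-side encryption."""
    if bucket_name is not None and encryption in SERVER_SIDE_MODES:
        raise ValueError(
            f"A bucket cannot be provided with {encryption.value} encryption"
        )


# Client-side encryption with an explicit bucket passes the check.
validate_table_props(EncryptionMode.CLIENT_SIDE_KMS, bucket_name="my-bucket")
```

Under this model, passing a bucket together with `S3_MANAGED`, `KMS`, or `KMS_MANAGED` fails fast, while `UNENCRYPTED` and `CLIENT_SIDE_KMS` accept one.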

Types

A table’s schema is a collection of columns, each of which has a name and a type. Types are recursive structures, consisting of primitive and complex types:

glue.Table(stack, "MyTable",
    columns=[{
        "name": "primitive_column",
        "type": glue.Schema.string
    }, {
        "name": "array_column",
        "type": glue.Schema.array(glue.Schema.integer),
        "comment": "array<integer>"
    }, {
        "name": "map_column",
        "type": glue.Schema.map(glue.Schema.string, glue.Schema.timestamp),
        "comment": "map<string,timestamp>"
    }, {
        "name": "struct_column",
        "type": glue.Schema.struct([{
            "name": "nested_column",
            "type": glue.Schema.date,
            "comment": "nested comment"
        }]),
        "comment": "struct<nested_column:date COMMENT 'nested comment'>"
    }], ...
)
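
To show what "recursive" means here, the following standalone sketch models types as small values whose string form is built from the string forms of their parts, similar to the array and struct comments in the example above. It is not the module's implementation: `GlueType`, `Column`, `array`, `map_of`, and `struct` are hypothetical names, and the type strings follow the names used in this document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GlueType:
    """Hypothetical model of a type: its string form and whether it is primitive."""
    input_string: str
    is_primitive: bool


@dataclass(frozen=True)
class Column:
    name: str
    type: GlueType
    comment: Optional[str] = None


STRING = GlueType("string", True)
INTEGER = GlueType("integer", True)
DATE = GlueType("date", True)
TIMESTAMP = GlueType("timestamp", True)


def array(item: GlueType) -> GlueType:
    return GlueType(f"array<{item.input_string}>", False)


def map_of(key: GlueType, value: GlueType) -> GlueType:
    return GlueType(f"map<{key.input_string},{value.input_string}>", False)


def struct(columns: List[Column]) -> GlueType:
    fields = []
    for column in columns:
        field = f"{column.name}:{column.type.input_string}"
        if column.comment is not None:
            field += f" COMMENT '{column.comment}'"
        fields.append(field)
    return GlueType(f"struct<{','.join(fields)}>", False)


print(array(INTEGER).input_string)
print(map_of(STRING, TIMESTAMP).input_string)
print(struct([Column("nested_column", DATE, "nested comment")]).input_string)
```

Because each constructor returns another `GlueType`, complex types nest freely, for example `array(map_of(STRING, array(INTEGER)))`.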

Primitive

Numeric:

  • bigint

  • float

  • integer

  • smallint

  • tinyint

  • decimal

Date and Time:

  • date

  • timestamp

String Types:

  • string

  • char

  • varchar

Misc:

  • boolean

  • binary

Complex

  • array - array of some other type

  • map - map of some primitive key type to any value type.

  • struct - nested structure containing individually named and typed columns.
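
The constraint that map keys must be primitive can be sketched as a simple check over type names. This is an illustrative model only: `PRIMITIVE_TYPES` is taken from the list in this section, parameterized forms such as a length for char or varchar are ignored, and `map_type_string` is a hypothetical helper rather than a function of this module.

```python
# Primitive type names as listed in this section.
PRIMITIVE_TYPES = {
    "bigint", "float", "integer", "smallint", "tinyint",
    "date", "timestamp",
    "string", "decimal", "char", "varchar",
    "boolean", "binary",
}


def map_type_string(key_type: str, value_type: str) -> str:
    """Return 'map<key,value>', rejecting keys that are not primitive types."""
    if key_type not in PRIMITIVE_TYPES:
        raise ValueError(f"map key type must be primitive, got {key_type!r}")
    # The value may be any type, including another complex type.
    return f"map<{key_type},{value_type}>"


print(map_type_string("string", "array<integer>"))
```

A call such as `map_type_string("array<string>", "string")` raises `ValueError`, since an array is a complex type and cannot serve as a key.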