Autogenerating ETL Scripts API

The ETL script-generation API describes the data types and operations for generating ETL scripts in AWS Glue.

Data Types

CodeGenNode Structure

Represents a node in a directed acyclic graph (DAG).

Fields

  • Id – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Identifier string pattern.

    A node identifier that is unique within the node's graph.

  • NodeType – Required: UTF-8 string.

    The type of node that this is.

  • Args – Required: An array of CodeGenNodeArg objects, not more than 50 structures.

    Properties of the node, in the form of name-value pairs.

  • LineNumber – Number (integer).

    The line number of the node.

CodeGenNodeArg Structure

An argument or property of a node.

Fields

  • Name – Required: UTF-8 string.

    The name of the argument or property.

  • Value – Required: UTF-8 string.

    The value of the argument or property.

  • Param – Boolean.

    True if the value is used as a parameter.
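The two structures above can be sketched as plain Python dictionaries, which is the shape a Python SDK call would accept. This is a minimal sketch: the helper names make_arg and make_node, the node type "DataSource", and the argument names and values are hypothetical and chosen only for illustration. The size limits are checked, but the Identifier string pattern is not.

```python
def make_arg(name, value, param=False):
    """Build a CodeGenNodeArg: Name and Value are required, Param is optional."""
    return {"Name": name, "Value": value, "Param": param}


def make_node(node_id, node_type, args, line_number=None):
    """Build a CodeGenNode, enforcing the documented size limits."""
    if not 1 <= len(node_id.encode("utf-8")) <= 255:
        raise ValueError("Id must be 1 to 255 bytes long")
    if len(args) > 50:
        raise ValueError("Args may hold at most 50 CodeGenNodeArg structures")
    node = {"Id": node_id, "NodeType": node_type, "Args": list(args)}
    if line_number is not None:
        # LineNumber is optional, so it is only included when supplied.
        node["LineNumber"] = line_number
    return node


# Hypothetical node: the NodeType and the argument names are assumptions.
source_node = make_node(
    "datasource0",
    "DataSource",
    [make_arg("database", '"sales_db"'), make_arg("table_name", '"orders"')],
)
```

Validating the limits locally is a design choice that surfaces mistakes before any request is sent; the service remains the authority on what it accepts.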

CodeGenEdge Structure

Represents a directional edge in a directed acyclic graph (DAG).

Fields

  • Source – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Identifier string pattern.

    The ID of the node at which the edge starts.

  • Target – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Identifier string pattern.

    The ID of the node at which the edge ends.

  • TargetParameter – UTF-8 string.

    The target of the edge.
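Because each edge refers to nodes by ID, a set of edges can be checked locally before it is submitted. The following sketch builds CodeGenEdge dictionaries and confirms that they reference known nodes and form an acyclic graph. The helper names, the node IDs, and the "frame" target parameter are hypothetical, and the cycle check uses Python's standard graphlib module (Python 3.9 or later), not anything provided by AWS Glue.

```python
from graphlib import CycleError, TopologicalSorter


def make_edge(source, target, target_parameter=None):
    """Build a CodeGenEdge: Source and Target are required node IDs."""
    edge = {"Source": source, "Target": target}
    if target_parameter is not None:
        edge["TargetParameter"] = target_parameter
    return edge


def check_dag(node_ids, edges):
    """Return the node IDs in topological order, or raise ValueError."""
    known = set(node_ids)
    predecessors = {node_id: set() for node_id in node_ids}
    for edge in edges:
        for end in (edge["Source"], edge["Target"]):
            if end not in known:
                raise ValueError(f"edge refers to unknown node {end!r}")
        predecessors[edge["Target"]].add(edge["Source"])
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        raise ValueError("edges form a cycle, so the graph is not a DAG") from exc


# Hypothetical three-node pipeline: source, then mapping, then sink.
edges = [
    make_edge("datasource0", "applymapping1", "frame"),
    make_edge("applymapping1", "datasink2", "frame"),
]
order = check_dag(["datasource0", "applymapping1", "datasink2"], edges)
```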

Location Structure

The location of resources.

Fields

  • Jdbc – An array of CodeGenNodeArg objects, not more than 50 structures.

    A JDBC location.

  • S3 – An array of CodeGenNodeArg objects, not more than 50 structures.

    An Amazon Simple Storage Service (Amazon S3) location.

  • DynamoDB – An array of CodeGenNodeArg objects, not more than 50 structures.

    An Amazon DynamoDB table location.
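A Location holds up to three optional lists of name-value arguments, one per storage type. As a minimal sketch, the helper below assembles such a dictionary and omits any field that was not supplied. The helper name make_location, the argument name "path", and the bucket URL are hypothetical; this reference does not list which argument names each location type expects.

```python
def make_location(jdbc=None, s3=None, dynamodb=None):
    """Build a Location, keeping only the fields that were supplied."""
    location = {}
    for key, args in (("Jdbc", jdbc), ("S3", s3), ("DynamoDB", dynamodb)):
        if args is None:
            continue
        if len(args) > 50:
            raise ValueError(f"{key} may hold at most 50 CodeGenNodeArg structures")
        location[key] = list(args)
    return location


# Hypothetical S3 location; the argument name and value are assumptions.
s3_location = make_location(
    s3=[{"Name": "path", "Value": "s3://example-bucket/output/"}]
)
```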

CatalogEntry Structure

Specifies a table definition in the AWS Glue Data Catalog.

Fields

  • DatabaseName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The database in which the table metadata resides.

  • TableName – Required: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    The name of the table in question.

MappingEntry Structure

Defines a mapping.

Fields

  • SourceTable – UTF-8 string.

    The name of the source table.

  • SourcePath – UTF-8 string.

    The source path.

  • SourceType – UTF-8 string.

    The source type.

  • TargetTable – UTF-8 string.

    The target table.

  • TargetPath – UTF-8 string.

    The target path.

  • TargetType – UTF-8 string.

    The target type.
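A MappingEntry pairs one source field with one target field. The sketch below shows one possible way to assemble entries in Python while rejecting misspelled field names. The helper name make_mapping_entry and the table names, paths, and type strings are hypothetical, and the comments describe the intent of this example only.

```python
MAPPING_FIELDS = (
    "SourceTable", "SourcePath", "SourceType",
    "TargetTable", "TargetPath", "TargetType",
)


def make_mapping_entry(**fields):
    """Build a MappingEntry from keyword arguments, dropping unset fields."""
    unknown = set(fields) - set(MAPPING_FIELDS)
    if unknown:
        raise ValueError(f"unknown MappingEntry fields: {sorted(unknown)}")
    return {key: value for key, value in fields.items() if value is not None}


# Hypothetical mapping list: the first entry keeps a column unchanged,
# the second gives the column a new path and type on the target side.
mapping = [
    make_mapping_entry(
        SourceTable="orders", SourcePath="order_id", SourceType="long",
        TargetTable="orders_clean", TargetPath="order_id", TargetType="long",
    ),
    make_mapping_entry(
        SourceTable="orders", SourcePath="amt", SourceType="string",
        TargetTable="orders_clean", TargetPath="amount", TargetType="double",
    ),
]
```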

Operations

CreateScript Action (Python: create_script)

Transforms a directed acyclic graph (DAG) into code.

Request

  • DagNodes – An array of CodeGenNode objects.

    A list of the nodes in the DAG.

  • DagEdges – An array of CodeGenEdge objects.

    A list of the edges in the DAG.

  • Language – UTF-8 string (valid values: PYTHON | SCALA).

    The programming language of the resulting code from the DAG.

Response

  • PythonScript – UTF-8 string.

    The Python script generated from the DAG.

  • ScalaCode – UTF-8 string.

    The Scala code generated from the DAG.

Errors

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException
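As an illustration of the request and response above, the following sketch wraps the Python create_script call. The function name generate_script is hypothetical. The sketch assumes that glue_client is an object exposing create_script with the listed parameters, such as a boto3 Glue client, and that the service fills the response field matching the requested language.

```python
def generate_script(glue_client, nodes, edges, language="PYTHON"):
    """Call create_script and return the generated code as a string."""
    if language not in ("PYTHON", "SCALA"):
        raise ValueError("Language must be PYTHON or SCALA")
    response = glue_client.create_script(
        DagNodes=nodes,
        DagEdges=edges,
        Language=language,
    )
    # Assumption: the field matching the requested language holds the code.
    key = "PythonScript" if language == "PYTHON" else "ScalaCode"
    return response.get(key)
```

With boto3 installed and credentials configured, a call might look like generate_script(boto3.client("glue"), nodes, edges), where nodes and edges are lists of CodeGenNode and CodeGenEdge dictionaries.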

GetDataflowGraph Action (Python: get_dataflow_graph)

Transforms a Python script into a directed acyclic graph (DAG).

Request

  • PythonScript – UTF-8 string.

    The Python script to transform.

Response

  • DagNodes – An array of CodeGenNode objects.

    A list of the nodes in the resulting DAG.

  • DagEdges – An array of CodeGenEdge objects.

    A list of the edges in the resulting DAG.

Errors

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException
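The response lists nodes and edges separately, so a caller often needs to join them. The sketch below calls get_dataflow_graph and folds the result into an adjacency mapping from each node ID to the IDs it points at. The function name script_to_adjacency is hypothetical, and glue_client is assumed to be an object exposing get_dataflow_graph, such as a boto3 Glue client.

```python
def script_to_adjacency(glue_client, python_script):
    """Return {node_id: [target_ids]} for the DAG derived from a script."""
    response = glue_client.get_dataflow_graph(PythonScript=python_script)
    # Start every node with an empty list so that sinks also appear as keys.
    adjacency = {node["Id"]: [] for node in response.get("DagNodes", [])}
    for edge in response.get("DagEdges", []):
        adjacency.setdefault(edge["Source"], []).append(edge["Target"])
    return adjacency
```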

GetMapping Action (Python: get_mapping)

Creates mappings.

Request

  • Source – Required: A CatalogEntry object.

    Specifies the source table.

  • Sinks – An array of CatalogEntry objects.

    A list of target tables.

  • Location – A Location object.

    Parameters for the mapping.

Response

  • Mapping – Required: An array of MappingEntry objects.

    A list of mappings to the specified targets.

Errors

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException

  • EntityNotFoundException
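A request for this operation can be sketched as follows, using CatalogEntry dictionaries for the source and sinks. The function names catalog_entry and fetch_mapping and the database and table names in the usage note are hypothetical. The sketch assumes glue_client exposes get_mapping with the parameters listed above, as a boto3 Glue client does.

```python
def catalog_entry(database_name, table_name):
    """Build a CatalogEntry, enforcing the documented 1 to 255 byte limits."""
    for label, value in (("DatabaseName", database_name), ("TableName", table_name)):
        if not 1 <= len(value.encode("utf-8")) <= 255:
            raise ValueError(f"{label} must be 1 to 255 bytes long")
    return {"DatabaseName": database_name, "TableName": table_name}


def fetch_mapping(glue_client, source, sinks=None, location=None):
    """Call get_mapping, sending only the optional fields that were supplied."""
    kwargs = {"Source": source}
    if sinks is not None:
        kwargs["Sinks"] = sinks
    if location is not None:
        kwargs["Location"] = location
    return glue_client.get_mapping(**kwargs)["Mapping"]
```

For example, fetch_mapping(client, catalog_entry("sales_db", "orders"), sinks=[catalog_entry("sales_db", "orders_clean")]) would return the list of MappingEntry dictionaries. Optional fields are left out of the call when unset, because passing None for a parameter is typically rejected by SDK-side validation.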

GetPlan Action (Python: get_plan)

Gets code to perform a specified mapping.

Request

  • Mapping – Required: An array of MappingEntry objects.

    The list of mappings from a source table to target tables.

  • Source – Required: A CatalogEntry object.

    The source table.

  • Sinks – An array of CatalogEntry objects.

    The target tables.

  • Location – A Location object.

    The parameters for the mapping.

  • Language – UTF-8 string (valid values: PYTHON | SCALA).

    The programming language of the code to perform the mapping.

  • AdditionalPlanOptionsMap – A map array of key-value pairs.

    Each key is a UTF-8 string.

    Each value is a UTF-8 string.

    A map to hold additional optional key-value parameters.

    Currently, these key-value pairs are supported:

    • inferSchema – Specifies whether to set inferSchema to true or false for the default script generated by an AWS Glue job. For example, to set inferSchema to true, pass the following key-value pair:

      --additional-plan-options-map '{"inferSchema":"true"}'

Response

  • PythonScript – UTF-8 string.

    A Python script to perform the mapping.

  • ScalaCode – UTF-8 string.

    The Scala code to perform the mapping.

Errors

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException
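The request above, including the inferSchema option, can be sketched in Python as follows. The function name fetch_plan and its infer_schema parameter are hypothetical. The sketch assumes glue_client exposes get_plan with the listed parameters, as a boto3 Glue client does, and that the response field matching the requested language holds the code. Because the map values are UTF-8 strings, the Boolean is converted to "true" or "false".

```python
def fetch_plan(glue_client, mapping, source, sinks=None, location=None,
               language="PYTHON", infer_schema=None):
    """Call get_plan and return the generated code as a string."""
    if language not in ("PYTHON", "SCALA"):
        raise ValueError("Language must be PYTHON or SCALA")
    kwargs = {"Mapping": mapping, "Source": source, "Language": language}
    if sinks is not None:
        kwargs["Sinks"] = sinks
    if location is not None:
        kwargs["Location"] = location
    if infer_schema is not None:
        # Map values must be strings, so the Boolean is spelled out.
        kwargs["AdditionalPlanOptionsMap"] = {
            "inferSchema": "true" if infer_schema else "false"
        }
    response = glue_client.get_plan(**kwargs)
    # Assumption: the field matching the requested language holds the code.
    key = "PythonScript" if language == "PYTHON" else "ScalaCode"
    return response.get(key)
```

A typical flow would pass the list returned by get_mapping as the mapping argument, together with the same source and sinks that produced it.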