Setting up networking for development for AWS Glue

To run your extract, transform, and load (ETL) scripts with AWS Glue, you can develop and test your scripts using a development endpoint. Development endpoints are not supported for use with AWS Glue version 2.0 jobs. For versions 2.0 and later, the preferred development method is using Jupyter Notebook with one of the AWS Glue kernels. For more information, see Getting started with AWS Glue interactive sessions.

Setting up your network for a development endpoint

When you set up a development endpoint, you specify a virtual private cloud (VPC), subnet, and security groups.
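As a rough illustration of how the subnet and security groups are supplied, the following Python sketch builds the arguments for the boto3 Glue `create_dev_endpoint` call. The helper names (`dev_endpoint_params`, `create_dev_endpoint`) and every ID shown are hypothetical placeholders. Note that the call takes a subnet and security groups rather than a VPC ID; the VPC is implied by the subnet. The check for at least one security group reflects the setup described on this page, not a requirement of the API itself.

```python
from typing import Any, Dict, Sequence


def dev_endpoint_params(
    endpoint_name: str,
    role_arn: str,
    subnet_id: str,
    security_group_ids: Sequence[str],
) -> Dict[str, Any]:
    """Build keyword arguments for the Glue create_dev_endpoint call."""
    # Assumption of this sketch: the endpoint runs in a VPC, so we insist
    # on at least one security group. The API itself does not require it.
    if not security_group_ids:
        raise ValueError("at least one security group ID is required")
    return {
        "EndpointName": endpoint_name,
        "RoleArn": role_arn,
        "SubnetId": subnet_id,
        "SecurityGroupIds": list(security_group_ids),
    }


def create_dev_endpoint(params: Dict[str, Any], region: str) -> Dict[str, Any]:
    """Send the request. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    glue = boto3.client("glue", region_name=region)
    return glue.create_dev_endpoint(**params)
```

A call such as `dev_endpoint_params("my-endpoint", role_arn, "subnet-0abc", ["sg-0abc"])` returns a dictionary that can be inspected before anything is sent to AWS.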

Note

Make sure you set up your DNS environment for AWS Glue. For more information, see Setting up DNS in your VPC.

To enable AWS Glue to access required resources, add a row to your subnet route table that associates the prefix list for Amazon S3 with the VPC endpoint. A prefix list ID is required to create an outbound security group rule that allows traffic from the VPC to reach an AWS service through a VPC endpoint. To make it easier to connect from your local machine to a notebook server that is associated with this development endpoint, add a row to the route table that targets an internet gateway ID. For more information, see VPC Endpoints. Update the subnet route table to be similar to the following table:

Destination            Target
10.0.0.0/16            local
pl-id for Amazon S3    vpce-id
0.0.0.0/0              igw-xxxx
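The route table entries above can also be created programmatically. The following Python sketch builds the arguments for the boto3 EC2 calls `create_vpc_endpoint` and `create_route`. The helper names and all resource IDs are hypothetical, and the service name format `com.amazonaws.<region>.s3` is assumed to apply to your Region. Associating a gateway endpoint with a route table adds the prefix list route automatically, so only the internet gateway route is created explicitly.

```python
from typing import Any, Dict


def s3_gateway_endpoint_params(
    vpc_id: str, route_table_id: str, region: str
) -> Dict[str, Any]:
    """Arguments for ec2.create_vpc_endpoint: a gateway endpoint for Amazon S3."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        # Assumed service name format; verify it for your Region.
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Associating the route table adds the pl-id -> vpce-id route.
        "RouteTableIds": [route_table_id],
    }


def internet_route_params(
    route_table_id: str, internet_gateway_id: str
) -> Dict[str, Any]:
    """Arguments for ec2.create_route: send 0.0.0.0/0 to the internet gateway."""
    return {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": "0.0.0.0/0",
        "GatewayId": internet_gateway_id,
    }


def apply_routes(
    vpc_id: str, route_table_id: str, internet_gateway_id: str, region: str
) -> None:
    """Create the endpoint and the route. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.create_vpc_endpoint(
        **s3_gateway_endpoint_params(vpc_id, route_table_id, region)
    )
    ec2.create_route(**internet_route_params(route_table_id, internet_gateway_id))
```

Keeping the request builders separate from the function that calls AWS makes it possible to review the parameters before any resource is changed.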

To enable AWS Glue to communicate between its components, specify a security group with a self-referencing inbound rule for all TCP ports. By creating a self-referencing rule, you can restrict the source to the same security group in the VPC, and it's not open to all networks. The default security group for your VPC might already have a self-referencing inbound rule for ALL Traffic.

To set up a security group
  1. Sign in to the AWS Management Console and open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

  2. In the left navigation pane, choose Security Groups.

  3. Either choose an existing security group from the list, or choose Create Security Group to create one to use with the development endpoint.

  4. In the security group pane, navigate to the Inbound tab.

  5. Add a self-referencing rule to allow AWS Glue components to communicate. Specifically, add or confirm a rule whose Type is All TCP, Protocol is TCP, Port Range includes all ports, and Source is the same security group (its own Group ID).

    The inbound rule looks similar to this:

    Type       Protocol    Port range    Source
    All TCP    TCP         0–65535       security-group

    The following shows an example of a self-referencing inbound rule:

    [Image: example of a self-referencing inbound rule]

  6. Add a rule for outbound traffic as well. Either open outbound traffic to all ports, or create a self-referencing rule whose Type is All TCP, Protocol is TCP, Port Range includes all ports, and Destination is the same security group (its own Group ID).

    The outbound rule looks similar to one of these rules:

    Type           Protocol    Port range    Destination
    All TCP        TCP         0–65535       security-group
    All Traffic    ALL         ALL           0.0.0.0/0
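The console steps above can be approximated in code. The following Python sketch builds a self-referencing rule in the `IpPermissions` shape used by the boto3 EC2 calls `authorize_security_group_ingress` and `authorize_security_group_egress`, and checks whether an existing group already has such a rule. The helper names are hypothetical, and the logic is one reasonable reading of the steps rather than a definitive implementation.

```python
from typing import Any, Dict, List


def self_referencing_tcp_permission(group_id: str) -> Dict[str, Any]:
    """All TCP ports, with the security group itself as source or destination."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 0,
        "ToPort": 65535,
        "UserIdGroupPairs": [{"GroupId": group_id}],
    }


def has_self_referencing_tcp_rule(
    permissions: List[Dict[str, Any]], group_id: str
) -> bool:
    """True if a rule referencing the group itself covers all TCP ports."""
    for perm in permissions:
        pairs = perm.get("UserIdGroupPairs", [])
        if not any(pair.get("GroupId") == group_id for pair in pairs):
            continue
        if perm.get("IpProtocol") == "-1":  # "-1" means all traffic
            return True
        if (
            perm.get("IpProtocol") == "tcp"
            and perm.get("FromPort", 1) <= 0
            and perm.get("ToPort", 0) >= 65535
        ):
            return True
    return False


def allows_all_outbound(permissions: List[Dict[str, Any]]) -> bool:
    """True if an egress rule opens all traffic to 0.0.0.0/0."""
    return any(
        perm.get("IpProtocol") == "-1"
        and any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
        for perm in permissions
    )


def ensure_self_referencing_rules(group_id: str, region: str) -> None:
    """Add the missing rules. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    group = ec2.describe_security_groups(GroupIds=[group_id])["SecurityGroups"][0]
    rule = self_referencing_tcp_permission(group_id)
    if not has_self_referencing_tcp_rule(group["IpPermissions"], group_id):
        ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=[rule])
    egress = group["IpPermissionsEgress"]
    if not (
        allows_all_outbound(egress)
        or has_self_referencing_tcp_rule(egress, group_id)
    ):
        ec2.authorize_security_group_egress(GroupId=group_id, IpPermissions=[rule])
```

Checking before authorizing avoids a duplicate-rule error when the default security group already contains a self-referencing rule for all traffic.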

Setting up Amazon EC2 for a notebook server

With a development endpoint, you can create a notebook server to test your ETL scripts with Jupyter notebooks. To enable communication to your notebook, specify a security group with inbound rules for both HTTPS (port 443) and SSH (port 22). Ensure that the rule's source is either 0.0.0.0/0 or the IP address of the machine that is connecting to the notebook.

To set up a security group
  1. Sign in to the AWS Management Console and open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

  2. In the left navigation pane, choose Security Groups.

  3. Either choose an existing security group from the list, or choose Create Security Group to create one to use with your notebook server. The security group that is associated with your development endpoint is also used to create your notebook server.

  4. In the security group pane, navigate to the Inbound tab.

  5. Add inbound rules similar to this:

    Type     Protocol    Port range    Source
    SSH      TCP         22            0.0.0.0/0
    HTTPS    TCP         443           0.0.0.0/0

    The following shows an example of the inbound rules for the security group:

    [Image: example of the inbound rules for the security group]
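
The inbound rules for the notebook server can be sketched the same way. The following Python example builds the SSH and HTTPS rules in the `IpPermissions` shape accepted by the boto3 EC2 call `authorize_security_group_ingress`. The helper names and the sample address are hypothetical, and the sketch handles IPv4 sources only. Passing a single address instead of the default `0.0.0.0/0` limits access to the machine that connects to the notebook.

```python
import ipaddress
from typing import Any, Dict, List


def notebook_ingress_permissions(source: str = "0.0.0.0/0") -> List[Dict[str, Any]]:
    """Inbound rules for SSH (22) and HTTPS (443) from one IPv4 source."""
    # Normalizes a bare address to CIDR form and raises ValueError
    # if the value is not a valid IPv4 network.
    cidr = str(ipaddress.IPv4Network(source))
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": label}],
        }
        for port, label in ((22, "SSH"), (443, "HTTPS"))
    ]


def authorize_notebook_access(group_id: str, source: str, region: str) -> None:
    """Add the rules to the group. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=notebook_ingress_permissions(source),
    )
```

For example, `notebook_ingress_permissions("203.0.113.7")` produces two rules whose source is `203.0.113.7/32`.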