Working with Development Endpoints on the AWS Glue Console

A development endpoint is an environment that you can use to develop and test your AWS Glue scripts. The Dev endpoints tab on the AWS Glue console lists all the development endpoints that you have created. From this tab you can add or delete a development endpoint, or rotate its SSH key. You can also create notebooks that use the development endpoint.

To display details for a development endpoint, choose the endpoint in the list. Endpoint details include the information you defined when you created it using the Add endpoint wizard. They also include information that you need to connect to the endpoint and any notebooks that use the endpoint.

The following are some of the development endpoint properties:

Endpoint name

The unique name that you give the endpoint when you create it.

Provisioning status

Describes whether the endpoint is being created (PROVISIONING), ready to be used (READY), in the process of terminating (UNHEALTHY_TERMINATING), terminated (UNHEALTHY_TERMINATED), failed (FAILED), or being updated (UPDATING).
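
As an illustration, the status values listed above can be kept in a small lookup table so that a script can report them in readable form. This is a minimal sketch: the status strings and their meanings come from the description above, while the names `PROVISIONING_STATUS`, `describe_status`, and `is_usable` are hypothetical helpers introduced here.

```python
# Status strings and meanings are taken from the property description above.
PROVISIONING_STATUS = {
    "PROVISIONING": "being created",
    "READY": "ready to be used",
    "UNHEALTHY_TERMINATING": "in the process of terminating",
    "UNHEALTHY_TERMINATED": "terminated",
    "FAILED": "failed",
    "UPDATING": "being updated",
}


def describe_status(status: str) -> str:
    """Return a readable description; unknown values are reported as such."""
    return PROVISIONING_STATUS.get(status, f"unrecognized status: {status}")


def is_usable(status: str) -> bool:
    """Hypothetical helper: this sketch treats only READY as usable."""
    return status == "READY"
```

A script could call `describe_status` on whatever status string it reads for an endpoint and proceed only when `is_usable` returns true.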

Adding an Endpoint

To add an endpoint, sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/. Choose the Dev endpoints tab, and then choose Add endpoint.

Follow the steps in the AWS Glue Add endpoint wizard to provide the properties that are required to create an endpoint.

You provide values for the following properties:

Name

The name of your development endpoint.

IAM role

The role your development endpoint assumes to run AWS Glue work.

Python library path

You can add Python libraries that are required by your ETL scripts.

Dependent jars path

You can add dependent jars that are required by your ETL scripts.

VPC

The name of the virtual private cloud (VPC) that contains your data store. The AWS Glue console lists all VPCs with Amazon S3 VPC endpoints in your account. Choose a network setup that allows you to connect to the development endpoint from your local machine.

Note

Confirm that DNS hostnames and DNS resolution are both enabled in your VPC, and enable them if they are not. For more information, see Setting Up DNS in Your VPC.

Subnet

The subnet within the VPC with Amazon S3 VPC endpoints. The AWS Glue console lists all subnets with a route table to an Amazon S3 endpoint in your VPC. The console automatically provides the Availability Zone required to connect your VPC.

Security groups

The security groups associated with your subnet that have a self-referencing inbound rule. Choose a maximum of four security groups. AWS Glue requires one or more security groups with an inbound source rule that allows AWS Glue to communicate. The AWS Glue console lists all security groups that are granted inbound access to your VPC. AWS Glue associates these security groups with the elastic network interface that is attached to your VPC subnet.

For more information about how to set up your subnet, see Setting Up Your Environment for Development Endpoints.

Public key contents

The public key value that is used to create the development endpoint. Use an SSH key generator program to create the key. Save the corresponding private key to later connect to the development endpoint using SSH. This is not an Amazon EC2 key pair.
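
The wizard properties above correspond roughly to the parameters of the AWS Glue `CreateDevEndpoint` API, so the same endpoint could in principle be created from a script. The following is a sketch under stated assumptions, not a definitive implementation: it assumes the boto3 Glue client and its `create_dev_endpoint` parameters (`EndpointName`, `RoleArn`, `PublicKey`, `SubnetId`, `SecurityGroupIds`, `ExtraPythonLibsS3Path`, `ExtraJarsS3Path`). The function names and the local check on the number of security groups are hypothetical. The VPC has no separate parameter in this sketch because the subnet identifies it.

```python
from typing import Optional


def build_dev_endpoint_request(
    name: str,
    role_arn: str,
    public_key: str,
    subnet_id: Optional[str] = None,
    security_group_ids: Optional[list] = None,
    python_libs_s3_path: Optional[str] = None,
    jars_s3_path: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for a create_dev_endpoint call."""
    groups = list(security_group_ids or [])
    if len(groups) > 4:
        # Mirrors the console limit described above.
        raise ValueError("choose a maximum of four security groups")
    request = {"EndpointName": name, "RoleArn": role_arn, "PublicKey": public_key}
    optional = {
        "SubnetId": subnet_id,
        "SecurityGroupIds": groups or None,
        "ExtraPythonLibsS3Path": python_libs_s3_path,
        "ExtraJarsS3Path": jars_s3_path,
    }
    # Send only the optional properties that were actually provided.
    request.update({key: value for key, value in optional.items() if value})
    return request


def create_dev_endpoint(request: dict) -> dict:
    """Send the request; needs boto3 and AWS credentials (not run here)."""
    import boto3  # third-party; imported lazily so the builder works without it

    return boto3.client("glue").create_dev_endpoint(**request)
```

Keeping the request builder separate from the network call means the property checks can be exercised without AWS credentials.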

Creating a Notebook

The AWS Glue Create notebook server window requests the properties required to create a notebook server to use an Apache Zeppelin notebook. You provide the following properties.

CloudFormation stack name

The name of your notebook that is created in the AWS CloudFormation stack on the development endpoint. The name is prefixed with aws-glue-. This notebook runs on an Amazon EC2 instance. The Zeppelin HTTP server is started on port 443.

IAM role

A role with a trust relationship to Amazon EC2 that matches the Amazon EC2 instance profile exactly. Create the role in the IAM console, select Amazon EC2, and attach a policy for the notebook, such as AWSGlueServiceNotebookRoleDefault. For more information, see Step 5: Create an IAM Role for Notebooks.

For more information about instance profiles, see Using Instance Profiles.

EC2 key pair

The Amazon EC2 key that is used to access the notebook. You can create a key pair on the Amazon EC2 console (https://console.aws.amazon.com/ec2/). For more information, see Amazon EC2 Key Pairs.

Notebook S3 path

The location where the state of the notebook is stored. The Amazon S3 path to the Zeppelin notebook must follow the format: s3://bucket-name/username. Subfolders cannot be included in the path.
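
To illustrate the path rule above, a script could check a candidate value before it is entered in the console. This is a minimal sketch: `parse_notebook_s3_path` is a hypothetical helper, and it checks only the shape `s3://bucket-name/username`, not whether the bucket exists or the bucket name is valid. Tolerating a single trailing slash is an assumption of this sketch.

```python
import re

# "s3://bucket-name/username": one bucket segment and one user segment only.
_NOTEBOOK_PATH = re.compile(r"s3://([^/\s]+)/([^/\s]+)/?")


def parse_notebook_s3_path(path: str) -> tuple:
    """Return (bucket, username), or raise ValueError if the path has subfolders."""
    match = _NOTEBOOK_PATH.fullmatch(path)
    if match is None:
        raise ValueError(f"expected s3://bucket-name/username, got: {path!r}")
    return match.group(1), match.group(2)
```

For example, `parse_notebook_s3_path("s3://my-bucket/alice")` is accepted, while a path with a subfolder such as `s3://my-bucket/alice/notes` raises `ValueError`.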

Notebook username

The user name that you use to access the Zeppelin notebook.

Notebook password

The password that you use to access the Zeppelin notebook.

Notebook server tags

The AWS CloudFormation stack is always tagged with a key aws-glue-dev-endpoint and the value of the name of the development endpoint. You can add more tags to the AWS CloudFormation stack.
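
The tagging rule above can be modeled as a small merge of the fixed tag with any tags you add. This is only a sketch of the resulting tag set: `notebook_server_tags` is a hypothetical helper, and the assumption that the fixed tag takes precedence over a user tag with the same key is not confirmed by this guide.

```python
from typing import Optional

FIXED_TAG_KEY = "aws-glue-dev-endpoint"  # key named in the description above


def notebook_server_tags(endpoint_name: str, extra: Optional[dict] = None) -> dict:
    """Return the stack tags: user tags plus the always-present endpoint tag."""
    tags = dict(extra or {})
    # Set last so a user tag with the same key cannot replace it (assumption).
    tags[FIXED_TAG_KEY] = endpoint_name
    return tags
```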

The AWS Glue Development endpoints details window displays a section for each notebook created on the development endpoint. The following properties are shown.

EC2 instance

The name of the Amazon EC2 instance that is created to host your notebook. This links to the Amazon EC2 console (https://console.aws.amazon.com/ec2/), where the instance is tagged with the key aws-glue-dev-endpoint and the value of the name of the development endpoint.

SSH to EC2 server command

Type this command in a terminal window to connect to the Amazon EC2 instance that is running your notebook.

Notebook URL

Type this URL in a browser to connect to your notebook on a local port.

CloudFormation stack

The name of the AWS CloudFormation stack used to create the notebook server.

Follow these steps to connect to your notebook and debug errors in it:

  • Open a web browser and type the Notebook URL in the address bar. The notebook initial page opens. Use the notebook to test your AWS Glue ETL script. Check your setup by typing spark.version to return the version of Apache Spark.

  • When a notebook is run, Apache Zeppelin does not emit error messages on failure. To determine the cause of a failure, look at the log files in the log directory. To view notebook logs, type the SSH to EC2 server command from a terminal window. Navigate to the zeppelin/logs folder and find your log file.

  • When you're finished working with your notebook, close any open terminal windows.
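
Because failures have to be found in the log files, a short script run on the notebook instance can help narrow the search. The following is a minimal sketch that uses only the Python standard library. It assumes that the log files end in `.log` and that failures appear on lines containing the text `ERROR`; neither detail is confirmed by this guide, and `find_error_lines` is a hypothetical helper.

```python
from pathlib import Path


def find_error_lines(log_dir: str, marker: str = "ERROR"):
    """Yield (file name, line number, text) for each log line containing marker."""
    for log_file in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps the scan going if a log has undecodable bytes.
        with log_file.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if marker in line:
                    yield log_file.name, number, line.rstrip("\n")
```

After connecting with the SSH to EC2 server command, you could loop over `find_error_lines("zeppelin/logs")` and print each result to see where a notebook run failed.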