Adding a development endpoint
Use development endpoints to iteratively develop and test your extract, transform, and load (ETL) scripts in AWS Glue. You can add a development endpoint using the AWS Glue console or the AWS Command Line Interface (AWS CLI).
To add a development endpoint (console)
-
Open the AWS Glue console at https://console.aws.amazon.com/glue/. Sign in as a user who has the IAM permission glue:CreateDevEndpoint.
-
In the navigation pane, choose Dev endpoints, and then choose Add endpoint.
-
Follow the steps in the AWS Glue Add endpoint wizard to provide the properties that are required to create an endpoint. Specify an IAM role that permits access to your data.
If you choose to provide an SSH public key when you create your development endpoint, save the SSH private key to access the development endpoint later.
-
Choose Finish to complete the wizard. Then check the console for development endpoint status. When the status changes to READY, the development endpoint is ready to use.
When creating the endpoint, you can provide the following optional information:
- Security configuration
-
To specify at-rest encryption options, add a security configuration to the development endpoint.
- Worker type
-
The type of predefined worker that is allocated to the development endpoint. Accepts a value of Standard, G.1X, or G.2X.
-
For the Standard worker type, each worker provides 4 vCPU, 16 GB of memory, a 50 GB disk, and 2 executors per worker.
-
For the G.1X worker type, each worker maps to 1 DPU (4 vCPU, 16 GB of memory, and a 64 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.
-
For the G.2X worker type, each worker maps to 2 DPU (8 vCPU, 32 GB of memory, and a 128 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs.
- Number of workers
-
The number of workers of a defined workerType that are allocated to the development endpoint. This field is available only when you choose worker type G.1X or G.2X.
- Data processing units (DPUs)
-
The number of DPUs that AWS Glue uses for your development endpoint. The number must be greater than 1.
- Python library path
-
Comma-separated Amazon Simple Storage Service (Amazon S3) paths to Python libraries that are required by your script. Multiple values must be complete paths separated by a comma (,). Only individual files are supported, not a directory path.
Note You can use only pure Python libraries. Libraries that rely on C extensions, such as the Pandas Python data analysis library, are not yet supported.
- Dependent jars path
-
Comma-separated Amazon S3 paths to JAR files that are required by the script.
Note Currently, you can use only pure Java or Scala (2.11) libraries.
- AWS Glue version
-
Specifies the versions of Python and Apache Spark to use. Defaults to AWS Glue version 1.0 (Python version 3 and Spark version 2.4). For more information, see the Glue version job property.
- Tags
-
Tag your development endpoint with a Tag key and optional Tag value. After tag keys are created, they are read-only. Use tags on some resources to help you organize and identify them. For more information, see AWS tags in AWS Glue.
- Spark UI
-
Turns on the use of Spark UI for monitoring Spark applications running on this development endpoint. For more information, see Enabling the Apache Spark web UI for development endpoints.
- Use AWS Glue Data Catalog as the Hive metastore (under Catalog Options)
-
Allows you to use the AWS Glue Data Catalog as a Spark Hive metastore.
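The rules stated above for worker type, number of workers, DPUs, and the Python library path can be checked before you open the wizard. The following is a minimal sketch, not part of AWS Glue: the function names `check_capacity_options` and `split_library_paths` are hypothetical, and it assumes that a complete path is an `s3://` URI and that a trailing slash indicates a directory.

```python
from typing import List, Optional

# Per-worker resources copied from the Worker type descriptions above:
# (vCPUs, memory in GB, disk in GB, executors per worker).
WORKER_SPECS = {
    "Standard": (4, 16, 50, 2),
    "G.1X": (4, 16, 64, 1),
    "G.2X": (8, 32, 128, 1),
}


def check_capacity_options(
    worker_type: Optional[str] = None,
    number_of_workers: Optional[int] = None,
    dpus: Optional[int] = None,
) -> List[str]:
    """Return the rules that are violated; an empty list means none were."""
    problems = []
    if worker_type is not None and worker_type not in WORKER_SPECS:
        problems.append("Worker type must be Standard, G.1X, or G.2X.")
    # Number of workers is available only with worker type G.1X or G.2X.
    if number_of_workers is not None and worker_type not in ("G.1X", "G.2X"):
        problems.append("Number of workers applies only to G.1X or G.2X.")
    if dpus is not None and dpus <= 1:
        problems.append("The number of DPUs must be greater than 1.")
    return problems


def split_library_paths(value: str) -> List[str]:
    """Split a comma-separated path list and reject paths that look invalid."""
    paths = [part.strip() for part in value.split(",") if part.strip()]
    for path in paths:
        # Assumption: a "complete path" is a full s3:// URI.
        if not path.startswith("s3://"):
            raise ValueError(f"Not a complete Amazon S3 path: {path}")
        # Assumption: a trailing slash marks a directory, which is not supported.
        if path.endswith("/"):
            raise ValueError(f"Directory paths are not supported: {path}")
    return paths
```

For example, `check_capacity_options("Standard", number_of_workers=5)` reports one problem, because the number of workers is available only for G.1X or G.2X. The same path check could be reused for the dependent JARs path, although the source states the individual-file restriction only for Python libraries.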
To add a development endpoint (AWS CLI)
-
In a command line window, enter a command similar to the following.
aws glue create-dev-endpoint --endpoint-name "endpoint1" --role-arn "arn:aws:iam::account-id:role/role-name" --number-of-nodes "3" --glue-version "1.0" --arguments '{"GLUE_PYTHON_VERSION": "3"}' --region "region-name"
This command specifies AWS Glue version 1.0. Because this version supports both Python 2 and Python 3, you can use the arguments parameter to indicate the desired Python version. If the glue-version parameter is omitted, AWS Glue version 0.9 is assumed. For more information about AWS Glue versions, see the Glue version job property.
For information about additional command line parameters, see create-dev-endpoint in the AWS CLI Command Reference.
-
(Optional) Enter the following command to check the development endpoint status. When the status changes to READY, the development endpoint is ready to use.
aws glue get-dev-endpoint --endpoint-name "endpoint1"
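If you script these two steps, the create command and the status check can be wrapped in a small helper. The following is a rough sketch under stated assumptions: it requires the AWS CLI to be installed and configured, the function names and the polling interval are illustrative choices rather than anything AWS Glue prescribes, and it reads the status from the `DevEndpoint.Status` field of the `get-dev-endpoint` JSON output.

```python
import json
import subprocess
import time
from typing import Callable, List


def build_create_command(
    endpoint_name: str,
    role_arn: str,
    region: str,
    number_of_nodes: int = 3,
    glue_version: str = "1.0",
    python_version: str = "3",
) -> List[str]:
    """Build the argument list for the create-dev-endpoint command shown above."""
    return [
        "aws", "glue", "create-dev-endpoint",
        "--endpoint-name", endpoint_name,
        "--role-arn", role_arn,
        "--number-of-nodes", str(number_of_nodes),
        "--glue-version", glue_version,
        # AWS Glue version 1.0 supports Python 2 and 3, so state the version.
        "--arguments", json.dumps({"GLUE_PYTHON_VERSION": python_version}),
        "--region", region,
    ]


def fetch_status(endpoint_name: str) -> str:
    """Run get-dev-endpoint and return the endpoint status (needs the AWS CLI)."""
    result = subprocess.run(
        ["aws", "glue", "get-dev-endpoint",
         "--endpoint-name", endpoint_name, "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["DevEndpoint"]["Status"]


def wait_until_ready(
    endpoint_name: str,
    get_status: Callable[[str], str] = fetch_status,
    poll_seconds: float = 30.0,  # illustrative interval, not an AWS recommendation
    max_polls: int = 40,
) -> bool:
    """Poll until the status is READY; return False if max_polls is exhausted."""
    for attempt in range(max_polls):
        if get_status(endpoint_name) == "READY":
            return True
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    return False
```

To run it, pass the result of `build_create_command("endpoint1", role_arn, region)` to `subprocess.run(..., check=True)` with your own role ARN and Region, and then call `wait_until_ready("endpoint1")`. The `get_status` parameter exists so that the polling loop can be exercised with a stub instead of a real endpoint.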