Configuring AWS Glue interactive sessions for Jupyter and AWS Glue Studio notebooks - AWS Glue

Introduction to Jupyter Magics

Jupyter Magics are commands that can be run at the beginning of a cell or as a whole cell body. Magics start with % for line magics and %% for cell magics. Several line magics, such as %region and %connections, can be run in the same cell, and the cell body can also include code, as in the following example.

%region us-east-2
%connections my_rds_connection
dy_f = glue_context.create_dynamic_frame.from_catalog(database='rds_tables', table_name='sales_table')

Cell magics must occupy the entire cell, and the command can span multiple lines. The following is an example of %%sql.

%%sql
select * from rds_tables.sales_table
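
As a rough illustration of the distinction between line magics and cell magics, the following Python sketch classifies the contents of a notebook cell. The helper name split_cell is hypothetical, and this is not how the Jupyter kernel actually parses cells. It only mirrors the rule described here: a cell that starts with %% is a cell magic that owns the whole cell, while lines that start with a single % are line magics that can be mixed with code.

```python
def split_cell(cell: str):
    """Classify a notebook cell (hypothetical helper, not the kernel's parser).

    Returns ("cell_magic", name, body) when the cell starts with %%,
    otherwise ("line_magics", [magic lines], [code lines]).
    """
    lines = cell.strip().splitlines()
    if lines and lines[0].startswith("%%"):
        # A cell magic owns the whole cell: the first token is the magic
        # name, and everything after it is the body.
        name, _, rest = lines[0].partition(" ")
        body = "\n".join(([rest] if rest else []) + lines[1:])
        return ("cell_magic", name, body)
    # Otherwise, separate single-% line magics from ordinary code lines.
    magics = [ln for ln in lines if ln.startswith("%")]
    code = [ln for ln in lines if ln and not ln.startswith("%")]
    return ("line_magics", magics, code)


mixed = split_cell("%region us-east-2\n%connections my_rds_connection\nx = 1")
sql = split_cell("%%sql\nselect * from rds_tables.sales_table")
```

Here mixed holds two line magics plus one line of code, and sql holds a single cell magic whose body is the SQL statement.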

Magics supported by AWS Glue interactive sessions for Jupyter

The following are magics that you can use with AWS Glue interactive sessions for Jupyter notebooks.

Sessions magics

| Name | Type | Description |
| --- | --- | --- |
| %help | n/a | Return a list of descriptions and input types for all magic commands. |
| %profile | String | Specify a profile in your AWS configuration to use as the credentials provider. |
| %region | String | Specify the AWS Region in which to initialize a session. Defaults to the value in ~/.aws/config. |
| %idle_timeout | Int | The number of minutes of inactivity, counted from the last cell execution, after which the session times out. The default idle timeout for Spark ETL sessions is 2880 minutes (48 hours). For other session types, consult the documentation for that session type. |
| %session_id | String | Return the session ID for the running session. If a String is provided, it is set as the session ID for the next running session. When you run a Jupyter Notebook in AWS Glue Studio, this magic returns a read-only value that you can't change. |
| %session_id_prefix | String | Define a string that precedes all session IDs, in the format [session_id_prefix]-[session_id]. If a session ID is not provided, a random UUID is generated. This magic is not supported when you run a Jupyter Notebook in AWS Glue Studio. |
| %status | n/a | Return the status of the current AWS Glue session, including its duration, configuration, and executing user or role. |
| %stop_session | n/a | Stop the current session. |
| %list_sessions | n/a | List all currently running sessions by name and ID. |
| %glue_version | String | The version of AWS Glue to be used by this session. Currently, the only valid options are 2.0 and 3.0. The default value is 2.0. |
| %streaming | String | Change the session type to AWS Glue Streaming. |
| %etl | String | Change the session type to AWS Glue ETL. |

AWS Glue for Spark config magics

| Name | Type | Description |
| --- | --- | --- |
| %%configure | Dictionary | Specify a JSON-formatted dictionary consisting of all configuration parameters for a session. Each parameter can be specified here or through individual magics. Examples follow this table. |
| %iam_role | String | Specify an IAM role ARN to execute your session with. Defaults to the value in ~/.aws/config. |
| %number_of_workers | Int | The number of workers of a defined worker_type that are allocated when a job runs. You must also set worker_type. The default number_of_workers is 5. |
| %worker_type | String | Standard, G.1X, or G.2X. You must also set number_of_workers. The default worker_type is G.1X. |
| %security_config | String | Define a security configuration to be used with this session. |
| %connections | List | Specify a comma-separated list of connections to use in the session. |
| %additional_python_modules | List | Comma-separated list of additional Python modules to include in your cluster (can be from PyPI or Amazon S3). |
| %extra_py_files | List | Comma-separated list of additional Python files from Amazon S3. |
| %extra_jars | List | Comma-separated list of additional jars to include in the cluster. |

Examples of how to use %%configure:

Max retries

%%configure
{
    "max_retries": "0"
}

Max concurrent runs

%%configure
{
    "max_concurrent_runs": "3"
}

Spark UI

%%configure
{
    "--enable-spark-ui": "true",
    "--spark-event-logs-path": "s3://path/to/event/logs/"
}
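
Because the %%configure body must be valid JSON, one option is to generate the cell text with Python's json module rather than typing it by hand. The following is a minimal sketch that reuses the parameter names and values from the %%configure examples above. Combining several parameters in a single dictionary is an assumption based on the description of %%configure as holding all configuration parameters for a session.

```python
import json

# Parameter names and values are taken from the %%configure examples;
# combining them in one dictionary is an assumption.
config = {
    "max_retries": "0",
    "--enable-spark-ui": "true",
    "--spark-event-logs-path": "s3://path/to/event/logs/",
}

# json.dumps emits valid JSON (double-quoted keys, no trailing commas),
# which avoids syntax mistakes in the cell body.
cell_text = "%%configure\n" + json.dumps(config, indent=4)
print(cell_text)
```

The printed text can be pasted into a notebook cell as is.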

Action magics

| Name | Type | Description |
| --- | --- | --- |
| %%sql | String | Run SQL code. All lines after the initial %%sql magic are passed as part of the SQL code. |

Naming sessions

AWS Glue interactive sessions are AWS resources and require a name. Names should be unique for each session and may be restricted by your IAM administrators. For more information, see Interactive sessions with IAM. The Jupyter kernel automatically generates unique session names for you. However, you can name sessions manually in two ways:

  1. Using the AWS Command Line Interface config file. See Setting Up AWS Config with the AWS Command Line Interface.

  2. Using the %session_id_prefix magic. See Magics supported by AWS Glue interactive sessions for Jupyter.

A session name is generated as follows:

  • When the prefix and session_id are provided: the session name will be {prefix}-{UUID}.

  • When nothing is provided: the session name will be {UUID}.

Prefixing session names allows you to recognize your session when listing it in the AWS CLI or console.
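
The naming rule above can be sketched in a few lines of Python. This is an illustration only: the function name build_session_name is hypothetical, and the kernel's actual implementation may differ.

```python
import uuid
from typing import Optional


def build_session_name(prefix: Optional[str] = None) -> str:
    """Illustrative only: mimic the {prefix}-{UUID} and {UUID} naming rule."""
    session_uuid = str(uuid.uuid4())  # random UUID, as described above
    return f"{prefix}-{session_uuid}" if prefix else session_uuid


named = build_session_name("myprefix")  # "myprefix-" followed by a UUID
unnamed = build_session_name()          # a bare UUID
```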

Specifying an IAM role for interactive sessions

You must specify an AWS Identity and Access Management (IAM) role to use with AWS Glue ETL code that you run with interactive sessions.

The role requires the same IAM permissions as those required to run AWS Glue jobs. See Create an IAM role for AWS Glue for more information on creating a role for AWS Glue jobs and interactive sessions.

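IAM roles can be specified in two ways: in a named profile in the AWS CLI config file, as described in the following section, or with the %iam_role magic.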

Configuring sessions with named profiles

AWS Glue interactive sessions use the same credentials as the AWS Command Line Interface or boto3. Like the AWS CLI, interactive sessions honor named profiles found in ~/.aws/config (Linux and macOS) or %USERPROFILE%\.aws\config (Windows). For more information, see Using named profiles.

Interactive sessions take advantage of named profiles by allowing the AWS Glue service role and session ID prefix to be specified in a profile. To configure a profile role, add a line for the IAM role key (glue_iam_role in the example below) and/or session_id_prefix to your named profile as shown below. The session_id_prefix value does not require quotes. For example, to add a session ID prefix, enter session_id_prefix=myprefix.

[default]
region=us-east-1
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
glue_iam_role=arn:aws:iam::<AccountID>:role/<GlueServiceRole>
session_id_prefix=<prefix_for_session_names>

[user1]
region=eu-west-1
aws_access_key_id=AKIAI44QH8DHBEXAMPLE
aws_secret_access_key=je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY
glue_iam_role=arn:aws:iam::<AccountID>:role/<GlueServiceRoleUser1>
session_id_prefix=<prefix_for_session_names_for_user1>
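
To check what a profile actually contains, you can read it with Python's standard configparser module. The following is a minimal sketch that parses an inline sample instead of the real config file. The account ID, role name, and prefix are placeholder values, and the key names glue_iam_role and session_id_prefix are taken from the profile example above.

```python
import configparser

# Sample profile text modeled on the profile example; the role ARN and
# prefix are placeholder values, not real resources.
SAMPLE_CONFIG = """
[default]
region=us-east-1
glue_iam_role=arn:aws:iam::111122223333:role/ExampleGlueServiceRole
session_id_prefix=myprefix
"""

# interpolation=None keeps any '%' characters in values literal.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE_CONFIG)

profile = parser["default"]
role = profile.get("glue_iam_role")        # None if the key is absent
prefix = profile.get("session_id_prefix")  # None if the key is absent
print(role, prefix)
```

To inspect a real file, replace read_string with parser.read and pass the path to your config file.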

If you have a custom method of generating credentials, you can also configure your profile to use the credential_process parameter in your ~/.aws/config file. For example:

[profile developer]
region=us-east-1
credential_process = "/Users/Dave/" --username helen

For more details about sourcing credentials through the credential_process parameter, see Sourcing credentials with an external process.

If the region or IAM role is not set in the profile that you are using, you must specify it using the %region or %iam_role magic in the first cell that you run.
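
One way to find out ahead of time which of these magics a notebook needs is to inspect the profile for missing keys. The following is a minimal sketch under stated assumptions: the helper name required_magics is hypothetical, the role key is assumed to be glue_iam_role as in the profile example above, and the config text is passed in as a string rather than read from disk.

```python
import configparser


def required_magics(config_text: str, profile: str = "default") -> list:
    """Return the magics to run in the first cell (hypothetical helper).

    Assumes the role is stored under the glue_iam_role key, as in the
    profile example in this section.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # Treat a missing profile the same as an empty one.
    section = parser[profile] if parser.has_section(profile) else {}
    magics = []
    if not section.get("region"):
        magics.append("%region")
    if not section.get("glue_iam_role"):
        magics.append("%iam_role")
    return magics


# A profile with a region but no role still needs %iam_role.
needed = required_magics("[default]\nregion=us-east-1\n")
print(needed)
```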