Get Started with AWS Glue Interactive Sessions - Amazon SageMaker

Get Started with AWS Glue Interactive Sessions

In this guide, you learn how to initiate an AWS Glue interactive session in SageMaker Studio Classic, and manage your environment with Jupyter magics.

Permissions for AWS Glue Interactive Sessions in SageMaker Studio Classic

This section lists the required policies to run AWS Glue interactive sessions in Studio Classic and explains how to set them up. In particular, it details how to:

  • Attach the AwsGlueSessionUserRestrictedServiceRole managed policy to your SageMaker execution role.

  • Create an inline custom policy on your SageMaker execution role.

  • Modify the trust relationship of your SageMaker execution role.

To attach the AwsGlueSessionUserRestrictedServiceRole managed policy to your execution role
  1. Open the IAM console.

  2. Select Roles in the left-side panel.

  3. Find your Studio Classic execution role. Choose the role name to access the role summary page.

  4. Under the Permissions tab, select Attach policies from the Add Permissions dropdown menu.

  5. Select the checkbox next to the managed policy AwsGlueSessionUserRestrictedServiceRole.

  6. Choose Attach policies.

    The summary page shows your newly added managed policy.

To create the inline custom policy on your execution role
  1. Select Create inline policy in the Add Permissions dropdown menu.

  2. Select the JSON tab.

  3. Copy and paste in the following policy.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "unique_statement_id",
                "Effect": "Allow",
                "Action": [
                    "iam:GetRole",
                    "iam:PassRole",
                    "sts:GetCallerIdentity"
                ],
                "Resource": "*"
            }
        ]
    }

  4. Choose Review policy.

  5. Enter a Name and choose Create policy.

    The summary page shows your newly added custom policy.

To modify the trust relationship of your execution role
  1. Select the Trust relationships tab.

  2. Choose Edit trust policy.

  3. Copy and paste in the following policy.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "glue.amazonaws.com",
                        "sagemaker.amazonaws.com"
                    ]
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }

  4. Choose Update policy.

You can add more roles and policies if you need to access other AWS resources. For a description of the roles and policies you can include, see Interactive sessions with IAM in the AWS Glue documentation.
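
The three console procedures above can also be scripted. The following is a minimal sketch, not an official setup script: it assumes you pass in a boto3 IAM client (for example, boto3.client("iam")), that the managed policy ARN uses the service-role path shown in the constant (confirm it in the IAM console), and that the inline policy name and Sid are placeholders of your choosing. Note that update_assume_role_policy replaces the entire trust policy, just as pasting into the console editor does.

```python
import json

# Assumed ARN of the AWS managed policy; confirm the exact path in the IAM console.
GLUE_SESSION_POLICY_ARN = (
    "arn:aws:iam::aws:policy/service-role/"
    "AwsGlueSessionUserRestrictedServiceRole"
)

# Inline policy from the procedure above. The Sid is a placeholder.
INLINE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GlueSessionRoleAccess",
            "Effect": "Allow",
            "Action": ["iam:GetRole", "iam:PassRole", "sts:GetCallerIdentity"],
            "Resource": "*",
        }
    ],
}

# Trust policy that lets both AWS Glue and SageMaker assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["glue.amazonaws.com", "sagemaker.amazonaws.com"]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}


def configure_execution_role(iam_client, role_name,
                             inline_policy_name="GlueInteractiveSessionsInline"):
    """Apply the managed policy, inline policy, and trust policy to one role."""
    # Step 1: attach the managed policy.
    iam_client.attach_role_policy(
        RoleName=role_name, PolicyArn=GLUE_SESSION_POLICY_ARN
    )
    # Step 2: create (or overwrite) the inline policy.
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=inline_policy_name,
        PolicyDocument=json.dumps(INLINE_POLICY),
    )
    # Step 3: replace the role's trust relationship.
    iam_client.update_assume_role_policy(
        RoleName=role_name, PolicyDocument=json.dumps(TRUST_POLICY)
    )
```

To use it, call configure_execution_role with your IAM client and the name of your Studio Classic execution role.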

Tag propagation

Tags are commonly used to track and allocate costs, control access to your session, isolate your resources, and more. To learn about adding metadata to your AWS resources using tagging, or for details on common use cases, see Additional information.

You can enable the automatic propagation of AWS tags to new AWS Glue interactive sessions created from within the Studio Classic UI. When an AWS Glue interactive session is created from SageMaker Studio Classic, any user-defined tags attached to the user profile or shared space are carried over to the new session. SageMaker Studio Classic also adds two AWS-generated internal tags to each new session created from the Studio Classic UI: sagemaker:user-profile-arn and sagemaker:domain-arn for a user profile, or sagemaker:shared-space-arn and sagemaker:domain-arn for a shared space. You can use these tags to aggregate costs across individual domains, user profiles, or spaces.

Enable tag propagation

To enable the automatic propagation of tags to new AWS Glue interactive sessions, set the following permissions for your SageMaker execution role and the IAM role associated with your AWS Glue session:

Note

By default, the role associated with the AWS Glue interactive session is the same as the SageMaker execution role. You can specify a different execution role for the AWS Glue interactive session by using the %iam_role magic command. For information on the available Jupyter magic commands to configure AWS Glue interactive sessions, see Configure your AWS Glue interactive session in SageMaker Studio Classic.

  • On your SageMaker execution role: Create a new inline policy, and paste in the following JSON. The policy grants the execution role permission to describe (DescribeUserProfile, DescribeSpace, DescribeDomain) and list the tags (ListTags) set on the user profiles, shared spaces, and SageMaker domain.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [ "sagemaker:ListTags" ],
                "Resource": [
                    "arn:aws:sagemaker:*:*:user-profile/*",
                    "arn:aws:sagemaker:*:*:space/*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [ "sagemaker:DescribeUserProfile" ],
                "Resource": [ "arn:aws:sagemaker:*:*:user-profile/*" ]
            },
            {
                "Effect": "Allow",
                "Action": [ "sagemaker:DescribeSpace" ],
                "Resource": [ "arn:aws:sagemaker:*:*:space/*" ]
            },
            {
                "Effect": "Allow",
                "Action": [ "sagemaker:DescribeDomain" ],
                "Resource": [ "arn:aws:sagemaker:*:*:domain/*" ]
            }
        ]
    }

  • On the IAM role of your AWS Glue session: Create a new inline policy, and paste in the following JSON. The policy grants your role permission to attach tags (TagResource) to your session and to retrieve its list of tags (GetTags).

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "glue:TagResource",
                    "glue:GetTags"
                ],
                "Resource": [ "arn:aws:glue:*:*:session/*" ]
            }
        ]
    }

Note
  • Failures that occur while applying these permissions do not prevent AWS Glue interactive sessions from being created. You can find details about the cause of a failure in the SageMaker Studio Classic CloudWatch logs.

  • You must restart the kernel of your AWS Glue interactive session to propagate the update of a tag’s value.
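
As a rough sketch of how the two tag-propagation policies could be applied programmatically, the following builds both policy documents and attaches them with put_role_policy. It assumes a boto3 IAM client is passed in; the policy names StudioTagLookup and GlueSessionTagging are hypothetical, and the AWS Glue role defaults to the SageMaker execution role, which mirrors the default described in the note above.

```python
import json


def _allow(actions, resources):
    """Build one Allow statement."""
    return {"Effect": "Allow", "Action": actions, "Resource": resources}


# Permissions for the SageMaker execution role: read tags and describe
# the user profile, space, and domain.
SAGEMAKER_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        _allow(["sagemaker:ListTags"],
               ["arn:aws:sagemaker:*:*:user-profile/*",
                "arn:aws:sagemaker:*:*:space/*"]),
        _allow(["sagemaker:DescribeUserProfile"],
               ["arn:aws:sagemaker:*:*:user-profile/*"]),
        _allow(["sagemaker:DescribeSpace"],
               ["arn:aws:sagemaker:*:*:space/*"]),
        _allow(["sagemaker:DescribeDomain"],
               ["arn:aws:sagemaker:*:*:domain/*"]),
    ],
}

# Permissions for the role used by the AWS Glue session: tag the session
# and read its tags.
GLUE_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        _allow(["glue:TagResource", "glue:GetTags"],
               ["arn:aws:glue:*:*:session/*"]),
    ],
}


def enable_tag_propagation(iam_client, sagemaker_role, glue_role=None):
    """Attach both inline policies; policy names are placeholders."""
    # By default the Glue session uses the SageMaker execution role.
    glue_role = glue_role or sagemaker_role
    iam_client.put_role_policy(
        RoleName=sagemaker_role,
        PolicyName="StudioTagLookup",
        PolicyDocument=json.dumps(SAGEMAKER_ROLE_POLICY),
    )
    iam_client.put_role_policy(
        RoleName=glue_role,
        PolicyName="GlueSessionTagging",
        PolicyDocument=json.dumps(GLUE_ROLE_POLICY),
    )
```

If you set a different role with the %iam_role magic, pass that role's name as glue_role.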

It is important to note the following points:

  • Once a tag is attached to a session, it cannot be removed by propagation.

    You can remove tags from an AWS Glue interactive session directly through the AWS CLI, the AWS Glue API, or the console at https://console.aws.amazon.com/sagemaker/. For example, with the AWS CLI, you can remove tags by providing the session's ARN and the tag keys that you want to remove, as follows:

    aws glue untag-resource \
        --resource-arn arn:aws:glue:region:account-id:session/session-name \
        --tags-to-remove tag-key1,tag-key2

  • SageMaker Studio Classic adds two AWS-generated internal tags to new AWS Glue interactive sessions created from the Studio Classic UI: either sagemaker:user-profile-arn and sagemaker:domain-arn, or sagemaker:shared-space-arn and sagemaker:domain-arn. These tags count against the limit of 50 tags set on all AWS resources. The values of sagemaker:user-profile-arn and sagemaker:shared-space-arn both contain the ID of the domain to which the user profile or space belongs.

  • Tag keys that start with aws:, AWS:, or any other combination of upper and lowercase letters of that prefix are reserved for AWS use and are not propagated.
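
The points above can be illustrated with a short sketch. It is an illustration under stated assumptions rather than the way SageMaker implements propagation: the session ARN format is assumed to follow the session/* resource pattern used in the inline policy, the helper names are hypothetical, and glue_client is assumed to be a boto3 AWS Glue client (for example, boto3.client("glue")), whose untag_resource call takes a resource ARN and a list of tag keys.

```python
def session_arn(region, account_id, session_name):
    """Build a session ARN (assumed format, matching the session/* pattern)."""
    return f"arn:aws:glue:{region}:{account_id}:session/{session_name}"


def is_reserved_key(key):
    """True for keys with the aws: prefix in any mix of upper and lowercase."""
    return key.lower().startswith("aws:")


def propagatable_tags(tags):
    """Return only the tags whose keys are not reserved for AWS use."""
    return {k: v for k, v in tags.items() if not is_reserved_key(k)}


def remove_session_tags(glue_client, arn, tag_keys):
    """Remove tags from a session; propagation alone never removes them."""
    glue_client.untag_resource(ResourceArn=arn, TagsToRemove=list(tag_keys))
```

For example, propagatable_tags({"team": "ml", "AWS:internal": "x"}) keeps only the team tag, because the second key carries the reserved prefix.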

Additional information

For more information on tagging, refer to the following resources.

  • To learn about adding metadata to your AWS resources with tagging, see Tagging AWS resources.

  • For information on tracking costs using tags, see Cost analysis in SageMaker Studio Classic Administration Best Practices.

  • For information on controlling access to AWS Glue based on tag keys, see ABAC with AWS Glue.

Launch your AWS Glue interactive session on SageMaker Studio Classic

After you set up the roles and policies, you can create a SageMaker domain and launch your AWS Glue interactive session in SageMaker Studio Classic.

To launch AWS Glue in SageMaker Studio Classic
  1. Create a SageMaker domain. For instructions on how to create a new domain, see Amazon SageMaker domain overview.

  2. Sign in to the SageMaker console at https://console.aws.amazon.com/sagemaker/.

  3. Select Control Panel in the left-side panel.

  4. In the Launch App dropdown menu next to the user name, select Studio.

  5. In the Jupyter view, choose File, then New, then Notebook.

  6. In the Image dropdown menu, select SparkAnalytics 1.0 or SparkAnalytics 2.0. In the kernel dropdown menu, select Glue Spark or Glue Python [PySpark and Ray]. Choose Select.

  7. (Optional) Use Jupyter magics to customize your environment. For more information about Jupyter magics, see Configure your AWS Glue interactive session in SageMaker Studio Classic.

  8. Start writing your Spark data processing scripts.

Configure your AWS Glue interactive session in SageMaker Studio Classic

Note

All magic configurations are carried over to subsequent sessions for the lifetime of the AWS Glue kernel.

You can use Jupyter magics in your AWS Glue interactive session to modify your session and configuration parameters. Magics are short commands, prefixed with % and placed at the start of a Jupyter cell, that give you a quick way to control your environment. In your AWS Glue interactive session, the following magics are set for you by default:

Magic            Default value
%glue_version    3.0
%iam_role        Execution role attached to your SageMaker domain
%region          Your Region

You can use magics to further customize your environment. For example, to change the number of workers allocated to your job from the default of 5 to 10, specify %number_of_workers 10. To configure your session to stop after 10 minutes of idle time instead of the default of 2880 minutes, specify %idle_timeout 10.
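
If you generate notebooks from a template, you might build the configuration cell from a dictionary of settings. The following is a small sketch of that idea; render_magics is a hypothetical helper, not part of AWS Glue or SageMaker, and the magic names and values are the ones mentioned above.

```python
# Default idle timeout mentioned above, in minutes.
DEFAULT_IDLE_TIMEOUT_MINUTES = 2880


def render_magics(settings):
    """Turn {"name": value} pairs into one '%name value' line per magic."""
    return "\n".join(f"%{name} {value}" for name, value in settings.items())


# Text for the first cell of a notebook, before any Spark code runs.
cell = render_magics({
    "glue_version": "3.0",
    "number_of_workers": 10,
    "idle_timeout": 10,
})
print(cell)

# The default idle timeout of 2880 minutes corresponds to 48 hours.
print(DEFAULT_IDLE_TIMEOUT_MINUTES / 60)
```

Paste the printed lines into the first cell of your notebook so that they take effect when the session starts.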

All of the Jupyter magics currently available in AWS Glue are also available in SageMaker Studio Classic. For the complete list of AWS Glue magics available, see Configuring AWS Glue interactive sessions for Jupyter and AWS Glue Studio notebooks.