Getting started with AWS Glue Interactive Sessions

These sections describe how to run AWS Glue interactive sessions locally.

Prerequisites for Setting Up Interactive Sessions

The following are prerequisites for installing interactive sessions:

  • Python 3.6 or later

  • See the sections below for macOS/Linux and Windows instructions.
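The Python requirement above can be checked before you install anything. The following is a minimal sketch that uses only the standard library; the helper name meets_minimum is hypothetical and not part of any AWS Glue tooling.

```python
import sys

MINIMUM = (3, 6)  # minimum version stated in the prerequisites


def meets_minimum(version_info=None, minimum=MINIMUM):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the major and minor components.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    status = "OK" if meets_minimum() else "too old"
    print("Python " + current + ": " + status)
```

Run it with the same interpreter that you plan to use for the virtual environment, because several Python versions can be installed side by side.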

macOS/Linux Instructions

Creating and activating a Python Virtual Environment (Optional, Recommended)

We recommend that you create a new virtual environment for the preview. The preview installer overwrites the service files for AWS Glue in Boto3 and the AWS CLI with preview versions. Having a separate virtual environment helps prevent conflicts with production versions.

The following example creates a virtual environment in the directory interactive_sessions_demo.

In your terminal, run the following:

mkdir interactive_sessions_demo
cd interactive_sessions_demo
python3 -m venv .
source bin/activate
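The same environment can also be created from Python with the standard-library venv module. The sketch below is an illustration only; the function name create_env is hypothetical, and it assumes that you still activate the environment from your shell afterward.

```python
import venv
from pathlib import Path


def create_env(directory, with_pip=True):
    """Create a virtual environment in `directory` and return its pyvenv.cfg path."""
    env_dir = Path(directory)
    env_dir.mkdir(parents=True, exist_ok=True)
    # with_pip=True bootstraps pip, which `python3 -m venv` also does by default.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir / "pyvenv.cfg"


# Example (hypothetical usage): create_env("interactive_sessions_demo")
```

The returned pyvenv.cfg file marks the directory as a virtual environment, so its presence is a quick way to confirm that creation succeeded.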

Installing Jupyter and AWS Glue Interactive Sessions Kernels

  1. Install jupyter, boto3, and aws-glue-sessions with pip. JupyterLab is also compatible and can be installed instead of Jupyter Notebook.

    pip3 install --upgrade jupyter boto3 aws-glue-sessions
  2. Install the interactive sessions kernels into Jupyter by running the following commands. They look up the installation location of aws-glue-sessions from pip and install the Jupyter kernels found there.

    jupyter kernelspec install $(pip3 show aws-glue-sessions | grep Location | awk '{print $2}')/aws_glue_interactive_sessions_kernel/glue_python_kernel
    jupyter kernelspec install $(pip3 show aws-glue-sessions | grep Location | awk '{print $2}')/aws_glue_interactive_sessions_kernel/glue_scala_kernel
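If the grep and awk pipeline is not available, the kernel directories can be located from Python instead. This is a minimal sketch under two assumptions: that aws_glue_interactive_sessions_kernel is importable as a package in the active environment, and that the kernel directories have the names used in the commands above. The function name find_kernel_dirs is hypothetical.

```python
import importlib.util
from pathlib import Path

# Names taken from the install commands above.
KERNEL_PACKAGE = "aws_glue_interactive_sessions_kernel"
KERNEL_DIRS = ("glue_python_kernel", "glue_scala_kernel")


def find_kernel_dirs(package=KERNEL_PACKAGE, names=KERNEL_DIRS):
    """Return the expected kernel directories, or [] if the package is not found."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []
    root = Path(list(spec.submodule_search_locations)[0])
    return [root / name for name in names]


if __name__ == "__main__":
    # Print one install command per kernel directory.
    for kernel_dir in find_kernel_dirs():
        print("jupyter kernelspec install " + str(kernel_dir))
```

The script only prints the commands, so you can review the paths before running them.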

Running Jupyter Notebook

To run Jupyter Notebook, complete the following steps.

  1. If you are using a virtual environment and it is not active, activate it with the following commands:

    cd interactive_sessions_demo
    python3 -m venv .
    source bin/activate
  2. Run the following command to launch Jupyter Notebook.

    jupyter notebook
  3. Choose New, and then choose one of the AWS Glue kernels to begin coding against AWS Glue.

Windows Instructions

Creating and activating a Python Virtual Environment (Recommended for Preview)

We recommend that you create a new virtual environment for the preview. The preview installer overwrites the service files for AWS Glue in Boto3 and the AWS CLI with preview versions. Having a separate virtual environment helps prevent conflicts with production versions.

The following example creates a virtual environment in the directory interactive_sessions_demo.

In your terminal (Bash / Zsh or Windows PowerShell), run the following. Windows PowerShell users might need to use python instead of python3.

macOS and Linux:

mkdir interactive_sessions_demo
cd interactive_sessions_demo
python3 -m venv .
source bin/activate

Windows PowerShell:

mkdir interactive_sessions_demo
cd interactive_sessions_demo
python3 -m venv .
.\Scripts\activate.ps1
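The only difference between the two command sets is the location of the activation script. The following sketch makes that difference explicit; the function name activation_script is hypothetical, and the paths are the ones used in the commands above.

```python
from pathlib import PurePosixPath, PureWindowsPath


def activation_script(env_dir, windows=False):
    """Return the activation script path for a virtual environment directory."""
    if windows:
        # Windows PowerShell: the script lives under Scripts\
        return str(PureWindowsPath(env_dir) / "Scripts" / "activate.ps1")
    # macOS and Linux: the script lives under bin/ and is loaded with `source`
    return str(PurePosixPath(env_dir) / "bin" / "activate")


if __name__ == "__main__":
    print(activation_script("interactive_sessions_demo"))
    print(activation_script("interactive_sessions_demo", windows=True))
```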

Installing Jupyter and AWS Glue Interactive Sessions Kernels

  1. Use pip to install Jupyter. JupyterLab is also compatible and can be installed instead. The following commands apply to Bash, Zsh, and Windows PowerShell.

    pip3 install --upgrade jupyter
  2. Run the following to install Boto3:

    pip3 install --upgrade boto3
  3. Install AWS Glue interactive sessions from pip:

    pip3 install --upgrade aws-glue-sessions
  4. (Optional) Run the following command to list the installed packages. If jupyter and aws-glue-sessions were successfully installed, you should see a long list of packages, including jupyter 1.0.0 (or later).

    pip3 list
  5. Install the interactive sessions kernels into Jupyter. The following steps look up the installation location of aws-glue-sessions from pip and install the Jupyter kernels found there.

    1. Find the aws-glue-sessions install location, which is Python's site-packages directory.

      (pip3 show aws-glue-sessions | Select-String Location | ConvertFrom-String).P2
    2. In Windows PowerShell, change to the kernel directory under that location.

      cd <site-packages Location>\aws_glue_interactive_sessions_kernel\
    3. Install the AWS Glue PySpark and AWS Glue Scala kernels.

      jupyter-kernelspec install glue_python_kernel
      jupyter-kernelspec install glue_scala_kernel
    4. On Windows, you must also copy service-2.json into the botocore directory in site-packages. Use the following command to copy it.

      cp C:\<site-packages Location>\lib\site-packages\aws_glue_interactive_sessions_kernel\service-2.json C:\<site-packages Location>\lib\site-packages\botocore\data\glue\2017-03-31\service-2.json
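The source and destination of this copy can be derived from the site-packages location reported by pip. The following is a sketch that only builds the two paths; it assumes that both aws_glue_interactive_sessions_kernel and botocore sit directly under the same site-packages directory, and the function name service_model_paths is hypothetical.

```python
from pathlib import PureWindowsPath


def service_model_paths(site_packages):
    """Return (source, target) paths for the service-2.json copy on Windows."""
    root = PureWindowsPath(site_packages)
    source = root / "aws_glue_interactive_sessions_kernel" / "service-2.json"
    # The target folder name 2017-03-31 is taken from the copy command above.
    target = root / "botocore" / "data" / "glue" / "2017-03-31" / "service-2.json"
    return str(source), str(target)


if __name__ == "__main__":
    # Hypothetical site-packages location, for illustration only.
    src, dst = service_model_paths(r"C:\demo\Lib\site-packages")
    print(src)
    print(dst)
```

Printing the paths first lets you confirm that both files exist before you overwrite the botocore copy.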

Running Jupyter Notebook

To run Jupyter Notebook, complete the following steps.

  1. If the virtual environment isn't active, activate it with the following commands.

    macOS and Linux:

    cd interactive_sessions_demo
    python3 -m venv .
    source bin/activate

    Windows PowerShell:

    cd interactive_sessions_demo
    python3 -m venv .
    .\Scripts\activate.ps1
  2. In the virtual environment, run the following command to launch Jupyter Notebook.

    jupyter notebook
  3. Choose New, and then choose one of the AWS Glue kernels to begin coding against AWS Glue 2.0 or later.
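If no AWS Glue kernel appears under New, you can check which kernels Jupyter has registered by inspecting the output of jupyter kernelspec list --json. The following is a minimal sketch; the function name glue_kernels is hypothetical, and matching on the substring "glue" is an assumption rather than a documented naming rule.

```python
import json


def glue_kernels(kernelspec_json):
    """Return {kernel name: display name} for kernels that look like Glue kernels.

    `kernelspec_json` is the text printed by `jupyter kernelspec list --json`.
    """
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    found = {}
    for name, info in specs.items():
        display = info.get("spec", {}).get("display_name", "")
        # Assumption: Glue kernels carry "glue" in their name or display name.
        if "glue" in name.lower() or "glue" in display.lower():
            found[name] = display
    return found
```

An empty result suggests that the kernelspec install commands were run in a different environment from the one that launches Jupyter.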

Using Interactive Sessions with Microsoft Visual Studio Code

To use interactive sessions with Microsoft Visual Studio Code, follow these steps to install Visual Studio Code with Jupyter and create a new Jupyter notebook:

  1. Download and install Visual Studio Code with Jupyter. For details, see Jupyter Notebooks in VS Code.

  2. If you are using a virtual environment, open Visual Studio Code from within it.

    1. In a terminal, activate the Python virtual environment that you created when you installed interactive sessions.

    2. Open Visual Studio Code by running code . from the terminal, or choose Visual Studio Code → File → Open Folder and select “interactive_sessions_demo”.

      cd interactive_sessions_demo
      source bin/activate
      code .
  3. Create a new Jupyter Notebook.

    1. Create and save the file by choosing File → New File, and then Save, with a name of your choice. Give the file the .ipynb extension, or select “jupyter” under “select a language” and then save the file.

      
                  The GIF shows how to save the notebook with the .ipynb extension.
  4. In the left-hand navigation menu, double-click on the newly created file. The Jupyter shell will open and the notebook will open in the main viewing pane.

    
              The image shows an open Jupyter Notebook inside Microsoft Visual Studio Code.
  5. Choose Select Kernel to display the list of available kernels, and then choose Glue PySpark (for Python) or Glue Spark (for Scala). By default, no kernel is selected when a new file is created.

    
              The image shows the Select Kernel button highlighted.
    
              The image shows Glue PySpark highlighted in the list of available kernels.
    Note

    If you don’t see the Glue PySpark and Glue Spark kernels in the dropdown list, make sure that you have installed the AWS Glue kernels and that python.pythonPath in the VS Code settings is correct. The following steps show how to validate python.pythonPath.

    VS Code should automatically populate python.pythonPath with the correct interpreter when it is opened from a terminal with your virtual environment activated.

    Follow the steps to validate the path for python.pythonPath:

    1. Go to: Manage → Settings. This is also accessible by clicking the gear icon in the bottom left-hand corner of the Visual Studio Code application.

      
                  The image shows the settings page.
    2. Select the open settings icon in the upper-right corner. python.pythonPath should point to the Python location in your virtual environment. If you opened Visual Studio Code from your virtual environment with code . this step should be unnecessary.

    3. If you don’t see python.pythonPath, add it and restart the Visual Studio Code application.

      { "python.pythonPath":"Python path of Python virtual environment" }

      For example:

      { "python.pythonPath":"/Users/username/Documents/interactive_sessions_demo/bin/python" }
      
                  The image shows the settings page with python.pythonPath added.
  6. Create an AWS Glue Interactive Session.

    Proceed to create a session in the same manner as you did in Jupyter Notebook. Specify any magics at the top of your first cell and run a statement of code.
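For the python.pythonPath check in step 5, one way to find the value to enter is to ask the interpreter itself. The following is a minimal sketch, assuming that you run it with the virtual environment activated so that sys.executable is the environment's interpreter; the function name python_path_setting is hypothetical.

```python
import json
import sys


def python_path_setting(interpreter=None):
    """Return a settings.json fragment that sets python.pythonPath."""
    if interpreter is None:
        # Inside an activated virtual environment this is the environment's Python.
        interpreter = sys.executable
    return json.dumps({"python.pythonPath": interpreter}, indent=2)


if __name__ == "__main__":
    print(python_path_setting())
```

Generating the fragment with json.dumps also escapes backslashes correctly, which matters for Windows paths.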