Learn Workspace basics
When you use an EMR Studio, you can create and configure different Workspaces to organize and run notebooks. This section covers creating and working with Workspaces. For a conceptual overview, see Workspaces on the How Amazon EMR Studio works page.
Create an EMR Studio Workspace
You can create EMR Studio Workspaces to run notebook code using the EMR Studio interface.
To create a Workspace in an EMR Studio
-
Log in to your EMR Studio.
-
Choose Create a Workspace.
-
Enter a Workspace name and a Description. Naming a Workspace helps you identify it on the Workspaces page.
-
If you want to work with other Studio users in this Workspace in real time, enable Workspace collaboration. You can configure collaborators after you launch the Workspace.
-
If you want to attach a cluster to a Workspace, expand the Advanced configuration section. You can attach a cluster later, if you prefer. For more information, see Attach a compute to an EMR Studio Workspace.
Note
To provision a new cluster, you need access permissions from your administrator.
Choose one of the cluster options for the Workspace and attach the cluster. For more information about provisioning a cluster when you create a Workspace, see Create and attach a new EMR cluster to an EMR Studio Workspace.
-
Choose Create a Workspace in the lower right of the page.
After you create a Workspace, EMR Studio opens the Workspaces page. A green success banner appears at the top of the page, and the newly created Workspace appears in the list.
By default, a Workspace is shared and can be seen by all Studio users. However, only one user can open and work in a Workspace at a time. To work simultaneously with other users, see Configure Workspace collaboration.
Launch a Workspace
To start working with notebook files, launch a Workspace to access the notebook editor. The Workspaces page in a Studio lists all of the Workspaces that you have access to, with details including Name, Status, Creation time, and Last modified.
Note
If you had EMR notebooks in the old Amazon EMR console, you can find them in the new console as EMR Studio Workspaces. EMR Notebooks users need additional IAM role permissions to access or create Workspaces. If you recently created a notebook in the old console, you might need to refresh the Workspaces list to see it in the new console. For more information about the transition, see Amazon EMR Notebooks are available as Amazon EMR Studio Workspaces in the new console and What's new with the console?
To launch a Workspace for editing and running notebooks
-
On the Workspaces page of your Studio, find the Workspace. You can filter the list by keyword or by column value.
-
Choose the Workspace name to launch the Workspace in a new browser tab. It may take a few minutes for the Workspace to open if it's Idle. Alternatively, select the row for the Workspace and then select Launch Workspace. You can choose from the following launch options:
-
Quick launch – Quickly launch your Workspace with default options. Choose Quick launch if you want to attach clusters to the Workspace in JupyterLab.
-
Launch with options – Launch your Workspace with custom options. You can choose to launch in either Jupyter or JupyterLab, attach your Workspace to an EMR cluster, and select your security groups.
Note
Only one user can open and work in a Workspace at a time. If you select a Workspace that is already in use, EMR Studio displays a notification when you try to open it. The User column on the Workspaces page shows the user working in the Workspace.
Understand the Workspace user interface
The EMR Studio Workspace user interface is based on the JupyterLab interface. The left sidebar provides the following panels:
-
File Browser – Displays the files and directories in the Workspace, as well as the files and directories of linked Git repositories.
-
Running Kernels and Terminals – Lists all of the kernels and terminals running in the Workspace. For more information, see Managing kernels and terminals in the official JupyterLab documentation.
-
Git – Provides a graphical user interface for performing commands in the Git repositories attached to the Workspace. This panel is a JupyterLab extension called jupyterlab-git. For more information, see jupyterlab-git.
-
EMR clusters – Lets you attach a cluster to or detach a cluster from the Workspace to run notebook code. The EMR cluster configuration panel also provides advanced configuration options to help you create and attach a new cluster to the Workspace. For more information, see Create and attach a new EMR cluster to an EMR Studio Workspace.
-
Amazon EMR Git Repository – Helps you link the Workspace with up to three Git repositories. For details and instructions, see Link Git-based repositories to an EMR Studio Workspace.
-
Notebook Examples – Provides a list of notebook examples that you can save to the Workspace. You can also access the examples by choosing Notebook Examples on the Launcher page of the Workspace.
-
Commands – Offers a keyboard-driven way to search for and run JupyterLab commands. For more information, see the Command palette page in the JupyterLab documentation.
-
Notebook Tools – Lets you select and set options such as cell slide type and metadata. The Notebook Tools option appears in the left sidebar after you open a notebook file.
-
Open Tabs – Lists the open documents and activities in the main work area so that you can jump to an open tab. For more information, see the Tabs and single-document mode page in the JupyterLab documentation.
-
Collaboration – Lets you enable or disable Workspace collaboration, and manage collaborators. To see the Collaboration panel, you must have the necessary permissions. For more information, see Set ownership for Workspace collaboration.
Explore notebook examples
Every EMR Studio Workspace includes a set of notebook examples that you can use to explore EMR Studio features. To edit or run a notebook example, you can save it to the Workspace.
To save a notebook example to a Workspace
-
From the left sidebar, choose the Notebook Examples tab to open the Notebook Examples panel. You can also access the examples by choosing Notebook Examples on the Launcher page of the Workspace.
-
Choose a notebook example to preview it in the main work area. The example is read-only.
-
To save the notebook example to the Workspace, choose Save to Workspace. EMR Studio saves the example in your home directory. After you save a notebook example to the Workspace, you can rename, edit, and run it.
For more information about the notebook examples, see the EMR Studio Notebook examples GitHub repository.
Save Workspace content
When you work in the notebook editor of a Workspace, EMR Studio saves the content of notebook cells and output for you in the Amazon S3 location associated with the Studio. This backup process preserves work between sessions.
You can also save a notebook by pressing CTRL+S in the open notebook tab or by using one of the save options under File.
Another way to back up the notebook files in a Workspace is to associate the Workspace with a Git-based repository and sync your changes with the remote repository. Doing so also lets you save and share notebooks with team members who use a different Workspace or Studio. For instructions, see Link Git-based repositories to an EMR Studio Workspace.
Delete a Workspace and notebook files
When you delete a notebook file from an EMR Studio Workspace, you delete the file from the File browser, and EMR Studio removes its backup copy in Amazon S3. You do not have to take any further steps to avoid storage charges when you delete a file from a Workspace.
When you delete an entire Workspace, its notebook files and folders will remain in the Amazon S3 storage location. The files continue to accrue storage charges. To avoid storage charges, remove all backed-up files and folders that are associated with your deleted Workspace from Amazon S3.
To delete a notebook file from an EMR Studio Workspace
-
Select the File browser panel from the left sidebar in the Workspace.
-
Select the file or folder you want to delete. Right-click your selection and choose Delete. The file disappears from the list. EMR Studio removes the file or folder from Amazon S3 for you.
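Because the files of a deleted Workspace remain in Amazon S3 until you remove them, you might want to script the cleanup. The following is a minimal sketch, not an official procedure. It assumes you already know the bucket and the key prefix under which the Workspace files were backed up; the bucket name and prefix in the usage comment are hypothetical placeholders. The function accepts any object that behaves like a boto3 S3 client, and it does not remove older object versions in a versioned bucket.

```python
from typing import Any


def delete_s3_prefix(s3_client: Any, bucket: str, prefix: str) -> int:
    """Delete every object under `prefix` in `bucket`; return the number deleted.

    `s3_client` is expected to behave like `boto3.client("s3")`.
    """
    if not prefix:
        # Guard against wiping the whole bucket by accident.
        raise ValueError("prefix must not be empty")

    paginator = s3_client.get_paginator("list_objects_v2")
    deleted = 0
    # Each page holds at most 1,000 keys, which is also the
    # per-request limit of delete_objects.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not keys:
            continue
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": keys, "Quiet": True},
        )
        errors = response.get("Errors", [])
        if errors:
            raise RuntimeError(f"failed to delete {len(errors)} object(s)")
        deleted += len(keys)
    return deleted


# Usage (hypothetical bucket and prefix; requires boto3 and AWS credentials):
#   import boto3
#   count = delete_s3_prefix(
#       boto3.client("s3"), "amzn-s3-demo-bucket", "path/to/deleted-workspace/"
#   )
```

Check the prefix carefully before you run a script like this, because the deletion applies to every object whose key starts with it.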
Understand Workspace status
After you create an EMR Studio Workspace, it appears as a row in the Workspaces list in your Studio with its name, status, creation time, and last modified timestamp. The following table describes Workspace statuses.
Status | Description |
---|---|
Starting | The Workspace is being prepared, but is not yet ready to use. You can't open a Workspace when its status is Starting. |
Ready | You can open the Workspace to use the notebook editor, but you must attach the Workspace to an EMR cluster before you can run notebook code. |
Attaching | The Workspace is being attached to a cluster. |
Attached | The Workspace is attached to an EMR cluster and ready for you to write and run notebook code. If a Workspace's status is not Attached, you must attach it to a cluster before you can run notebook code. |
Idle | The Workspace has stopped. To reactivate an idle Workspace, select it from the Workspaces list. The status changes from Idle to Starting to Ready when you select the Workspace. |
Stopping | The Workspace is shutting down and will be set to Idle. When you stop a Workspace, it terminates any corresponding notebook kernels. EMR Studio stops notebooks that have been inactive for a long time. |
Deleting | When you delete a Workspace, EMR Studio marks it for deletion and starts the deletion process. After the deletion process completes, the Workspace disappears from the list. When you delete a Workspace, its notebook files will remain in the Amazon S3 storage location. |
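
If you track Workspace status in your own tooling, the table above can be modeled as a small enumeration. The following is an illustrative sketch only: the status strings come from the table, but the `WorkspaceStatus` class and the helper functions are hypothetical names, not part of any EMR Studio API.

```python
from enum import Enum


class WorkspaceStatus(Enum):
    """Workspace statuses as they appear in the Workspaces list."""
    STARTING = "Starting"
    READY = "Ready"
    ATTACHING = "Attaching"
    ATTACHED = "Attached"
    IDLE = "Idle"
    STOPPING = "Stopping"
    DELETING = "Deleting"


def can_run_code(status: WorkspaceStatus) -> bool:
    """Notebook code can run only when the Workspace is attached to a cluster."""
    return status is WorkspaceStatus.ATTACHED


def next_step(status: WorkspaceStatus) -> str:
    """Return a short hint, paraphrased from the status table."""
    hints = {
        WorkspaceStatus.STARTING: "Wait; the Workspace can't be opened yet.",
        WorkspaceStatus.READY: "Open it, then attach a cluster to run code.",
        WorkspaceStatus.ATTACHING: "Wait for the cluster to attach.",
        WorkspaceStatus.ATTACHED: "Write and run notebook code.",
        WorkspaceStatus.IDLE: "Select it in the Workspaces list to reactivate it.",
        WorkspaceStatus.STOPPING: "Wait for the status to become Idle.",
        WorkspaceStatus.DELETING: "No action; notebook files remain in Amazon S3.",
    }
    return hints[status]


if __name__ == "__main__":
    for status in WorkspaceStatus:
        print(f"{status.value:<10} run code: {can_run_code(status)!s:<5} {next_step(status)}")
```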
Resolve Workspace connectivity issues
To resolve Workspace connectivity issues, you can stop and restart a Workspace. When you restart a Workspace, EMR Studio launches the Workspace in a different Availability Zone or a different subnet that is associated with your Studio.
To stop and restart an EMR Studio Workspace
-
Close the Workspace in your browser.
-
Navigate to the Workspace list in the console.
-
Select your Workspace from the list and choose Actions.
-
Choose Stop and wait for the Workspace status to change from Stopping to Idle.
-
Choose Actions again, and then choose Start to restart the Workspace.
-
Wait for the Workspace status to change from Starting to Ready, then choose the Workspace name to reopen it in a new browser tab.
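The two waiting steps in this procedure (Stopping to Idle, then Starting to Ready) follow the same pattern: poll the status until it reaches a target value, and give up after a timeout. The following sketch shows that pattern in isolation. It assumes a hypothetical `get_status` callable that returns the current status string; this document does not describe an API for reading Workspace status, so you would supply that part yourself.

```python
import time
from typing import Callable, Collection


def wait_for_status(
    get_status: Callable[[], str],
    target: str,
    transitional: Collection[str],
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `get_status` until it returns `target`.

    `get_status` is a hypothetical callable supplied by the caller.
    Statuses listed in `transitional` are tolerated while waiting;
    any other status raises RuntimeError.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if status not in transitional:
            raise RuntimeError(f"unexpected status {status!r} while waiting for {target!r}")
        if clock() >= deadline:
            raise TimeoutError(f"status did not reach {target!r} within {timeout_s} seconds")
        sleep(poll_s)


# Usage, mirroring the procedure above (get_status is yours to provide):
#   wait_for_status(get_status, "Idle", {"Stopping"})    # after choosing Stop
#   wait_for_status(get_status, "Ready", {"Starting"})   # after choosing Start
```

Passing `sleep` and `clock` as parameters keeps the function easy to test without real delays.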