Troubleshooting Amazon SageMaker Studio
This topic describes common errors that might occur when you set up and use Amazon SageMaker Studio. Each error is followed by its solution.
Studio application issues
The following issues occur when launching and using the Studio application.
-
Screen not loading: Clearing workspace and waiting doesn't help
When launching the Studio application, a pop-up displays the following message. No matter which option is selected, Studio does not load.
Loading... The loading screen is taking a long time. Would you like to clear the workspace or keep waiting?
The Studio application can have a launch delay if multiple tabs are open in the Studio workspace or several files are on Amazon EFS. This pop-up should disappear in a few seconds after the Studio workspace is ready.
If you continue to see a loading screen with a spinner after selecting either option, there could be connectivity issues with the Amazon Virtual Private Cloud (Amazon VPC) used by Studio.
To resolve these connectivity issues, verify the following networking configurations:
-
If your domain is set up in VpcOnly mode: Verify that there is an Amazon VPC endpoint for AWS STS, or a NAT Gateway for outbound traffic, including traffic over the internet. To do this, follow the steps in Connect SageMaker Studio Notebooks in a VPC to External Resources.
-
If your Amazon VPC is set up with a custom DNS instead of the DNS provided by Amazon: Verify that the routes are configured using Dynamic Host Configuration Protocol (DHCP) for each Amazon VPC endpoint added to the Amazon VPC used by Studio. For more information about setting default and custom DHCP option sets, see DHCP option sets in Amazon VPC.
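The AWS STS endpoint check for a VpcOnly domain could be scripted roughly as follows. This is a minimal sketch, not an official procedure: the function names are hypothetical, it assumes boto3 and AWS credentials are available, and it covers only the VPC endpoint half of the requirement (it does not detect a NAT Gateway).

```python
def has_sts_endpoint(vpc_endpoints, region):
    """Return True if an available endpoint in the list targets AWS STS.

    `vpc_endpoints` is a list of dicts shaped like the "VpcEndpoints"
    entries returned by the EC2 DescribeVpcEndpoints API.
    """
    sts_service = f"com.amazonaws.{region}.sts"
    return any(
        ep.get("ServiceName") == sts_service
        and ep.get("State", "").lower() == "available"
        for ep in vpc_endpoints
    )


def fetch_vpc_endpoints(vpc_id, region):
    """Fetch the endpoints of one VPC (first page of results only)."""
    import boto3  # imported lazily so has_sts_endpoint works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_vpc_endpoints(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    return response["VpcEndpoints"]
```

For example, has_sts_endpoint(fetch_vpc_endpoints(vpc_id, region), region) returning False suggests that outbound traffic to AWS STS depends on a NAT Gateway instead.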
-
-
Internal Failure when launching Studio
When launching Studio, you are unable to view the Studio UI. You also see an error similar to the following, with Internal Failure as the error detail.
Amazon SageMaker Studio The JupyterServer app default encountered a problem and was stopped.
This error can have multiple causes. If completing the following steps does not resolve your issue, create an issue with https://aws.amazon.com/premiumsupport/.
Missing Amazon EFS mount target: Studio uses Amazon EFS for storage. The Amazon EFS volume needs a mount target for each subnet that the Amazon SageMaker domain is created in. If this Amazon EFS mount target is deleted accidentally, the Studio application cannot load because it cannot mount the user’s file directory. To resolve this issue, complete the following steps.
To verify or create mount targets
-
Find the Amazon EFS volume that is associated with the domain by using the DescribeDomain API call.
-
Sign in to the AWS Management Console and open the Amazon EFS console at https://console.aws.amazon.com/efs/.
-
From the list of Amazon EFS volumes, select the Amazon EFS volume that is associated with the domain.
-
On the Amazon EFS details page, select the Network tab. Verify that there are mount targets for all of the subnets that the domain is set up in.
-
If mount targets are missing, add the missing Amazon EFS mount targets. For instructions, see Creating and managing mount targets and security groups.
-
After the missing mount targets are created, launch the Studio application.
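The mount target verification above can also be sketched in code. The example below is an illustration under stated assumptions: the function names are hypothetical, boto3 and AWS credentials are assumed to be available, and the domain's file system ID and subnets are read from the HomeEfsFileSystemId and SubnetIds fields of the DescribeDomain response.

```python
def subnets_missing_mount_targets(domain_subnet_ids, mount_targets):
    """Return the domain subnets that have no Amazon EFS mount target.

    `mount_targets` is a list of dicts shaped like the "MountTargets"
    entries returned by the Amazon EFS DescribeMountTargets API.
    """
    covered = {target["SubnetId"] for target in mount_targets}
    return [subnet for subnet in domain_subnet_ids if subnet not in covered]


def check_domain_mount_targets(domain_id, region):
    """Look up the domain's EFS volume and report uncovered subnets."""
    import boto3  # imported lazily so the helper above works without boto3

    sagemaker = boto3.client("sagemaker", region_name=region)
    efs = boto3.client("efs", region_name=region)

    domain = sagemaker.describe_domain(DomainId=domain_id)
    targets = efs.describe_mount_targets(
        FileSystemId=domain["HomeEfsFileSystemId"]
    )
    return subnets_missing_mount_targets(
        domain["SubnetIds"], targets["MountTargets"]
    )
```

An empty list means every subnet of the domain has a mount target. Any subnet IDs returned are the ones that need a mount target added.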
-
-
Conflicting files in the user’s .local folder: If you're using JupyterLab version 1 on Studio, conflicting libraries in your .local folder can cause issues when launching the Studio application. To resolve this, update your user profile's default JupyterLab version to JupyterLab 3.0. For more information about viewing and updating the JupyterLab version, see JupyterLab Versioning.
-
ConfigurationError: LifecycleConfig when launching Studio
You can't view the Studio UI when launching Studio. This is caused by issues with the default lifecycle configuration script attached to the domain.
To resolve lifecycle configuration issues
-
View the Amazon CloudWatch Logs for the lifecycle configuration to trace the command that caused the failure. To view the log, follow the steps in Verify lifecycle configuration process from CloudWatch Logs.
-
Detach the default script from the user profile or domain. For more information, see Update and detach lifecycle configurations.
-
Launch the Studio application.
-
Debug your lifecycle configuration script. You can run the lifecycle configuration script from the system terminal to troubleshoot. When the script runs successfully from the terminal, you can attach the script to the user profile or the domain.
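Reading the lifecycle configuration log could be scripted along these lines. This is a sketch built on assumptions that are not stated in this topic: that the logs are in the /aws/sagemaker/studio log group, and that the log stream is named domain-id/user-profile-name/app-type/app-name/LifecycleConfigOnStart. Confirm both in the CloudWatch console before relying on them. The function names are hypothetical.

```python
# Assumed log group for Studio apps; verify it in your account.
LOG_GROUP = "/aws/sagemaker/studio"


def lifecycle_log_stream(domain_id, user_profile,
                         app_type="JupyterServer", app_name="default"):
    """Build the assumed log stream name for a lifecycle configuration run."""
    return f"{domain_id}/{user_profile}/{app_type}/{app_name}/LifecycleConfigOnStart"


def fetch_lifecycle_log(domain_id, user_profile, region):
    """Return the log messages from the start of the stream (first page only)."""
    import boto3  # imported lazily so the helper above works without boto3

    logs = boto3.client("logs", region_name=region)
    response = logs.get_log_events(
        logGroupName=LOG_GROUP,
        logStreamName=lifecycle_log_stream(domain_id, user_profile),
        startFromHead=True,
    )
    return [event["message"] for event in response["events"]]
```

The last messages before the failure usually point to the command in the script that needs debugging.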
-
-
SageMaker Studio core functionalities are not available.
If you get this error message when opening Studio, it may be due to Python package version conflicts. This occurs if you used the following commands in a notebook or terminal to install Python packages that have version conflicts with SageMaker package dependencies.
!pip install
pip install --user
To resolve this issue, complete the following steps:
-
Uninstall recently installed Python packages. If you’re not sure which package to uninstall, create an issue with https://aws.amazon.com/premiumsupport/.
-
Restart Studio:
-
Shut down Studio from the File menu.
-
Wait for one minute.
-
Reopen Studio by refreshing the page or opening it from the AWS Management Console.
-
The problem should be resolved if you have uninstalled the package which caused the conflict. To install packages without causing this issue again, use %pip install without the --user flag.
If the issue persists, create a new user profile and set up your environment with that user profile.
If these solutions don't fix the issue, create an issue with https://aws.amazon.com/premiumsupport/.
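If you are not sure which packages were installed with the --user flag, one way to list them is to inspect the user site directory of the Python interpreter. The sketch below uses only the standard library. The function name is hypothetical, and it assumes you run it with the same interpreter that the packages were installed for.

```python
import importlib.metadata
import site


def user_site_packages(path=None):
    """List (name, version) pairs for distributions in the user site directory.

    `pip install --user` places packages in the directory reported by
    site.getusersitepackages(). Pass `path` to inspect another directory.
    """
    user_site = path or site.getusersitepackages()
    found = {
        (dist.metadata.get("Name", "unknown"), dist.version)
        for dist in importlib.metadata.distributions(path=[user_site])
    }
    return sorted(found)


if __name__ == "__main__":
    for name, version in user_site_packages():
        print(f"{name}=={version}")
```

The output is a candidate list for the uninstall step. It does not identify which package causes the conflict.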
-
-
Unable to open Studio from the AWS Management Console.
If you are unable to open Studio and cannot make a new running instance with all default settings, create an issue with https://aws.amazon.com/premiumsupport/.
KernelGateway application issues
The following issues are specific to KernelGateway applications that are launched in Studio.
-
Cannot access the Kernel session
When the user launches a new notebook, they are unable to connect to the notebook session. If the KernelGateway application's status is InService, you can verify the following to resolve the issue.
-
Check Security Group configurations
If the domain is set up in VpcOnly mode, the security group associated with the domain must allow traffic between the ports in the range 8192-65535 for connectivity between the JupyterServer and KernelGateway apps.
To verify the security group rules
-
Get the security groups associated with the domain using the DescribeDomain API call.
-
Sign in to the AWS Management Console and open the Amazon VPC console at https://console.aws.amazon.com/vpc/.
-
From the left navigation, under Security, choose Security Groups.
-
Filter by the IDs of the security groups that are associated with the domain.
-
For each security group:
-
Select the security group.
-
From the security group details page, view the Inbound rules. Verify that traffic is allowed between ports in the range 8192-65535.
-
For more information about security group rules, see Control traffic to resources using security groups. For more information about requirements to use Studio in VpcOnly mode, see Connect SageMaker Studio Notebooks in a VPC to External Resources.
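The inbound rule check can be approximated in code once you have the security group IDs for the domain. The sketch below makes simplifying assumptions: it treats the traffic as TCP, it only recognizes a single rule that spans the whole 8192-65535 range (not several rules that cover it together), and it does not inspect the rule's source. The function names are hypothetical, and boto3 with AWS credentials is assumed for the fetch step.

```python
def rule_covers_range(permission, low=8192, high=65535):
    """Return True if one inbound rule allows every port from low to high.

    `permission` is a dict shaped like an "IpPermissions" entry returned
    by the EC2 DescribeSecurityGroups API.
    """
    protocol = permission.get("IpProtocol")
    if protocol == "-1":  # "-1" means all protocols and all ports
        return True
    if protocol != "tcp":  # assumption: the apps communicate over TCP
        return False
    from_port = permission.get("FromPort")
    to_port = permission.get("ToPort")
    if from_port is None or to_port is None:
        return False
    return from_port <= low and to_port >= high


def groups_missing_range(security_groups):
    """Return the IDs of security groups with no rule covering the range."""
    return [
        group["GroupId"]
        for group in security_groups
        if not any(rule_covers_range(p) for p in group.get("IpPermissions", []))
    ]


def fetch_security_groups(group_ids, region):
    """Fetch security group details for the given group IDs."""
    import boto3  # imported lazily so the helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.describe_security_groups(GroupIds=group_ids)["SecurityGroups"]
```

Any group ID returned by groups_missing_range is worth reviewing by hand in the console, because a combination of narrower rules may still satisfy the requirement.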
-
Verify firewall and WebSocket connections
If the KernelGateway apps have an InService status and the user is unable to connect to the Studio notebook session, verify the firewall and WebSocket settings.
-
Launch the Studio application. For more information, see Launch Amazon SageMaker Studio.
-
Open your web browser’s developer tools.
-
Choose the Network tab.
-
Search for an entry that matches the following format.
wss://<domain-id>.studio.<region>.sagemaker.aws/jupyter/default/api/kernels/<unique-code>/channels?session_id=<unique-code>
If the status or response code for the entry is anything other than 101, then your network settings are preventing the connection between the Studio application and the KernelGateway apps.
To resolve this issue, contact the team that manages your networking settings to add the Studio URL to the allow list and enable WebSocket connections.
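If you prefer to inspect a saved network log rather than the live Network tab, the same check can be run over a HAR export. This is a sketch under assumptions: that your browser's developer tools can export the network log as a HAR file, and that the export includes the WebSocket handshake entries. The function name is hypothetical.

```python
import json


def failed_kernel_websockets(har):
    """Return (url, status) pairs for kernel WebSocket entries that are not 101.

    `har` is a parsed HAR document, which stores requests under
    har["log"]["entries"] with "request" and "response" objects.
    """
    failures = []
    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        if url.startswith("wss://") and "/api/kernels/" in url:
            status = entry["response"]["status"]
            if status != 101:  # 101 Switching Protocols marks a successful upgrade
                failures.append((url, status))
    return failures


def load_har(path):
    """Read a HAR file exported from the browser's developer tools."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A non-empty result lists the kernel connections that were blocked, which can be shared with the team that manages your networking settings.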
-
-
-
Unable to launch an app because resource quotas are exceeded
When a user tries to launch a new notebook, the notebook creation fails with either of the following errors. This is caused by exceeding resource quotas.
-
Unable to start more Apps of AppType [KernelGateway] and ResourceSpec(instanceType=[]) for UserProfile []. Please delete an App with a matching AppType and ResourceSpec, then try again
Studio supports up to four running KernelGateway apps on the same instance. To resolve this issue, you can do either of the following:
-
Delete an existing KernelGateway application running on the instance, then restart the new notebook.
-
Start the new notebook on a different instance type. For more information, see Change an Instance Type.
-
An error occurred (ResourceLimitExceeded) when calling the CreateApp operation
In this case, the account does not have sufficient limits to create a Studio application on the specified instance type. To resolve this, navigate to the Service Quotas console at https://console.aws.amazon.com/servicequotas/. In that console, request to increase the Studio KernelGateway Apps running on instance-type instance limit. For more information, see AWS service quotas.
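Before requesting an increase, it can help to look up the current value of the relevant quota. The sketch below lists the SageMaker quotas through the Service Quotas API and filters them by the quota name pattern given above. The function names are hypothetical, the service code "sagemaker" is an assumption to confirm in your account, and boto3 with AWS credentials is assumed for the fetch step.

```python
QUOTA_PREFIX = "Studio KernelGateway Apps running on"


def kernel_gateway_quotas(quotas, instance_type=None):
    """Return (name, code, value) for Studio KernelGateway app quotas.

    `quotas` is a list of dicts shaped like the "Quotas" entries returned
    by the Service Quotas ListServiceQuotas API. If `instance_type` is
    given, only the quota for that instance type is returned.
    """
    matches = [q for q in quotas if q.get("QuotaName", "").startswith(QUOTA_PREFIX)]
    if instance_type:
        wanted = f"{QUOTA_PREFIX} {instance_type} instance"
        matches = [q for q in matches if q["QuotaName"] == wanted]
    return [(q["QuotaName"], q["QuotaCode"], q["Value"]) for q in matches]


def fetch_sagemaker_quotas(region):
    """Collect all SageMaker quotas for the account in one Region."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("service-quotas", region_name=region)
    paginator = client.get_paginator("list_service_quotas")
    quotas = []
    for page in paginator.paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas
```

The quota code in each result identifies the quota when you submit the increase request in the Service Quotas console.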
-