External library and kernel installation
Important
Currently, all packages in notebook instance environments are licensed for use with Amazon SageMaker and do not require additional commercial licenses. However, this might be subject to change in the future, and we recommend reviewing the licensing terms regularly for any updates.
Amazon SageMaker notebook instances come with multiple environments already installed. These
environments contain Jupyter kernels and Python packages including: scikit, Pandas,
NumPy, TensorFlow, and MXNet. These environments, along with all files in the
sample-notebooks
folder, are refreshed when you stop and start a
notebook instance. You can also install your own environments that contain your choice
of packages and kernels.
The different Jupyter kernels in Amazon SageMaker notebook instances are separate conda
environments. For information about conda environments, see Managing
environments
Install custom environments and kernels on the notebook instance's Amazon EBS volume. This
ensures that they persist when you stop and restart the notebook instance, and that any
external libraries you install are not updated by SageMaker. To do that, use a lifecycle
configuration that includes both a script that runs when you create the notebook instance
(on-create)
and a script that runs each time you restart the notebook
instance (on-start
). For more information about using notebook instance
lifecycle configurations, see Customization of a SageMaker notebook instance
using an LCC script. There is a GitHub
repository that contains sample lifecycle configuration scripts at SageMaker
Notebook Instance Lifecycle Config Samples
The examples at https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/blob/master/scripts/persistent-conda-ebs/on-create.shon-create
script installs the ipykernel
library to create custom
environments
as Jupyter kernels, then uses pip install
and conda
install
to install libraries. You can adapt the script to create custom
environments and install libraries
that you want. SageMaker does not update these libraries when you stop and restart the
notebook instance, so you can ensure that your custom environment has specific
versions of libraries that you want. The on-start
script installs any
custom environments that you create as Jupyter kernels, so that they appear in the
dropdown list in the Jupyter New menu.
Package installation tools
SageMaker notebooks support the following package installation tools:
conda install
pip install
You can install packages using the following methods:
-
Lifecycle configuration scripts.
For example scripts, see SageMaker Notebook Instance Lifecycle Config Samples
. For more information on lifecycle configuration, see Customize a Notebook Instance Using a Lifecycle Configuration Script. -
Notebooks – The following commands are supported.
%conda install
%pip install
-
The Jupyter terminal – You can install packages using pip and conda directly.
From within a notebook you can use the system command syntax (lines starting with !) to
install packages, for example, !pip install
and !conda install
.
More recently, new commands have been added to IPython: %pip
and %conda
.
These commands are the recommended way to install packages from a notebook as they correctly
take into account the active environment or interpreter being used. For more information, see
Add %pip and %conda magic functions
Conda
Conda is an open source package management system and environment management system,
which can install packages and their dependencies. SageMaker supports using Conda with either
of the two main channels, the default channel, and the conda-forge channel. For more
information, see
Conda channels
Note
Due to how Conda resolves the dependency graph, installing packages from conda-forge can take significantly longer (in the worst cases, upwards of 10 minutes).
The Deep Learning AMI comes with many conda environments and many packages preinstalled. Due to the number of packages preinstalled, finding a set of packages that are guaranteed to be compatible is difficult. You may see a warning "The environment is inconsistent, please check the package plan carefully". Despite this warning, SageMaker ensures that all the SageMaker provided environments are correct. SageMaker cannot guarantee that any user installed packages will function correctly.
Note
Users of SageMaker, AWS Deep Learning AMIs and Amazon EMR can access the commercial Anaconda repository without taking a commercial license through February 1, 2024 when using Anaconda in those services. For any usage of the commercial Anaconda repository after February 1, 2024, customers are responsible for determining their own Anaconda license requirements.
Conda has two methods for activating environments: conda activate/deactivate, and source
activate/deactivate. For more information, see
Should
I use 'conda activate' or 'source activate' in Linux
SageMaker supports moving Conda environments onto the Amazon EBS volume, which is persisted when
the instance is stopped. The environments aren't persisted when the environments are installed
to the root volume, which is the default behavior. For an example lifecycle script, see
persistent-conda-ebs
Supported conda operations (see note at the bottom of this topic)
conda install of a package in a single environment
conda install of a package in all environments
conda install of a R package in the R environment
Installing a package from the main conda repository
Installing a package from conda-forge
-
Changing the Conda install location to use EBS
Supporting both conda activate and source activate
Pip
Pip is the de facto tool for installing and managing Python packages. Pip searches for packages on the Python Package Index (PyPI) by default. Unlike Conda, pip doesn't have built in environment support, and is not as thorough as Conda when it comes to packages with native/system library dependencies. Pip can be used to install packages in Conda environments.
You can use alternative package repositories with pip instead of the PyPI. For an example
lifecycle script, see
on-start.sh
Supported pip operations (see note at the bottom of this topic)
Using pip to install a package without an active conda environment (install packages system wide)
Using pip to install a package in a conda environment
Using pip to install a package in all conda environments
-
Changing the pip install location to use EBS
Using an alternative repository to install packages with pip
Unsupported
SageMaker aims to support as many package installation operations as possible. However, if the packages were installed by SageMaker or DLAMI, and you use the following operations on these packages, it might make your notebook instance unstable:
Uninstalling
Downgrading
Upgrading
We do not provide support for installing packages via yum install or installing R packages from CRAN.
Due to potential issues with network conditions or configurations, or the availability of Conda or PyPi, we cannot guarantee that packages will install in a fixed or deterministic amount of time.
Note
We cannot guarantee that a package installation will be successful. Attempting to install a package in an environment with incompatible dependencies can result in a failure. In such a case you should contact the library maintainer to see if it is possible to update the package dependencies. Alternatively you can attempt to modify the environment in such a way as to allow the installation. This modification however will likely mean removing or updating existing packages, which means we can no longer guarantee stability of this environment.