Amazon SageMaker
Developer Guide

Associate Git Repositories with Amazon SageMaker Notebook Instances

Associate Git repositories with your notebook instance to save your notebooks in a source control environment that persists even if you stop or delete your notebook instance. You can associate one default repository and up to three additional repositories with a notebook instance. The repositories can be hosted in AWS CodeCommit, on GitHub, or on any other Git server. Associating Git repositories with your notebook instance can be useful for:

  • Persistence - Notebooks in a notebook instance are stored on durable Amazon EBS volumes, but they do not persist beyond the life of your notebook instance. Storing notebooks in a Git repository enables you to store and use notebooks even if you stop or delete your notebook instance.

  • Collaboration - Peers on a team often work on machine learning projects together. Storing your notebooks in Git repositories allows peers working in different notebook instances to share notebooks and collaborate on them in a source-control environment.

  • Learning - Many Jupyter notebooks that demonstrate machine learning techniques are available in publicly hosted Git repositories, such as on GitHub. You can associate your notebook instance with a repository to easily load Jupyter notebooks contained in that repository.

There are two ways to associate a Git repository with a notebook instance:

  • Add a Git repository as a resource in your Amazon SageMaker account. Then, to access the repository, you can specify an AWS Secrets Manager secret that contains credentials. That way, you can access repositories that require authentication.

  • Associate a public Git repository that is not a resource in your account. If you do this, you cannot specify credentials to access the repository.

Add a Git Repository to Your Amazon SageMaker Account

To manage your Git repositories, easily associate them with your notebook instances, and associate credentials for repositories that require authentication, add the repositories as resources in your Amazon SageMaker account. You can view a list of the repositories that are stored in your account, and details about each repository, in the Amazon SageMaker console or by using the API.

You can add Git repositories to your Amazon SageMaker account in the Amazon SageMaker console or by using the AWS CLI.

Note

You can use the Amazon SageMaker API CreateCodeRepository to add Git repositories to your Amazon SageMaker account, but step-by-step instructions are not provided here.

Add a Git Repository to Your Amazon SageMaker Account (Console)

To add a Git repository as a resource in your Amazon SageMaker account

  1. Open the Amazon SageMaker console at https://console.aws.amazon.com/sagemaker/.

  2. Choose Git repositories, then choose Add repository.

  3. To add a CodeCommit repository, choose AWS CodeCommit.

    1. To use an existing CodeCommit repository:

      1. Choose Use existing repository.

      2. For Repository, choose a repository from the list.

      3. Enter a name to use for the repository in Amazon SageMaker. The name must be 1 to 63 characters. Valid characters are a-z, A-Z, 0-9, and - (hyphen).

      4. Choose Add repository.

    2. To create a new CodeCommit repository:

      1. Choose Create new repository.

      2. Enter a name for the repository that you can use in both CodeCommit and Amazon SageMaker. The name must be 1 to 63 characters. Valid characters are a-z, A-Z, 0-9, and - (hyphen).

      3. Choose Create repository.

  4. To add a Git repository hosted somewhere other than CodeCommit:

    1. Choose GitHub/Other Git-based repo.

    2. Enter a name to use for the repository in Amazon SageMaker. The name must be 1 to 63 characters. Valid characters are a-z, A-Z, 0-9, and - (hyphen).

    3. Enter the URL for the repository.

    4. For Git credentials, choose the credentials to use to authenticate to the repository. This is necessary only if the Git repository is private.

      Note

      If you have two-factor authentication enabled for your Git repository, use a personal access token generated by your Git service provider instead of a password.

      1. To use an existing AWS Secrets Manager secret, choose Use existing secret, and then choose a secret from the list. For information about creating and storing a secret, see Creating a Basic Secret in the AWS Secrets Manager User Guide.

        Note

        The secret must have a staging label of AWSCURRENT and must be in the following format:

        {"username": UserName, "password": Password}

        For GitHub repositories, we recommend using a personal access token instead of your account password. For information, see https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/.

      2. To create a new AWS Secrets Manager secret, choose Create secret, enter a name for the secret, and then enter the username and password to use to authenticate to the repository.

        Note

        The IAM role you use to create the secret must have the secretsmanager:GetSecretValue permission in its IAM policy.

        The secret must have a staging label of AWSCURRENT and must be in the following format:

        {"username": UserName, "password": Password}

        For GitHub repositories, we recommend using a personal access token instead of your account password.

      3. To not use any credentials, choose No secret.

    5. Choose Create secret.

Add a Git Repository to Your Amazon SageMaker Account (CLI)

Use the create-code-repository AWS CLI command. Specify a name for the repository as the value of the code-repository-name argument. The name must be 1 to 63 characters. Valid characters are a-z, A-Z, 0-9, and - (hyphen). Also specify the following:

  • The default branch

  • The URL of the Git repository

  • The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the credentials to use to authenticate the repository as the value of the git-config argument

For information about creating and storing a secret, see Creating a Basic Secret in the AWS Secrets Manager User Guide. The following command creates a new repository named MyRepository in your Amazon SageMaker account that points to a Git repository hosted at https://github.com/myprofile/my-repo.

aws sagemaker create-code-repository \
    --code-repository-name "MyRepository" \
    --git-config '{"Branch":"master", "RepositoryUrl":"https://github.com/myprofile/my-repo", "SecretArn":"arn:aws:secretsmanager:us-east-2:012345678901:secret:my-secret-ABc0DE"}'
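Quoting the JSON value of git-config by hand is easy to get wrong. As a minimal sketch, the following Python helper checks a name against the length and character rules stated above and assembles the same command as an argument list. The function names are hypothetical, and the service may apply naming rules beyond the ones this check covers.

```python
import json
import re
import shlex

# Rule as stated in this guide: 1 to 63 characters from a-z, A-Z, 0-9, and hyphen.
_NAME_RE = re.compile(r"[A-Za-z0-9-]{1,63}")


def is_valid_repository_name(name):
    """Return True if name satisfies the documented length and character rules."""
    return _NAME_RE.fullmatch(name) is not None


def build_create_code_repository_command(name, repository_url, branch, secret_arn=None):
    """Build the argument list for `aws sagemaker create-code-repository`."""
    if not is_valid_repository_name(name):
        raise ValueError(f"invalid repository name: {name!r}")
    git_config = {"Branch": branch, "RepositoryUrl": repository_url}
    if secret_arn is not None:
        # Only needed for repositories that require authentication.
        git_config["SecretArn"] = secret_arn
    return [
        "aws", "sagemaker", "create-code-repository",
        "--code-repository-name", name,
        "--git-config", json.dumps(git_config),
    ]


command = build_create_code_repository_command(
    "MyRepository",
    "https://github.com/myprofile/my-repo",
    "master",
    secret_arn="arn:aws:secretsmanager:us-east-2:012345678901:secret:my-secret-ABc0DE",
)
# shlex.join quotes the JSON so that the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

Passing the argument list directly to a process launcher, rather than pasting the joined string into a shell, avoids shell quoting altogether.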

Note

The secret must have a staging label of AWSCURRENT and must be in the following format:

{"username": UserName, "password": Password}

For GitHub repositories, we recommend using a personal access token instead of your account password.
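The secret format above is a JSON object with two string fields. As a rough sketch, the value can be generated with a JSON library so that quotes or backslashes in a token are escaped correctly. The function name and the placeholder credentials are hypothetical.

```python
import json


def build_git_secret_string(username, password_or_token):
    """Return the JSON text to store as the secret value."""
    return json.dumps({"username": username, "password": password_or_token})


# Placeholder values only; never hard-code real credentials.
secret_string = build_git_secret_string("my-git-user", "example-token")
print(secret_string)
```

The resulting text is what you would supply as the secret value when you create the secret in AWS Secrets Manager.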

Create a Notebook Instance with an Associated Git Repository

You can associate Git repositories with a notebook instance when you create the notebook instance by using the AWS Management Console or the AWS CLI.

Note

You can use the Amazon SageMaker API CreateNotebookInstance to associate Git repositories with a notebook instance, but step-by-step instructions are not provided here.

Note

If you want to use a CodeCommit repository that is in a different AWS account than the notebook instance, set up cross-account access for the repository. For information, see Associate a CodeCommit Repository in a Different AWS Account with a Notebook Instance.

Create a Notebook Instance with an Associated Git Repository (Console)

To create a notebook instance and associate Git repositories in the AWS Management Console

  1. Follow the instructions at Step 1: Create an Amazon SageMaker Notebook Instance.

  2. For Git repositories, choose Git repositories to associate with the notebook instance.

    1. For Default repository, choose a repository that you want to use as your default repository. Amazon SageMaker clones this repository as a subdirectory in the Jupyter startup directory at /home/ec2-user/SageMaker. When you open your notebook instance, it opens in this repository. To choose a repository that is stored as a resource in your account, choose its name from the list. To add a new repository as a resource in your account, choose Add a repository to Amazon SageMaker (opens the Add repository flow in a new window) and then follow the instructions at Add a Git Repository to Your Amazon SageMaker Account (Console). To clone a public repository that is not stored in your account, choose Clone a public Git repository to this notebook instance only, and then specify the URL for that repository.

    2. For Additional repository 1, choose a repository that you want to add as an additional directory. Amazon SageMaker clones this repository as a subdirectory in the Jupyter startup directory at /home/ec2-user/SageMaker. To choose a repository that is stored as a resource in your account, choose its name from the list. To add a new repository as a resource in your account, choose Add a repository to Amazon SageMaker (opens the Add repository flow in a new window) and then follow the instructions at Add a Git Repository to Your Amazon SageMaker Account (Console). To clone a public repository that is not stored in your account, choose Clone a public Git repository to this notebook instance only, and then specify the URL for that repository.

      Repeat this step up to three times to add up to three additional repositories to your notebook instance.

Create a Notebook Instance with an Associated Git Repository (CLI)

To create a notebook instance and associate Git repositories by using the AWS CLI, use the create-notebook-instance command as follows:

  • Specify the repository that you want to use as your default repository as the value of the default-code-repository argument. Amazon SageMaker clones this repository as a subdirectory in the Jupyter startup directory at /home/ec2-user/SageMaker. When you open your notebook instance, it opens in this repository. To use a repository that is stored as a resource in your Amazon SageMaker account, specify the name of the repository as the value of the default-code-repository argument. To use a repository that is not stored in your account, specify the URL of the repository as the value of the default-code-repository argument.

  • Specify up to three additional repositories as the value of the additional-code-repositories argument. Amazon SageMaker clones each of these repositories as a subdirectory in the Jupyter startup directory at /home/ec2-user/SageMaker, and excludes each one from the default repository by adding it to the .git/info/exclude file of the default repository. To use repositories that are stored as resources in your Amazon SageMaker account, specify the names of the repositories as the value of the additional-code-repositories argument. To use repositories that are not stored in your account, specify the URLs of the repositories as the value of the additional-code-repositories argument.

For example, the following command creates a notebook instance with two repositories. The default repository is MyGitRepo, which is stored as a resource in your Amazon SageMaker account. The additional repository is hosted on GitHub and is specified by its URL:

aws sagemaker create-notebook-instance \
    --notebook-instance-name "MyNotebookInstance" \
    --instance-type "ml.t2.medium" \
    --role-arn "arn:aws:iam::012345678901:role/service-role/AmazonSageMaker-ExecutionRole-20181129T121390" \
    --default-code-repository "MyGitRepo" \
    --additional-code-repositories "https://github.com/myprofile/my-other-repo"
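As an illustration of the rules above, the following Python sketch assembles the create-notebook-instance arguments and rejects more than three additional repositories before the command is ever run. The function name is hypothetical, and passing several additional repositories as space-separated values after the flag is assumed here to follow the usual AWS CLI list syntax.

```python
import shlex

MAX_ADDITIONAL_REPOSITORIES = 3  # limit stated in this guide


def build_create_notebook_instance_command(name, instance_type, role_arn,
                                           default_repository=None,
                                           additional_repositories=()):
    """Build the argument list for `aws sagemaker create-notebook-instance`.

    Each repository is either the name of a repository stored as a resource
    in the account or the URL of a repository that is not stored there.
    """
    additional = list(additional_repositories)
    if len(additional) > MAX_ADDITIONAL_REPOSITORIES:
        raise ValueError("at most three additional repositories are allowed")
    argv = [
        "aws", "sagemaker", "create-notebook-instance",
        "--notebook-instance-name", name,
        "--instance-type", instance_type,
        "--role-arn", role_arn,
    ]
    if default_repository is not None:
        argv += ["--default-code-repository", default_repository]
    if additional:
        # Assumption: list values follow the flag as separate arguments.
        argv += ["--additional-code-repositories", *additional]
    return argv


notebook_command = build_create_notebook_instance_command(
    "MyNotebookInstance",
    "ml.t2.medium",
    "arn:aws:iam::012345678901:role/service-role/AmazonSageMaker-ExecutionRole-20181129T121390",
    default_repository="MyGitRepo",
    additional_repositories=["https://github.com/myprofile/my-other-repo"],
)
print(shlex.join(notebook_command))
```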

Note

If you use an AWS CodeCommit repository that does not contain "SageMaker" in its name, add the codecommit:GitPull and codecommit:GitPush permissions to the role that you pass as the role-arn argument to the create-notebook-instance command. For information about how to add permissions to a role, see Adding and Removing IAM Policies in the AWS Identity and Access Management User Guide.
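A policy that grants the two permissions named in this note might look like the following. This is only a sketch: the function name, Region, account ID, and repository name are placeholders, and scoping the statement to a single repository ARN is one possible choice rather than a requirement stated in this guide.

```python
import json


def build_codecommit_git_policy(region, account_id, repository_name):
    """Return an IAM policy document that allows Git pull and push on one repository."""
    resource = f"arn:aws:codecommit:{region}:{account_id}:{repository_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["codecommit:GitPull", "codecommit:GitPush"],
                "Resource": resource,
            }
        ],
    }


# Placeholder identifiers for illustration only.
policy = build_codecommit_git_policy("us-east-2", "012345678901", "my-notebooks")
print(json.dumps(policy, indent=2))
```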

Associate a CodeCommit Repository in a Different AWS Account with a Notebook Instance

To associate a CodeCommit repository in a different AWS account with your notebook instance, set up cross-account access for the CodeCommit repository.

To set up cross-account access for a CodeCommit repository and associate it with a notebook instance:

  1. In the AWS account that contains the CodeCommit repository, create an IAM policy that allows access to the repository from users in the account that contains your notebook instance. For information, see Step 1: Create a Policy for Repository Access in AccountA in the CodeCommit User Guide.

  2. In the AWS account that contains the CodeCommit repository, create an IAM role, and attach the policy that you created in the previous step to that role. For information, see Step 2: Create a Role for Repository Access in AccountA in the CodeCommit User Guide.

  3. Create a profile in the notebook instance that uses the role that you created in the previous step:

    1. Open the notebook instance.

    2. Open a terminal in the notebook instance.

    3. Open the AWS CLI configuration file for editing by typing the following in the terminal:

      vi /home/ec2-user/.aws/config

    4. Add the following profile information to the file:

      [profile CrossAccountAccessProfile]
      region = us-west-2
      role_arn = arn:aws:iam::CodeCommitAccount:role/CrossAccountRepositoryContributorRole
      credential_source = Ec2InstanceMetadata
      output = json

      Where CodeCommitAccount is the account that contains the CodeCommit repository, CrossAccountAccessProfile is the name of the new profile, and CrossAccountRepositoryContributorRole is the name of the role you created in the previous step.

  4. On the notebook instance, configure Git to use the profile that you created in the previous step:

    1. Open the notebook instance.

    2. Open a terminal in the notebook instance.

    3. Open the Git configuration file for editing by typing the following in the terminal:

      vi /home/ec2-user/.gitconfig

    4. Add the following credential helper settings to the file:

      [credential]
          helper = !aws codecommit credential-helper --profile CrossAccountAccessProfile $@
          UseHttpPath = true

      Where CrossAccountAccessProfile is the name of the profile that you created in the previous step.
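The two configuration fragments in the steps above share the profile name, so it can help to generate them together. The following is a minimal sketch that only renders the text; the function names are hypothetical, and the account ID used in the example is a placeholder. Review the output before adding it to /home/ec2-user/.aws/config and /home/ec2-user/.gitconfig.

```python
def render_aws_profile(profile_name, region, codecommit_account_id, role_name):
    """Render the named-profile section for the AWS CLI configuration file."""
    return (
        f"[profile {profile_name}]\n"
        f"region = {region}\n"
        f"role_arn = arn:aws:iam::{codecommit_account_id}:role/{role_name}\n"
        "credential_source = Ec2InstanceMetadata\n"
        "output = json\n"
    )


def render_git_credential_section(profile_name):
    """Render the [credential] section for the Git configuration file."""
    return (
        "[credential]\n"
        f"    helper = !aws codecommit credential-helper --profile {profile_name} $@\n"
        "    UseHttpPath = true\n"
    )


profile = "CrossAccountAccessProfile"
aws_config_text = render_aws_profile(
    profile, "us-west-2", "111122223333", "CrossAccountRepositoryContributorRole"
)
git_config_text = render_git_credential_section(profile)
print(aws_config_text)
print(git_config_text)
```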

Use Git Repositories in a Notebook Instance

When you open a notebook instance that has Git repositories associated with it, it opens in the default repository, which is installed in your notebook instance directly under /home/ec2-user/SageMaker. You can open and create notebooks, and you can manually run Git commands in a notebook cell. For example:

!git pull origin master

To open any of the additional repositories, navigate up one folder. The additional repositories are also installed as directories under /home/ec2-user/SageMaker.
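Because every associated repository is a directory under /home/ec2-user/SageMaker, Git commands can also target a specific repository without changing the working directory. The following sketch uses Git's -C option from Python; the helper names and the directory name my-other-repo are hypothetical, and the actual directory name depends on the repository you associated.

```python
import subprocess

NOTEBOOK_ROOT = "/home/ec2-user/SageMaker"


def git_argv(repository_dir_name, *git_args):
    """Build a git command that runs inside one repository under NOTEBOOK_ROOT."""
    return ["git", "-C", f"{NOTEBOOK_ROOT}/{repository_dir_name}", *git_args]


def run_git(repository_dir_name, *git_args):
    """Run the command and return its standard output; raises if git fails."""
    completed = subprocess.run(
        git_argv(repository_dir_name, *git_args),
        check=True, capture_output=True, text=True,
    )
    return completed.stdout


# On a notebook instance you might call: run_git("my-other-repo", "status", "--short")
print(git_argv("my-other-repo", "pull", "origin", "master"))
```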

If you open the notebook instance with a JupyterLab interface, the jupyter-git extension is installed and available to use. For information about the jupyter-git extension for JupyterLab, see https://github.com/jupyterlab/jupyterlab-git.

When you open a notebook instance in JupyterLab, you see the Git repositories associated with it on the left menu.

You can use the jupyter-git extension to manage Git visually, instead of using the command line.