R User Guide to Amazon SageMaker

This guide shows how to use Amazon SageMaker features from R. It introduces SageMaker's built-in R kernel, explains how to get started with R on SageMaker, and lists several example notebooks.

The examples are organized in three levels: Beginner, Intermediate, and Advanced. They start with Getting Started with R on SageMaker, continue to end-to-end machine learning with R on SageMaker, and finish with more advanced topics such as SageMaker Processing with an R script and bringing your own (BYO) R algorithm to SageMaker.

For information on how to bring your own custom R image to Studio, see Bring your own SageMaker image. For a similar blog article, see Bringing your own R environment to Amazon SageMaker Studio.

RStudio Support in SageMaker

Amazon SageMaker supports RStudio as a fully managed integrated development environment (IDE) that is integrated with an Amazon SageMaker domain. With the RStudio integration, you can launch an RStudio environment in the domain to run your RStudio workflows on SageMaker resources. For more information, see RStudio on Amazon SageMaker.

R Kernel in SageMaker

SageMaker notebook instances support R through a pre-installed R kernel. The R kernel also includes the reticulate library, an R-to-Python interface, so you can use the features of the SageMaker Python SDK from within an R script.
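
As a minimal sketch of that interface, the R function below loads the SageMaker Python SDK through reticulate and collects the session, the default S3 bucket, and the execution role. The function name sagemaker_context is hypothetical, and the sketch assumes it runs on a notebook instance where the sagemaker Python package and an execution role are available.

```r
# Hypothetical helper: gathers common SageMaker objects for use from R.
# The default argument imports the SageMaker Python SDK via reticulate;
# it is only evaluated when no module object is passed in.
sagemaker_context <- function(sagemaker = reticulate::import("sagemaker")) {
  session <- sagemaker$Session()
  list(
    session = session,
    bucket  = session$default_bucket(),       # default S3 bucket for the session
    role    = sagemaker$get_execution_role()  # IAM role attached to the notebook
  )
}
```

On a notebook instance, calling ctx <- sagemaker_context() and then reading ctx$bucket or ctx$role should give the values that later SDK calls need.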

Get Started with R in SageMaker

  • Create a notebook instance using the t2.medium instance type and the default storage size. You can pick a faster instance and more storage if you plan to keep using the instance for more advanced examples, or create a bigger instance later.

  • Wait until the status of the notebook instance is InService, and then choose Open Jupyter.

  • Create a new notebook with the R kernel from the list of available environments.

  • When the new notebook is created, you should see an R logo in the upper right corner of the notebook environment, with R shown as the kernel under that logo. This indicates that SageMaker has successfully launched the R kernel for this notebook.

  • Alternatively, from within an open Jupyter notebook, open the Kernel menu and select R from the Change Kernel option.

Example Notebooks

Prerequisites

Getting Started with R on SageMaker: This sample notebook describes how you can develop R scripts using Amazon SageMaker’s R kernel. In this notebook you set up your SageMaker environment and permissions, download the abalone dataset from the UCI Machine Learning Repository, do some basic processing and visualization of the data, and then save the data in .csv format to Amazon S3.
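
The processing and save steps might look roughly like the sketch below. It is not the notebook's code: the rows are toy values laid out in a subset of the abalone columns (not real records), dropping zero-height rows is an assumed example of "basic processing", and upload_to_s3 and the key prefix are hypothetical. The upload helper expects a SageMaker session object obtained through reticulate.

```r
# Toy rows in the abalone column layout (illustrative values, not real records)
abalone <- data.frame(
  sex      = c("M", "F", "I", "M"),
  length   = c(0.45, 0.53, 0.33, 0.44),
  diameter = c(0.36, 0.42, 0.25, 0.36),
  height   = c(0.10, 0.14, 0.00, 0.12),
  rings    = c(15L, 9L, 7L, 10L),
  stringsAsFactors = FALSE
)

# Basic processing: encode sex as a factor and drop rows with a height of zero
abalone$sex <- factor(abalone$sex, levels = c("F", "I", "M"))
abalone_clean <- abalone[abalone$height > 0, ]

# Save the cleaned data locally in .csv format
csv_path <- file.path(tempdir(), "abalone.csv")
write.csv(abalone_clean, csv_path, row.names = FALSE)

# Hypothetical helper: upload the file with the SDK session's upload_data method
upload_to_s3 <- function(session, path, bucket, prefix = "r-example/data") {
  session$upload_data(path = path, bucket = bucket, key_prefix = prefix)
}
```

Only the upload step needs AWS access; the rest runs in any R session.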

Beginner Level

SageMaker Batch Transform using R Kernel: This sample notebook describes how to run a batch transform job using SageMaker’s Transformer API and the XGBoost algorithm. The notebook also uses the abalone dataset.
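
A batch transform call from R could be sketched as follows. The sketch assumes estimator is an already trained SageMaker Python SDK estimator accessed through reticulate; the function name run_batch_transform, the instance type, and the CSV settings are illustrative choices rather than the notebook's exact values.

```r
# Hypothetical helper: run a batch transform job from a trained estimator.
# `estimator` is assumed to be a fitted SageMaker SDK estimator (via reticulate).
run_batch_transform <- function(estimator, input_s3_uri, output_s3_uri,
                                instance_type = "ml.m5.large") {
  # Create a Transformer object from the trained model
  transformer <- estimator$transformer(
    instance_count = 1L,              # integer literal so Python receives an int
    instance_type  = instance_type,
    output_path    = output_s3_uri
  )
  # Send the CSV input line by line and block until the job finishes
  transformer$transform(
    data         = input_s3_uri,
    content_type = "text/csv",
    split_type   = "Line"
  )
  transformer$wait()
  invisible(transformer)
}
```

Note the 1L suffix: reticulate converts plain R numbers to Python floats, so integer arguments need explicit integer literals.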

Intermediate Level

Hyperparameter Optimization for XGBoost in R: This sample notebook extends the previous beginner notebooks that use the abalone dataset and XGBoost. It describes how to do model tuning with hyperparameter optimization. You will also learn how to use batch transform for batch predictions, as well as how to create a model endpoint to make real-time predictions.
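
A tuning job can be sketched from R with the SDK's HyperparameterTuner class. This is an illustration under assumptions, not the notebook's code: tune_xgboost is a hypothetical name, the two hyperparameter ranges and the job counts are arbitrary, and validation:rmse is assumed as the objective because abalone ring prediction is typically framed as regression. The inputs argument is assumed to be a named list of training and validation channels prepared elsewhere.

```r
# Hypothetical helper: launch a hyperparameter tuning job for an XGBoost estimator.
# The default argument imports sagemaker.tuner via reticulate when not supplied.
tune_xgboost <- function(estimator, inputs,
                         tuner_module = reticulate::import("sagemaker.tuner"),
                         max_jobs = 10L, max_parallel_jobs = 2L) {
  # Search ranges (illustrative): learning rate and tree depth
  ranges <- list(
    eta       = tuner_module$ContinuousParameter(0, 1),
    max_depth = tuner_module$IntegerParameter(2L, 10L)
  )
  tuner <- tuner_module$HyperparameterTuner(
    estimator             = estimator,
    objective_metric_name = "validation:rmse",  # assumed objective metric
    hyperparameter_ranges = ranges,
    objective_type        = "Minimize",
    max_jobs              = max_jobs,
    max_parallel_jobs     = max_parallel_jobs
  )
  tuner$fit(inputs = inputs)
  invisible(tuner)
}
```

The named R list of ranges is converted to a Python dictionary by reticulate, which is the form HyperparameterTuner expects.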

Amazon SageMaker Processing with R: SageMaker Processing lets you run preprocessing, post-processing, and model evaluation workloads. This example shows you how to create an R script to orchestrate a Processing job.
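
One way to launch such a job from R is through the SDK's ScriptProcessor class, sketched below. The function name run_r_processing, the instance type, and the container paths are assumptions; image_uri is assumed to point to a container image that has R installed, and script_path to the R script to run.

```r
# Hypothetical helper: run an R script as a SageMaker Processing job.
# The default argument imports sagemaker.processing via reticulate when not supplied.
run_r_processing <- function(role, image_uri, script_path, input_s3_uri,
                             processing = reticulate::import("sagemaker.processing")) {
  processor <- processing$ScriptProcessor(
    role           = role,
    image_uri      = image_uri,        # assumed: an image with R installed
    command        = list("Rscript"),  # list() so Python receives a list, not a string
    instance_count = 1L,
    instance_type  = "ml.m5.large"
  )
  # Map the S3 input into the container and collect results from the output directory
  processor$run(
    code    = script_path,
    inputs  = list(processing$ProcessingInput(
      source      = input_s3_uri,
      destination = "/opt/ml/processing/input"
    )),
    outputs = list(processing$ProcessingOutput(
      source = "/opt/ml/processing/output"
    ))
  )
  invisible(processor)
}
```

Inside the container, the R script would read its data from the input directory and write results to the output directory, which SageMaker then copies back to S3.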

Advanced Level

Train and Deploy Your Own R Algorithm in SageMaker: Do you already have an R algorithm that you want to bring into SageMaker to tune, train, or deploy? This example walks you through customizing SageMaker containers with custom R packages, all the way to using a hosted endpoint for inference on a model built in R.