Amazon SageMaker
You can use Amazon SageMaker Notebooks to integrate your machine learning models with Amazon Timestream. To help you get started, we have created a sample SageMaker Notebook that processes data from Timestream. The data is ingested into Timestream by a multi-threaded Python application that continuously sends data. The source code for the sample SageMaker Notebook and the sample Python application is available on GitHub.
- Create a database and a table by following the instructions described in Create a database and Create a table.
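If you prefer to create the database and table programmatically rather than from the console, a minimal boto3 sketch might look like the following. The database name, table name, region, and retention values are placeholders, not values required by the sample.

```python
import boto3

# Placeholder names, region, and retention settings; use whatever you chose above.
write_client = boto3.client("timestream-write", region_name="us-east-1")

write_client.create_database(DatabaseName="sampleDB")
write_client.create_table(
    DatabaseName="sampleDB",
    TableName="DevOps",
    RetentionProperties={
        "MemoryStoreRetentionPeriodInHours": 24,
        "MagneticStoreRetentionPeriodInDays": 7,
    },
)
```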
- Clone the GitHub repository for the multi-threaded Python sample application, following the instructions on GitHub.
- Clone the GitHub repository for the sample Timestream SageMaker Notebook, following the instructions on GitHub.
- Run the application to continuously ingest data into Timestream, following the instructions in the README.
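The sample application sends records from many threads; conceptually, each write is a timestream-write write_records call. A minimal single-record sketch, with illustrative dimension and measure names (the sample application's actual schema may differ):

```python
import time
import boto3

write_client = boto3.client("timestream-write", region_name="us-east-1")

# One CPU-utilization data point for one host; the sample application
# sends many of these concurrently. Names here are illustrative.
record = {
    "Dimensions": [{"Name": "hostname", "Value": "host-1"}],
    "MeasureName": "cpu_utilization",
    "MeasureValue": "42.5",
    "MeasureValueType": "DOUBLE",
    "Time": str(int(time.time() * 1000)),  # current time in milliseconds
}

write_client.write_records(
    DatabaseName="sampleDB", TableName="DevOps", Records=[record]
)
```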
- Follow the instructions to create an Amazon S3 bucket for Amazon SageMaker, as described here.
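If you would rather create the bucket from code than from the console, a boto3 sketch follows; the bucket name and region are placeholders.

```python
import boto3

s3 = boto3.client("s3", region_name="us-west-2")

# Bucket names must be globally unique; this one is a placeholder.
s3.create_bucket(
    Bucket="my-timestream-sagemaker-bucket",
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)
# Note: in us-east-1, omit CreateBucketConfiguration entirely.
```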
- Create an Amazon SageMaker notebook instance with the latest boto3 installed. In addition to the instructions described here, follow these steps:
  - On the Create notebook instance page, click Additional configuration.
  - Click Lifecycle configuration - optional and select Create a new lifecycle configuration.
  - In the Create lifecycle configuration wizard, do the following:
    - Enter a name for the configuration, e.g. on-start.
    - In the Start notebook script, copy the script content from GitHub and paste it in.
    - In the pasted script, replace PACKAGE=scipy with PACKAGE=boto3.
  - Click Create configuration.
- Go to the IAM service in the AWS Management Console and find the newly created SageMaker execution role for the notebook instance.
- Attach the AmazonTimestreamFullAccess IAM policy to the execution role.
  Note: The AmazonTimestreamFullAccess IAM policy is not restricted to specific resources and is unsuitable for production use. For a production system, consider using policies that restrict access to specific resources.
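You can also attach the policy programmatically. A hedged sketch with boto3; the role name below is hypothetical, so copy the real generated name from the notebook instance's details page.

```python
import boto3

iam = boto3.client("iam")

# The role name is a placeholder; SageMaker generates a unique name
# when it creates the execution role for your notebook instance.
iam.attach_role_policy(
    RoleName="AmazonSageMaker-ExecutionRole-20230101T000000",
    PolicyArn="arn:aws:iam::aws:policy/AmazonTimestreamFullAccess",
)
```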
- When the status of the notebook instance is InService, choose Open Jupyter to launch a SageMaker Notebook for the instance.
- Upload the files timestreamquery.py and Timestream_SageMaker_Demo.ipynb into the notebook by choosing the Upload button.
- Choose Timestream_SageMaker_Demo.ipynb.
  Note: If you see a pop-up with Kernel not found, choose conda_python3 and click Set Kernel.
- Modify DB_NAME, TABLE_NAME, bucket, and ENDPOINT to match the database name, table name, S3 bucket name, and region for the training models.
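For example, the edited cell might end up looking like the following; all values are placeholders for your own setup.

```python
# Placeholder values; replace with your own database, table, bucket, and region.
DB_NAME = "sampleDB"
TABLE_NAME = "DevOps"
bucket = "my-timestream-sagemaker-bucket"
ENDPOINT = "us-east-1"
```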
- Choose the play icon to run the individual cells.
- When you get to the cell Leverage Timestream to find hosts with average CPU utilization across the fleet, ensure that the output returns at least 2 host names.
  Note: If there are fewer than 2 host names in the output, you may need to rerun the sample Python application that ingests data into Timestream with a larger number of threads and host-scale.
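The notebook runs its queries through the uploaded timestreamquery.py helper; underneath, each query is a timestream-query call. A hedged sketch of a fleet-average CPU query, with illustrative database, table, measure, and dimension names that must match what your ingestion application actually writes:

```python
import boto3

query_client = boto3.client("timestream-query", region_name="us-east-1")

# Illustrative query: average CPU utilization per host over the last 2 hours.
query = """
SELECT hostname, AVG(measure_value::double) AS avg_cpu_utilization
FROM "sampleDB"."DevOps"
WHERE measure_name = 'cpu_utilization' AND time > ago(2h)
GROUP BY hostname
"""

# Paginate in case the result set spans multiple pages.
paginator = query_client.get_paginator("query")
for page in paginator.paginate(QueryString=query):
    for row in page["Rows"]:
        print(row["Data"])
```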
- When you get to the cell Train a Random Cut Forest (RCF) model using the CPU utilization history, change the train_instance_type based on the resource requirements for your training job.
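The train_instance_type parameter name suggests the notebook uses the SageMaker Python SDK v1 (v2 renamed it to instance_type). A hedged sketch of roughly what the training cell does, using v1-style names; the bucket, instance size, hyperparameters, and training data are all placeholders:

```python
import numpy as np
import sagemaker
from sagemaker import RandomCutForest

session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = "my-timestream-sagemaker-bucket"  # placeholder, as above

# v1-style parameter names to match the notebook; sizes are placeholders.
rcf = RandomCutForest(
    role=role,
    train_instance_count=1,
    train_instance_type="ml.m4.xlarge",
    data_location=f"s3://{bucket}/rcf/input",
    output_path=f"s3://{bucket}/rcf/output",
    num_samples_per_tree=512,
    num_trees=50,
)

# cpu_history stands in for the 2-D array of CPU-utilization values
# the notebook assembles from its Timestream query results.
cpu_history = np.array([[42.5], [40.1], [97.3]])
rcf.fit(rcf.record_set(cpu_history))
```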
- When you get to the cell Deploy the model for inference, change the instance_type based on the resource requirements for your inference job.
  Note: It may take a few minutes to train the model. When the training is complete, you will see the message Completed - Training job completed in the output of the cell.
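Deployment is a standard estimator deploy call. Continuing the training sketch above (instance type remains a placeholder):

```python
# Deploy the trained model behind a real-time endpoint; size the
# instance type against your expected inference load.
rcf_inference = rcf.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

# Score data of the same shape the model was trained on; higher
# scores indicate more anomalous points.
results = rcf_inference.predict(cpu_history)
```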
- Run the cell Stop and delete the endpoint to clean up resources. You can also stop and delete the instance from the SageMaker console.
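Programmatic cleanup, continuing the sketch above with v1-style SDK names:

```python
import sagemaker

# Delete the hosted endpoint so it stops accruing charges.
# (In SDK v2 the attribute is endpoint_name rather than endpoint.)
sagemaker.Session().delete_endpoint(rcf_inference.endpoint)
```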