Work with assets (user guide)
Use SageMaker Assets to seamlessly collaborate on machine learning projects with other individuals in your organization. With SageMaker Assets, you and your collaborators create and share models and data tables with each other. Within SageMaker Assets, these models and data tables are known as assets.
SageMaker Assets is a feature within Amazon SageMaker Studio. You or your administrator create a Studio environment within an Amazon DataZone project. For more information about setting up Amazon DataZone, see Set up SageMaker Assets (administrator guide).
Assets are ML assets or data assets. ML assets are metadata that point to the following:
-
Feature Store feature groups
-
SageMaker model groups
The underlying model groups and feature groups are the sources of data. If you update a feature group or model group, the asset for the model group or feature group gets updated within the day.
Data assets are metadata that point to the following:
-
Amazon Redshift tables
-
AWS Glue tables
For data assets, the data source is the mechanism that pulls metadata from the AWS Glue tables and Amazon Redshift tables into the asset. For example, a data source pulls the metadata from an AWS Glue table into the asset for that table.
You can make an asset visible to everyone in your organization by publishing it. Individuals can review the metadata in the asset and request access. If you provide access, they get access to the underlying source of data: the feature group, model group, or table.
Your administrator has likely given you access to the feature groups, model groups, and tables. If they haven't, see the information in Set up SageMaker Assets (administrator guide) to help you get started.
The following sections provide reference information for feature groups and model groups.
Amazon SageMaker Feature Store provides a centralized location to help you store and manage your features. It's a highly performant repository that you can use for feature engineering.
Within Feature Store, features are stored in a feature group. A feature group is a collection of features related to a project that you're working on. For example, if you're working on a project related to predicting housing prices, a feature group might include features such as location or number of bedrooms.
For more information about how you can use feature groups to streamline the process of feature engineering, see Create, store, and share features with Feature Store.
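As a rough illustration of how you might check which feature groups already exist before sharing them as assets, the following sketch pages through the SageMaker `ListFeatureGroups` API. The helper name `iter_feature_group_names` is hypothetical, and the client is passed in as a parameter so that the function makes no assumption about your credentials or Region.

```python
from typing import Any, Dict, Iterator


def iter_feature_group_names(sagemaker_client: Any) -> Iterator[str]:
    """Yield the name of every feature group that the client can list.

    `sagemaker_client` is expected to behave like boto3.client("sagemaker").
    """
    kwargs: Dict[str, Any] = {}
    while True:
        page = sagemaker_client.list_feature_groups(**kwargs)
        for summary in page.get("FeatureGroupSummaries", []):
            yield summary["FeatureGroupName"]
        # The API returns NextToken only when more pages remain.
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   for name in iter_feature_group_names(boto3.client("sagemaker")):
#       print(name)
```

Passing the client in keeps the helper easy to exercise with a stub object, which is useful when you don't want to call AWS from a test.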
You can use SageMaker model groups within SageMaker Model Registry to organize and manage different versions of your models. You can compare the different versions of the models to see which one performs best for your use case. For more information about SageMaker Model Registry, see Model Registration Deployment with Model Registry.
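The following is a minimal sketch of how you might look up a model version within a model group by using the SageMaker `ListModelPackages` API. It returns the most recently created version with an `Approved` status, which is only one possible way to choose a version and doesn't compare model metrics. The helper name `latest_approved_model_package_arn` is hypothetical.

```python
from typing import Any, Optional


def latest_approved_model_package_arn(
    sagemaker_client: Any, group_name: str
) -> Optional[str]:
    """Return the ARN of the newest approved version in a model group.

    `sagemaker_client` is expected to behave like boto3.client("sagemaker").
    Returns None when the group has no approved versions.
    """
    response = sagemaker_client.list_model_packages(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="Approved",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response.get("ModelPackageSummaryList", [])
    return summaries[0]["ModelPackageArn"] if summaries else None


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   arn = latest_approved_model_package_arn(
#       boto3.client("sagemaker"), "my-model-group"  # hypothetical group name
#   )
```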
The following is background information on Amazon Redshift and AWS Glue.
Amazon Redshift is a large-scale data warehousing service that provides fast query performance on large datasets. For more information about Amazon Redshift, see Amazon Redshift Serverless.
AWS Glue is an extract, transform, load (ETL) service that you can use to simplify the process of data preparation. For more information about AWS Glue, see What is AWS Glue?
You can use the SQL editor to connect to AWS Glue and Amazon Redshift databases and run queries. You can share any tables that you create in the editor within SageMaker Assets. For more information, see Data preparation with SQL in Studio.
Terminology and Concepts
Before you get started with using SageMaker Assets, it's helpful to familiarize yourself with the following terminology and concepts:
-
Asset – The metadata that points to the models or data tables that you're sharing. You either request access to an asset that someone else owns or share your asset with others. You and your teammates access the asset and the underlying data table or model associated with it.
-
Subscribed assets – To request access to an asset, you submit a subscription request. If your request is approved, the asset appears under your subscribed assets.
-
Owned assets – The assets that you've shared with your teammates.
-
Asset catalog – The assets that you and the other members of your organization have published.
Step 1: Access SageMaker Assets
Access SageMaker Assets to view your assets and share them with others. Use the following information to help you get started with using it.
You access SageMaker Assets from a project within an Amazon DataZone domain. A project is a collaboration between you and your team members. Within the project, you and the other members of your project have access to the assets that you and your other team members create within the inventory catalog. You can publish the assets to the published catalog to make them visible to other individuals in your organization.
Those individuals can request access to your asset. If you provide them with access, they can get access to the updated source of data. For example, if an individual subscribes to an AWS Glue table that you update, they can access the updated AWS Glue table in real time.
Use the following procedure to access SageMaker Assets.
To access SageMaker Assets
-
Open the Amazon DataZone console.
-
Choose View domains.
-
Next to the domain containing your project, choose Open data portal.
-
Under Analytics Tools, choose SageMaker Studio.
-
Choose Open Amazon SageMaker.
-
Choose Assets.
The assets that have been shared with you are under Subscribed assets. The assets that you and your project members create are under Owned assets. The assets that you and the other members of your organization have published are in the Assets catalog.
Step 2: Share assets and manage access to them
After you create machine learning models, feature groups, or data tables, you can make them visible to the individuals collaborating with you on your project or to your organization more broadly. You can respond to requests for access to the asset. If you approve an individual's request, they can access the asset's underlying source of data.
When you're sharing an asset, you have two options:
-
Publish to asset catalog – Make the asset visible to everyone in your organization
-
Publish to inventory – Make the asset visible to everyone working on your project
If you've published your asset to the asset catalog, individuals in your organization can find it in the assets catalog. They can view your asset's metadata and decide whether they want to request access to it. If you approve their request, they get access to the underlying source of data.
If you publish to inventory, you and the other members of your project can access the asset without any additional action.
Assets published to the inventory only appear under Owned assets. Assets published to the catalog appear under Owned assets and Assets catalog.
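The visibility rules above can be summarized in a small sketch. This is an illustrative model only and not part of any SageMaker API. The names `PublishTarget` and `views_showing_asset` are hypothetical.

```python
from enum import Enum
from typing import List


class PublishTarget(Enum):
    """The two publishing options described in this step."""

    INVENTORY = "inventory"
    CATALOG = "asset catalog"


def views_showing_asset(target: PublishTarget) -> List[str]:
    """Return the SageMaker Assets views in which a published asset appears."""
    # Every published asset is listed under Owned assets for project members.
    views = ["Owned assets"]
    # Only catalog publication makes the asset visible across the organization.
    if target is PublishTarget.CATALOG:
        views.append("Assets catalog")
    return views


print(views_showing_asset(PublishTarget.INVENTORY))
print(views_showing_asset(PublishTarget.CATALOG))
```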
When you publish a data table, you must create a data source that pulls the metadata from the underlying AWS Glue table or Amazon Redshift table into the asset. Use the following procedures to publish an AWS Glue or Amazon Redshift table.
Use the following procedures to publish an asset for a feature group or model package group.
Use the following procedure to publish an asset from your owned assets to the asset catalog.
To publish an asset from the SageMaker Assets page
-
Within Studio, navigate to Assets.
-
Select Owned assets.
-
Specify the name of your asset in the search bar.
-
Choose the asset.
-
Choose Publish.
You can use the following SageMaker Python SDK code to publish a feature group or model package group. The code assumes that you've already created the feature group or model package group.
from sagemaker.asset import AssetPublisher

# Replace the placeholder with the name of your feature group or model package group.
publisher = AssetPublisher()
publisher.publish_to_catalog(
    name-of-your-feature-group-or-model-package
)
Step 3: Manage access requests
After you've published an asset, users outside of your project might want to access it. You can approve or reject access requests, and you can revoke access that you previously approved. You can also delete assets so that the underlying source of data is available only to you.
Use the following procedure to respond to subscription requests.
To approve subscription requests
-
Navigate to the SageMaker Assets page.
-
Choose Manage asset requests.
-
Select Incoming subscription requests.
-
(Optional) Choose Approve and provide reason.
-
(Optional) Choose Reject.
You can revoke access to an asset that you've previously approved. If you choose to revoke access, users lose access to both the asset and the underlying source of data. Use the following procedure to revoke access.
To revoke access
-
Navigate to the SageMaker Assets page.
-
Choose Manage asset requests.
-
Select Incoming subscription requests.
-
Select the Approved tab.
-
Choose Revoke next to the asset.
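If you prefer to script these decisions, the following sketch shows one possible approach that uses the Amazon DataZone API operations `AcceptSubscriptionRequest`, `RejectSubscriptionRequest`, and `RevokeSubscription`. It assumes that subscription requests made in SageMaker Assets can be managed through the Amazon DataZone domain that contains your project, which this guide doesn't state explicitly. The helper names and all identifiers in the usage comment are hypothetical.

```python
from typing import Any, Dict


def decide_subscription_request(
    datazone_client: Any,
    domain_id: str,
    request_id: str,
    approve: bool,
    comment: str,
) -> Dict[str, Any]:
    """Approve or reject one pending subscription request.

    `datazone_client` is expected to behave like boto3.client("datazone").
    """
    params = {
        "domainIdentifier": domain_id,
        "identifier": request_id,
        "decisionComment": comment,
    }
    if approve:
        return datazone_client.accept_subscription_request(**params)
    return datazone_client.reject_subscription_request(**params)


def revoke_access(
    datazone_client: Any, domain_id: str, subscription_id: str
) -> Dict[str, Any]:
    """Revoke a subscription that was approved earlier."""
    return datazone_client.revoke_subscription(
        domainIdentifier=domain_id,
        identifier=subscription_id,
    )


# Example usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   dz = boto3.client("datazone")
#   decide_subscription_request(dz, "dzd_example", "req-123", True, "Approved for Q3 work")
```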
You can also unpublish assets so that they appear only under Owned assets. The assets won't be visible in the resource catalog, but the individuals whose subscription requests you've approved can still access them.
To unpublish an asset
-
Navigate to the SageMaker Assets page.
-
Under Owned assets, select the asset that you're unpublishing.
-
Choose Unpublish.
You can also delete assets from the same page where you unpublish them. Deleting an asset doesn't delete the source of data. Asset deletion only makes the asset invisible to the other members of your project or organization.
Step 4: Find assets and request access to them
You can request access to assets that other users have published to the resource catalog. If they approve the subscription request, you get access to the underlying source of data.
At the top of the SageMaker Assets page, you can specify a search query to find assets that other users in your organization have published. You can also select an asset type to view all the published assets of that type. For example, you can select Glue Table to view all of the published AWS Glue tables.
You can also view the asset type directly under the name of the asset. The following are the available names for the asset types:
-
Redshift table
-
Glue table
-
Models
-
Feature group
Note
Feature groups in the following stores have the type of Glue table:
-
Offline
-
Offline and online
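The note above can be expressed as a small check against the response of the SageMaker `DescribeFeatureGroup` API, which includes an `OfflineStoreConfig` entry when the feature group has an offline store. This is a sketch under one assumption that the guide doesn't confirm: that a feature group with only an online store is listed with the Feature group type. The function name `displayed_asset_type` is hypothetical.

```python
from typing import Any, Dict


def displayed_asset_type(feature_group_description: Dict[str, Any]) -> str:
    """Guess the asset type label shown for a feature group.

    `feature_group_description` is expected to look like the response of
    boto3.client("sagemaker").describe_feature_group(...).
    """
    # Offline, or offline and online: the asset has the type Glue table.
    if "OfflineStoreConfig" in feature_group_description:
        return "Glue table"
    # Assumption: online-only feature groups keep the Feature group type.
    return "Feature group"


print(displayed_asset_type({"OfflineStoreConfig": {}, "OnlineStoreConfig": {}}))
print(displayed_asset_type({"OnlineStoreConfig": {"EnableOnlineStore": True}}))
```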
To make a subscription request
-
Navigate to the SageMaker Assets page.
-
In the search bar, specify the name of the asset and choose Search.
-
For Types, select the asset type and find the asset that you want to access within the resource catalog.
-
Choose the asset.
-
Choose Subscribe.
-
Provide a reason for the request.
-
Choose Submit.
Your subscription request appears under Outgoing subscription requests under Manage asset requests. If the publisher of the asset approves your request, the asset appears under Subscribed assets. You can now use the Amazon Redshift table, AWS Glue table, or ML source of data in your machine learning workflows.
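The search and subscribe steps can also be sketched with the Amazon DataZone API operations `SearchListings` and `CreateSubscriptionRequest`. As with the console procedure, the request stays pending until the publisher responds. This sketch assumes that assets published in SageMaker Assets are available as listings in your Amazon DataZone domain. The helper names are hypothetical, and `find_listing_ids` reads only the first page of results.

```python
from typing import Any, List


def find_listing_ids(
    datazone_client: Any, domain_id: str, search_text: str
) -> List[str]:
    """Return the listing IDs of published assets that match a search query.

    `datazone_client` is expected to behave like boto3.client("datazone").
    """
    response = datazone_client.search_listings(
        domainIdentifier=domain_id,
        searchText=search_text,
    )
    return [
        item["assetListing"]["listingId"]
        for item in response.get("items", [])
        if "assetListing" in item
    ]


def request_access(
    datazone_client: Any,
    domain_id: str,
    project_id: str,
    listing_id: str,
    reason: str,
) -> str:
    """Submit a subscription request for a project and return the request ID."""
    response = datazone_client.create_subscription_request(
        domainIdentifier=domain_id,
        requestReason=reason,
        subscribedListings=[{"identifier": listing_id}],
        subscribedPrincipals=[{"project": {"identifier": project_id}}],
    )
    return response["id"]
```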
Step 5: Use a shared asset in your machine learning workflows
If your subscription request to an asset is approved, you can use it in your machine learning workflows.
The feature groups to which you've been given access appear in your list of feature groups in Studio.
The model groups to which you've been given access appear in your list of model groups in Studio. You can also open a model group in Model Registry from SageMaker Assets. Use the following procedure to open a model group from Subscribed assets within Model Registry.
To open a model group from SageMaker Assets
-
Select the model group.
-
Choose Open in Model Registry.
You can access AWS Glue or Amazon Redshift tables in Data Wrangler within SageMaker Canvas. SageMaker Canvas is an application that lets you perform exploratory data analysis (EDA) and train models without code. For more information about SageMaker Canvas, see Amazon SageMaker Canvas.
You can also bring the data from your AWS Glue or Amazon Redshift tables into your Jupyter notebooks by using the SQL extension. You can convert your data into pandas dataframes for your machine learning workflows. For more information, see Data preparation with SQL in Studio.