Use Amazon SageMaker Feature Store with Amazon SageMaker Studio
You can use Amazon SageMaker Studio to create and view details about your feature groups.
Create a feature group in Studio
The create feature group process has four steps:
-
Enter feature group information.
-
Enter feature definitions.
-
Enter required features.
-
Enter feature group tags.
Consider which of the following options best fits your use case:
-
Create an online store, an offline store, or both. For more information on the differences between online and offline stores, see Feature Store concepts.
-
Use a default AWS Key Management Service key or your own KMS key. The default is an AWS managed AWS KMS key (SSE-KMS). You can reduce AWS KMS request costs by enabling Amazon S3 Bucket Keys on the offline store's Amazon S3 bucket. You must enable the Amazon S3 Bucket Key before you use the bucket for your feature groups. For more information, see Reducing the cost of SSE-KMS with Amazon S3 Bucket Keys.
You can use the same key for both online and offline stores, or have a unique key for each. For more information on AWS KMS, see AWS Key Management Service.
-
If you create an offline store:
-
Decide if you want to create an Amazon S3 bucket or use an existing one. When using an existing one, you need to know the Amazon S3 bucket URL or Amazon S3 bucket name and dataset directory name, if applicable.
-
Choose which IAM role ARN to use. For more information on how to find your role and attached policies, see Adding policies to your IAM role.
-
Decide whether to use the AWS Glue (default) or Apache Iceberg table format. In most use cases, you should use the Apache Iceberg table format. For more information on table formats, see Use Feature Store with SDK for Python (Boto3).
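The offline store choices above can also be written down as plain request dictionaries. The following is a minimal sketch, not the console's own implementation: the helper names `build_offline_store_config` and `build_bucket_key_encryption`, the bucket name, and the key ARN are hypothetical. The dictionary shapes follow the `OfflineStoreConfig` parameter of the SageMaker `create_feature_group` API and the `ServerSideEncryptionConfiguration` parameter of the Amazon S3 `put_bucket_encryption` API.

```python
from typing import Optional


def build_offline_store_config(
    s3_uri: str,
    table_format: str = "Glue",
    kms_key_id: Optional[str] = None,
) -> dict:
    """Build an OfflineStoreConfig-style dict (hypothetical helper)."""
    if table_format not in ("Glue", "Iceberg"):
        raise ValueError("table_format must be 'Glue' or 'Iceberg'")
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with 's3://'")
    s3_storage = {"S3Uri": s3_uri}
    if kms_key_id is not None:
        # Omit KmsKeyId to fall back to the default key.
        s3_storage["KmsKeyId"] = kms_key_id
    return {"S3StorageConfig": s3_storage, "TableFormat": table_format}


def build_bucket_key_encryption(kms_key_id: str) -> dict:
    """Build a bucket encryption config that turns on S3 Bucket Keys."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_id,
                },
                "BucketKeyEnabled": True,
            }
        ]
    }


# Placeholder values for illustration only.
offline_config = build_offline_store_config(
    "s3://example-bucket/feature-store", table_format="Iceberg"
)
encryption = build_bucket_key_encryption(
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
)
# With credentials configured, the bucket setting could be applied with:
# import boto3
# boto3.client("s3").put_bucket_encryption(
#     Bucket="example-bucket", ServerSideEncryptionConfiguration=encryption
# )
```

Building the dictionaries separately from the API call keeps the decisions reviewable before anything is created in your account.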
Steps to create a feature group using Studio
-
Open Studio. For more information, see Launch Amazon SageMaker Studio.
-
Choose the Home icon on the left panel.
-
Choose Data.
-
From the dropdown list, choose Feature Store.
-
Choose Create feature group.
-
Under Feature group details, enter a feature group name.
-
(Optional) Enter a description of the feature group.
-
Under Feature group storage configuration, choose a storage type from the Storage type dropdown list.
If you choose offline storage:
-
From the Amazon S3 bucket name dropdown list, choose an existing Amazon S3 bucket name, enter a new bucket name, or choose Enter bucket URL manually and enter the URL under Amazon S3 bucket address.
-
(Optional) If you have a specified directory name for your dataset, choose from the Dataset directory name dropdown list.
-
From the Table format dropdown list, choose the table format. In most use cases, you should use the Apache Iceberg table format. For more information on table formats, see Use Feature Store with SDK for Python (Boto3).
-
Under IAM role ARN, choose the IAM role ARN you want to attach to this feature group. For more information on how to find your role and attached policies, see Adding policies to your IAM role.
-
From the Online store encryption key or Offline store encryption key dropdown list, choose Use AWS managed AWS KMS key (default), or choose Enter a AWS KMS key ARN and enter your AWS KMS key ARN in the corresponding key ARN field (for example, Offline store encryption key ARN). For more information about AWS KMS, see AWS Key Management Service.
-
(Optional) If you have chosen the online storage Storage type, you can choose to apply Time to Live (TTL) by toggling the switch to On and specifying the Time to Live duration value and unit. This will update the default TTL duration for all records added to the feature group after the feature group is created.
-
If you have chosen the offline storage Storage type and the AWS Glue (default) Table format, under Data catalog, you can either choose Use default values for your AWS Glue data catalog or provide your existing data catalog name, table name, and database name to extend your existing AWS Glue catalog.
-
Once all of the required information has been specified, the Continue button is available. Choose Continue.
-
Under Specify feature definitions, you have two options for providing a schema for your features: a JSON editor, or a table editor. In the JSON tab, type in or copy and paste your feature definitions in the JSON format. For the table editor, type in the name and choose the corresponding data type for each feature in your feature group. Choose + Add feature definitions to include more features. Be aware that you cannot remove feature definitions from your feature groups, but you can add and update feature definitions after the feature group is created.
There must be at least two features in a feature group representing the record identifier and event time:
-
The record identifier Type can be a string, fractional, or integral.
-
The event time Type must be a string or a fractional. However, if you chose the Iceberg table format, the event time must be a string.
-
Once all of the features are included, choose Continue.
-
Under Select required features you must specify the record identifier and event time features by choosing the feature name under Record identifier feature name and Event time feature name dropdown lists, respectively.
-
Once the record identifier and event time features are chosen, choose Continue.
-
(Optional) Add tags for the feature group by first choosing Add new tag and then entering a tag key and corresponding value under Key and Value, respectively.
-
Choose Continue.
-
Under Review feature group, review the feature group information. To edit a step, choose the Edit button that corresponds to that step, which takes you back to that step. To return to the review step (step 5), choose Continue until you reach it.
-
Once you have finalized the setup for your feature group, choose Create feature group.
If there are any issues with the setup, a red alert pop-up message appears at the bottom of the page with tips on solving the issue. You can return to previous steps to fix the issues.
If the feature group has been successfully created, a green pop-up message appears at the bottom of the page. When the feature group is successfully created, it appears in your feature groups catalog.
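The same information that the Studio steps collect can be assembled into a request for the SageMaker `create_feature_group` API. The sketch below is illustrative only: `build_create_request`, the feature names, the role ARN, and the bucket are hypothetical. It applies the rules stated above: the record identifier and event time features must be among the definitions, the event time must be a string or a fractional, and it must be a string when the Iceberg table format is used.

```python
VALID_TYPES = {"String", "Fractional", "Integral"}


def build_create_request(name, feature_definitions, record_identifier,
                         event_time, role_arn, offline_s3_uri=None,
                         table_format="Glue", online=False, tags=None):
    """Assemble keyword arguments for create_feature_group (hypothetical helper)."""
    types = {d["FeatureName"]: d["FeatureType"] for d in feature_definitions}
    unknown = set(types.values()) - VALID_TYPES
    if unknown:
        raise ValueError(f"unsupported feature types: {sorted(unknown)}")
    for required in (record_identifier, event_time):
        if required not in types:
            raise ValueError(f"{required!r} is missing from the definitions")
    # Event time is String or Fractional; String only with Iceberg.
    uses_iceberg = bool(offline_s3_uri) and table_format == "Iceberg"
    allowed = {"String"} if uses_iceberg else {"String", "Fractional"}
    if types[event_time] not in allowed:
        raise ValueError(f"event time type must be one of {sorted(allowed)}")
    if not (online or offline_s3_uri):
        raise ValueError("choose an online store, an offline store, or both")

    request = {
        "FeatureGroupName": name,
        "RecordIdentifierFeatureName": record_identifier,
        "EventTimeFeatureName": event_time,
        "FeatureDefinitions": list(feature_definitions),
        "RoleArn": role_arn,
    }
    if online:
        request["OnlineStoreConfig"] = {"EnableOnlineStore": True}
    if offline_s3_uri:
        request["OfflineStoreConfig"] = {
            "S3StorageConfig": {"S3Uri": offline_s3_uri},
            "TableFormat": table_format,
        }
    if tags:
        request["Tags"] = [{"Key": k, "Value": v} for k, v in tags.items()]
    return request


# Placeholder values for illustration only.
definitions = [
    {"FeatureName": "customer_id", "FeatureType": "Integral"},
    {"FeatureName": "event_time", "FeatureType": "String"},
    {"FeatureName": "total_spend", "FeatureType": "Fractional"},
]
request = build_create_request(
    "example-feature-group", definitions, "customer_id", "event_time",
    role_arn="arn:aws:iam::111122223333:role/ExampleFeatureStoreRole",
    offline_s3_uri="s3://example-bucket/feature-store",
    table_format="Iceberg", online=True, tags={"team": "example"},
)
# With credentials configured, the request could be sent with:
# import boto3
# boto3.client("sagemaker").create_feature_group(**request)
```

Validating the event time type before sending the request mirrors the checks that the Studio form performs and surfaces mistakes before the API call.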
View feature group details in Studio
You can view details of your feature groups once a feature group has successfully been created in the Feature Store.
-
Open Studio. For more information, see Launch Amazon SageMaker Studio.
-
Choose the Home icon on the left panel.
-
Choose Data.
-
From the dropdown list, choose Feature Store.
-
Under the Feature group catalog tab, choose your feature group name from the list. This opens the feature group page.
-
Under the Details tab and the Information sub-tab, you can review your feature group information, including but not limited to Latest execution, Offline storage settings, and Online storage settings.
-
Under the Details tab and the Tags sub-tab, you can review your feature group tags. Choose Add new tag to add a new tag or Remove to remove a tag.
-
On the Features tab, you can find a list of all of the features. Use the filter to refine your list. Choose a feature to view its details.
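Similar details are available programmatically from the SageMaker `describe_feature_group` API. As a rough sketch, the hypothetical helper `summarize_feature_group` below condenses a response dictionary into the fields that this procedure reviews. The sample response is hand-written for illustration and is not real output.

```python
def summarize_feature_group(description: dict) -> dict:
    """Condense a describe_feature_group response (hypothetical helper)."""
    online = description.get("OnlineStoreConfig") or {}
    offline = description.get("OfflineStoreConfig") or {}
    return {
        "name": description.get("FeatureGroupName"),
        "status": description.get("FeatureGroupStatus"),
        "record_identifier": description.get("RecordIdentifierFeatureName"),
        "event_time": description.get("EventTimeFeatureName"),
        "feature_names": [
            d["FeatureName"] for d in description.get("FeatureDefinitions", [])
        ],
        "online_enabled": bool(online.get("EnableOnlineStore", False)),
        "offline_s3_uri": offline.get("S3StorageConfig", {}).get("S3Uri"),
        "table_format": offline.get("TableFormat"),
    }


# Hand-written sample that mimics the response shape; not real output.
sample_response = {
    "FeatureGroupName": "example-feature-group",
    "FeatureGroupStatus": "Created",
    "RecordIdentifierFeatureName": "customer_id",
    "EventTimeFeatureName": "event_time",
    "FeatureDefinitions": [
        {"FeatureName": "customer_id", "FeatureType": "Integral"},
        {"FeatureName": "event_time", "FeatureType": "String"},
    ],
    "OnlineStoreConfig": {"EnableOnlineStore": True},
    "OfflineStoreConfig": {
        "S3StorageConfig": {"S3Uri": "s3://example-bucket/feature-store"},
        "TableFormat": "Iceberg",
    },
}
summary = summarize_feature_group(sample_response)
# With credentials configured, a live response could be fetched with:
# import boto3
# response = boto3.client("sagemaker").describe_feature_group(
#     FeatureGroupName="example-feature-group"
# )
```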
Update feature group in Studio
You can update your feature groups once a feature group has successfully been created in the Feature Store.
-
Open Studio. For more information, see Launch Amazon SageMaker Studio.
-
Choose the Home icon on the left panel.
-
Choose Data.
-
From the dropdown list, choose Feature Store.
-
Under the Feature group catalog tab, search for and choose your feature group name from the list. This opens the feature group page.
-
Choose Update feature group.
-
(Optional) If your feature group uses the online store, you can update the default Time to Live (TTL). If TTL hasn't been enabled for the feature group, toggle the switch button under Time to Live (TTL) to On. You can specify the TTL value and unit under Time to Live duration. This will update the default TTL duration for all records added to the feature group after the feature group is updated.
-
(Optional) You can add feature definitions to your feature group but be aware that you cannot remove feature definitions from your feature groups. To add a feature definition, choose + Add feature definition and then specify the new feature definition name under the Name column and select the feature type under the Type column.
-
Choose Save changes.
-
To confirm your changes, choose Confirm.
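The two updates described above, adding feature definitions and changing the default TTL, correspond to the `FeatureAdditions` and `OnlineStoreConfig` parameters of the SageMaker `update_feature_group` API. The following sketch only builds the request. `build_update_request` and the example values are hypothetical.

```python
VALID_TYPES = {"String", "Fractional", "Integral"}
TTL_UNITS = {"Seconds", "Minutes", "Hours", "Days", "Weeks"}


def build_update_request(name, new_features=None, ttl_value=None,
                         ttl_unit=None):
    """Assemble keyword arguments for update_feature_group (hypothetical helper).

    new_features maps feature names to types. Features can only be added,
    never removed, so there is no parameter for removal.
    """
    request = {"FeatureGroupName": name}
    if new_features:
        for feature_name, feature_type in new_features.items():
            if feature_type not in VALID_TYPES:
                raise ValueError(f"unsupported type for {feature_name!r}")
        request["FeatureAdditions"] = [
            {"FeatureName": n, "FeatureType": t}
            for n, t in new_features.items()
        ]
    if ttl_value is not None:
        if ttl_unit not in TTL_UNITS:
            raise ValueError(f"ttl_unit must be one of {sorted(TTL_UNITS)}")
        if ttl_value <= 0:
            raise ValueError("ttl_value must be positive")
        # Applies to records added after the update, as described above.
        request["OnlineStoreConfig"] = {
            "TtlDuration": {"Unit": ttl_unit, "Value": ttl_value}
        }
    return request


# Placeholder values for illustration only.
update = build_update_request(
    "example-feature-group",
    new_features={"loyalty_tier": "String"},
    ttl_value=30,
    ttl_unit="Days",
)
# With credentials configured, the request could be sent with:
# import boto3
# boto3.client("sagemaker").update_feature_group(**update)
```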
View pipeline executions in Studio
You can view the latest pipeline execution information for a feature or feature group under Pipeline executions, including quick links to pipelines, executions, code, and other useful execution information.
-
Open Studio. For more information, see Launch Amazon SageMaker Studio.
-
Choose the Home icon on the left panel.
-
Choose Data.
-
From the dropdown list, choose Feature Store.
-
Choose a feature group or feature you wish to see the pipeline execution for.
-
Choose the Pipeline executions tab.
-
Search for a pipeline from the Select a pipeline dropdown list.
-
You can view the links for the pipeline, execution, and code details as well as view the execution owner, status, date, and duration.
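Outside Studio, execution summaries for a pipeline can be listed with the SageMaker `list_pipeline_executions` API. The sketch below assumes that the pipeline shown in the Pipeline executions tab is a regular SageMaker pipeline whose name you know. That mapping is an assumption and is not stated in this guide. The helper `latest_executions` and the sample summaries are hypothetical.

```python
from datetime import datetime, timezone


def latest_executions(summaries, limit=5):
    """Return the most recent execution summaries first (hypothetical helper)."""
    ordered = sorted(summaries, key=lambda s: s["StartTime"], reverse=True)
    return [
        {
            "arn": s["PipelineExecutionArn"],
            "status": s["PipelineExecutionStatus"],
            "started": s["StartTime"].isoformat(),
        }
        for s in ordered[:limit]
    ]


# Hand-written samples that mimic PipelineExecutionSummaries; not real output.
base_arn = "arn:aws:sagemaker:us-east-1:111122223333:pipeline/example/execution/"
sample_summaries = [
    {
        "PipelineExecutionArn": base_arn + "first",
        "PipelineExecutionStatus": "Succeeded",
        "StartTime": datetime(2024, 1, 1, tzinfo=timezone.utc),
    },
    {
        "PipelineExecutionArn": base_arn + "second",
        "PipelineExecutionStatus": "Failed",
        "StartTime": datetime(2024, 1, 2, tzinfo=timezone.utc),
    },
]
recent = latest_executions(sample_summaries, limit=1)
# With credentials configured, live summaries could be fetched with:
# import boto3
# response = boto3.client("sagemaker").list_pipeline_executions(
#     PipelineName="example"
# )
# recent = latest_executions(response["PipelineExecutionSummaries"])
```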
View lineage in Studio
You can view the lineage of a feature group. The lineage includes information about the execution code of your feature processing workflow, which data sources were used, and how they are ingested into the feature group or feature.
-
Open Studio. For more information, see Launch Amazon SageMaker Studio.
-
Choose the Home icon on the left panel.
-
Choose Data.
-
From the dropdown list, choose Feature Store.
-
Choose a feature group or feature you wish to see the lineage for.
-
Choose the Lineage tab.
-
Choose a feature group or pipeline node to expand the node. This contains more information about a feature group or pipeline.
-
You can zoom in, zoom out, or recenter the lineage graph by using the buttons on the bottom left of the screen.
-
You can navigate the lineage map when you press Tab or Shift+Tab to switch between nodes, when you choose nodes, or when you choose and drag the screen.
-
If applicable, you can navigate the lineage upstream (left, earlier) or downstream (right, most recent) by choosing a node and then choosing Query upstream lineage or Query downstream lineage.
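To illustrate what querying upstream or downstream lineage means, the sketch below walks a small, hand-written edge list in either direction. It assumes edges shaped like those returned by the SageMaker `query_lineage` API (`SourceArn` and `DestinationArn`), with the source being the upstream node. The helper `walk_lineage` and the short node names are hypothetical, and this is not how Studio itself implements the graph.

```python
from collections import deque


def walk_lineage(edges, start, direction="upstream"):
    """Breadth-first walk over lineage edges (hypothetical helper).

    Returns the nodes reachable from start, nearest first, excluding start.
    """
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")
    neighbours = {}
    for edge in edges:
        src, dst = edge["SourceArn"], edge["DestinationArn"]
        # Upstream follows edges backwards; downstream follows them forwards.
        key, value = (dst, src) if direction == "upstream" else (src, dst)
        neighbours.setdefault(key, []).append(value)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Short placeholder names stand in for real ARNs.
sample_edges = [
    {"SourceArn": "data-source", "DestinationArn": "pipeline"},
    {"SourceArn": "pipeline", "DestinationArn": "feature-group"},
    {"SourceArn": "feature-group", "DestinationArn": "downstream-pipeline"},
]
upstream = walk_lineage(sample_edges, "feature-group", "upstream")
downstream = walk_lineage(sample_edges, "feature-group", "downstream")
```

In this sample, the upstream walk from the feature group reaches the pipeline and then its data source, and the downstream walk reaches only the downstream pipeline.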