MLCOST-08: Enable feature reusability
Reduce duplicated feature engineering code, and avoid rerunning it across teams and projects, by using a feature store. The store should provide both online and offline storage, along with data encryption. An online store with low-latency retrieval is ideal for real-time inference. An offline store maintains a history of feature values and is suited to training and batch scoring.
Implementation plan
- Use Amazon SageMaker AI Feature Store: Amazon SageMaker AI Feature Store is a fully managed, purpose-built repository to store, update, retrieve, and share ML features. It helps data scientists, machine learning engineers, and general practitioners create, share, and manage features for ML development. The online store serves low-latency, real-time inference use cases. The offline store serves training and batch inference. Feature Store reduces the repetitive data processing and curation work required to convert raw data into features for training an ML algorithm.
You can use Feature Store in the following modes:
- Online - Features are read with low latency (milliseconds) and used for high-throughput predictions.
- Offline - Large streams of data are fed to an offline store, which is used for training and batch inference. This mode requires the feature group to be stored in an offline store. The offline store keeps its data in your S3 bucket, and you can query that data with Amazon Athena.
- Online and offline - The feature group is stored in both the online and the offline store, which combines the two modes.
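As a rough illustration of how the three modes differ, the following Python sketch builds the request parameters for the SageMaker `CreateFeatureGroup` API. The helper `build_feature_group_request`, the feature names, the S3 URI, and the role ARN are hypothetical placeholders chosen for this example and are not part of the guidance above. The sketch only assembles a dictionary and does not call AWS.

```python
from typing import Any, Dict, Optional

# Hypothetical feature schema, used only for illustration.
FEATURE_DEFINITIONS = [
    {"FeatureName": "customer_id", "FeatureType": "String"},
    {"FeatureName": "event_time", "FeatureType": "String"},
    {"FeatureName": "avg_order_value", "FeatureType": "Fractional"},
]

VALID_MODES = ("online", "offline", "online_offline")


def build_feature_group_request(
    name: str,
    mode: str,
    s3_uri: Optional[str] = None,
    role_arn: Optional[str] = None,
    kms_key_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a CreateFeatureGroup call.

    mode is "online", "offline", or "online_offline" (names assumed here).
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    request: Dict[str, Any] = {
        "FeatureGroupName": name,
        "RecordIdentifierFeatureName": "customer_id",
        "EventTimeFeatureName": "event_time",
        "FeatureDefinitions": FEATURE_DEFINITIONS,
    }

    if mode in ("online", "online_offline"):
        # The online store serves low-latency reads for real-time inference.
        online: Dict[str, Any] = {"EnableOnlineStore": True}
        if kms_key_id:
            online["SecurityConfig"] = {"KmsKeyId": kms_key_id}
        request["OnlineStoreConfig"] = online

    if mode in ("offline", "online_offline"):
        # The offline store writes feature history to your S3 bucket,
        # so it needs a bucket location and an IAM role with access to it.
        if not s3_uri or not role_arn:
            raise ValueError("offline storage needs s3_uri and role_arn")
        s3_config: Dict[str, Any] = {"S3Uri": s3_uri}
        if kms_key_id:
            s3_config["KmsKeyId"] = kms_key_id
        request["OfflineStoreConfig"] = {"S3StorageConfig": s3_config}
        request["RoleArn"] = role_arn

    return request


if __name__ == "__main__":
    # Placeholder values; replace them with resources from your own account.
    example = build_feature_group_request(
        name="customers",
        mode="online_offline",
        s3_uri="s3://example-bucket/feature-store",
        role_arn="arn:aws:iam::123456789012:role/ExampleFeatureStoreRole",
    )
    print(sorted(example))
```

With AWS credentials configured, a dictionary like this could be passed to `boto3.client("sagemaker").create_feature_group(**request)`. Passing a KMS key ID is one way to address the encryption requirement for both stores.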
Documents
Blogs
- Store, Discover, and Share Machine Learning Features with Amazon SageMaker AI Feature Store
- Enable feature reuse across accounts and teams using Amazon SageMaker AI Feature Store
- Understanding the key capabilities of Amazon SageMaker AI Feature Store
- Using Amazon SageMaker AI Feature Store with streaming feature aggregation
- Extend model lineage to include ML features using Amazon SageMaker AI Feature Store