Create Feature Groups - Amazon SageMaker

Create Feature Groups

A FeatureGroup is the main Feature Store resource that contains the metadata for all the data stored in Amazon SageMaker Feature Store. A feature group is a logical grouping of features, defined in the feature store, to describe records. A feature group’s definition is composed of a list of feature definitions, a record identifier name, and configurations for its online and offline store. The example code in this topic uses the SageMaker Python SDK. The underlying APIs are available for developers using other languages.
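As a rough illustration of how these pieces fit together, the following sketch assembles a dictionary shaped like the low-level CreateFeatureGroup request. It makes no call to AWS. The helper build_feature_group_request, the feature names, the group name, and the S3 URI are hypothetical placeholders. The event time feature is included because the API expects one alongside the record identifier. A real request needs more than this, for example an IAM role when an offline store is configured.

```python
def build_feature_group_request(name, feature_definitions, record_id_name,
                                event_time_name, offline_s3_uri=None,
                                enable_online=True):
    """Assemble a dict shaped like a CreateFeatureGroup request (no AWS call)."""
    feature_names = {d["FeatureName"] for d in feature_definitions}
    # The record identifier and event time must both be defined features.
    for required in (record_id_name, event_time_name):
        if required not in feature_names:
            raise ValueError(f"{required!r} is not among the feature definitions")

    request = {
        "FeatureGroupName": name,
        "RecordIdentifierFeatureName": record_id_name,
        "EventTimeFeatureName": event_time_name,
        "FeatureDefinitions": list(feature_definitions),
    }
    if enable_online:
        request["OnlineStoreConfig"] = {"EnableOnlineStore": True}
    if offline_s3_uri is not None:
        request["OfflineStoreConfig"] = {"S3StorageConfig": {"S3Uri": offline_s3_uri}}
    return request


# Hypothetical feature definitions for illustration only.
definitions = [
    {"FeatureName": "customer_id", "FeatureType": "Integral"},
    {"FeatureName": "event_time", "FeatureType": "Fractional"},
    {"FeatureName": "city", "FeatureType": "String"},
]
request = build_feature_group_request(
    "customers-example", definitions, "customer_id", "event_time",
    offline_s3_uri="s3://example-bucket/feature-store",
)
```

Validating the record identifier and event time names against the feature definitions before sending the request catches a common mistake early, instead of waiting for the service to reject the call.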

Before using Feature Store, you typically load your dataset, run transformations, and set up your features for ingestion. This process varies widely and depends heavily on your data. The example code in the following topics refers to the Introduction to Feature Store and Fraud Detection with Amazon SageMaker FeatureStore example notebooks, respectively. We recommend that you run these notebooks in Amazon SageMaker Studio, because the code in this guide is conceptual and is not fully functional if copied.

Feature Store supports the following data types: String, Fractional (IEEE 64-bit floating point value), and Integral (Int64, a 64-bit signed integer value). The default type is String. This means that if a column in your dataset is not a float or long type, it defaults to String in your feature store.

You can use a schema to describe your data’s columns and data types. You pass this schema into FeatureDefinitions, a required parameter for a FeatureGroup. If you use the SageMaker Python SDK, the load_feature_definitions function detects data types automatically.
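To make the type rules concrete, the following sketch maps columns of plain Python values to the three Feature Store types and builds a list in the FeatureName and FeatureType shape used by FeatureDefinitions. It is not the SDK's implementation of load_feature_definitions. The helpers infer_feature_type and build_feature_definitions and the sample columns are hypothetical, and Python int and float stand in for the long and float column types.

```python
def infer_feature_type(values):
    """Map a column of Python values to a Feature Store type name."""
    non_null = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if non_null and all(isinstance(v, int) and not isinstance(v, bool)
                        for v in non_null):
        return "Integral"
    if non_null and all(isinstance(v, float) for v in non_null):
        return "Fractional"
    # Anything that is neither all-int nor all-float falls back to String.
    return "String"


def build_feature_definitions(columns):
    """Build a FeatureDefinitions-style list from a {name: values} mapping."""
    return [
        {"FeatureName": name, "FeatureType": infer_feature_type(values)}
        for name, values in columns.items()
    ]


# Hypothetical sample data for illustration only.
columns = {
    "customer_id": [1, 2, 3],
    "amount": [9.99, 12.5, 0.0],
    "city": ["Seattle", "Austin", None],
    "is_fraud": [True, False, True],
}
feature_definitions = build_feature_definitions(columns)
for definition in feature_definitions:
    print(definition["FeatureName"], definition["FeatureType"])
```

In this sketch the boolean column falls back to String because it is neither an integer nor a float column. If you want such a column stored as Integral, convert it to integers before building the definitions.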

The default behavior when a new feature record is added with an already existing record ID is as follows. In the offline store, the new record is appended. In the online store, if the event time of the new record is less than the existing event time, nothing happens. If the event time of the new record is greater than or equal to the existing event time, the existing record is overwritten.
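The following sketch simulates this default behavior in memory, with no connection to the actual service. The class FeatureStoreSketch, its put_record method, and the numeric event times are hypothetical and exist only to illustrate the rule.

```python
class FeatureStoreSketch:
    """In-memory model of the default write behavior described above."""

    def __init__(self):
        self.online = {}   # record ID -> most recent record
        self.offline = []  # append-only history of every write

    def put_record(self, record_id, event_time, features):
        record = {"record_id": record_id, "event_time": event_time, **features}
        # Offline store: every write is appended.
        self.offline.append(record)
        # Online store: overwrite only if the new event time is greater
        # than or equal to the existing one; otherwise do nothing.
        current = self.online.get(record_id)
        if current is None or event_time >= current["event_time"]:
            self.online[record_id] = record


store = FeatureStoreSketch()
store.put_record(42, 100, {"amount": 10.0})  # first write
store.put_record(42, 90, {"amount": 5.0})    # older event time: online unchanged
store.put_record(42, 100, {"amount": 7.0})   # equal event time: online overwritten
```

After these three writes, the online store holds the record with amount 7.0 for record ID 42, while the offline store holds all three records.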