Object2Vec Algorithm
The Amazon SageMaker Object2Vec algorithm is a general-purpose neural embedding algorithm that is highly customizable. It can learn low-dimensional dense embeddings of high-dimensional objects. The embeddings are learned so that the semantics of the relationship between pairs of objects in the original space are preserved in the embedding space. You can use the learned embeddings to efficiently compute nearest neighbors of objects and to visualize natural clusters of related objects in low-dimensional space, for example. You can also use the embeddings as features of the corresponding objects in downstream supervised tasks, such as classification or regression.
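To illustrate the nearest-neighbor use case, here is a minimal plain-Python sketch that ranks objects by the cosine similarity of their embeddings. The product names and embedding values are hypothetical, and cosine similarity is one common similarity measure chosen for illustration, not necessarily the one your downstream tooling uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query_id, embeddings, k=2):
    """Return the k object IDs most similar to query_id, excluding the query itself."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), obj_id)
        for obj_id, vec in embeddings.items()
        if obj_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [obj_id for _, obj_id in scored[:k]]

# Hypothetical 3-dimensional embeddings for four products.
embeddings = {
    "football": [0.9, 0.1, 0.0],
    "basketball": [0.8, 0.2, 0.1],
    "apple": [0.0, 0.9, 0.3],
    "pear": [0.1, 0.8, 0.4],
}
print(nearest_neighbors("football", embeddings, k=2))  # ['basketball', 'pear']
```

For large embedding collections, an approximate nearest-neighbor index is usually preferable to this brute-force scan.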
Object2Vec generalizes the well-known Word2Vec embedding technique for words, which is optimized in the SageMaker BlazingText algorithm.
For a blog post that discusses how to apply Object2Vec to some practical use cases, see Introduction to Amazon SageMaker Object2Vec.
Topics
 I/O Interface for the Object2Vec Algorithm
 EC2 Instance Recommendation for the Object2Vec Algorithm
 Object2Vec Sample Notebooks
 How Object2Vec Works
 Object2Vec Hyperparameters
 Tune an Object2Vec Model
 Data Formats for Object2Vec Training
 Data Formats for Object2Vec Inference
 Encoder Embeddings for Object2Vec
I/O Interface for the Object2Vec Algorithm
You can use Object2Vec on many input data types, including the following examples.
Input data type | Example
--- | ---
Sentence-sentence pairs | "A soccer game with multiple males playing." and "Some men are playing a sport."
Labels-sequence pairs | The genre tags of the movie "Titanic", such as "Romance" and "Drama", and its short description: "James Cameron's Titanic is an epic, action-packed romance set against the ill-fated maiden voyage of the R.M.S. Titanic. She was the most luxurious liner of her era, a ship of dreams, which ultimately carried over 1,500 people to their death in the ice cold waters of the North Atlantic in the early hours of April 15, 1912."
Customer-customer pairs | The customer ID of Jane and the customer ID of Jackie.
Product-product pairs | The product ID of football and the product ID of basketball.
Item review user-item pairs | A user's ID and the items she has bought, such as apple, pear, and orange.
You must preprocess the input data to transform it into a supported format. Currently, Object2Vec natively supports two types of input:

A discrete token, which is represented as a list containing a single integer-id. For example, [10].

A sequence of discrete tokens, which is represented as a list of integer-ids. For example, [0,12,10,13].
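The preprocessing step can be sketched in plain Python: map each word to an integer ID, encode sentences as sequences of IDs, and encode single objects (such as customer IDs) as one-element lists. This is a minimal sketch under stated assumptions: it uses naive lowercase whitespace tokenization, the label values are hypothetical, and the JSON Lines record layout with in0, in1, and label keys is assumed from the Data Formats for Object2Vec Training topic, which you should check before relying on it.

```python
import json

def build_vocab(sentences):
    """Assign an integer ID to each distinct lowercase word, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(sentence, vocab):
    """Turn a sentence into a sequence of discrete tokens (a list of integer IDs)."""
    return [vocab[word] for word in sentence.lower().split()]

# Sequence input: a sentence-sentence pair with a hypothetical label of 1 ("related").
pairs = [("a soccer game with multiple males playing", "some men are playing a sport", 1)]
vocab = build_vocab([s for a, b, _ in pairs for s in (a, b)])
records = [
    json.dumps({"label": label, "in0": encode(a, vocab), "in1": encode(b, vocab)})
    for a, b, label in pairs
]
print(records[0])

# Token input: a single object such as a customer ID becomes a one-element list.
customer_ids = {"jane": 0, "jackie": 1}  # hypothetical ID mapping
token_record = {"label": 1, "in0": [customer_ids["jane"]], "in1": [customer_ids["jackie"]]}
print(json.dumps(token_record))
```

A production pipeline would also handle punctuation, out-of-vocabulary words, and reserved IDs for padding, which this sketch omits.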
The objects in each pair can be asymmetric. For example, the pairs can be (token, sequence), (token, token), or (sequence, sequence). For token inputs, the algorithm supports simple embeddings as compatible encoders. For sequences of token vectors, the algorithm supports the following encoders:

Average-pooled embeddings

Hierarchical convolutional neural networks (CNNs)

Multi-layered bidirectional long short-term memory networks (BiLSTMs)
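The simplest of these encoders, average pooling, can be illustrated as follows: look up the embedding vector of each token ID and take the element-wise mean. This is a plain-Python illustration of the pooling idea only, with a hypothetical 2-dimensional embedding table; it is not the algorithm's internal implementation, where the embedding table is learned during training.

```python
def average_pooled_encoding(token_ids, embedding_table):
    """Encode a token sequence as the element-wise mean of its token embeddings."""
    if not token_ids:
        raise ValueError("sequence must contain at least one token")
    dim = len(embedding_table[token_ids[0]])
    totals = [0.0] * dim
    for token_id in token_ids:
        for i, value in enumerate(embedding_table[token_id]):
            totals[i] += value
    return [t / len(token_ids) for t in totals]

# Hypothetical 2-dimensional embedding table, indexed by integer token ID.
embedding_table = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(average_pooled_encoding([0, 1, 2], embedding_table))  # [1.0, 1.0]
```

Average pooling ignores token order, which is why order-sensitive encoders such as CNNs and BiLSTMs are also offered for sequence inputs.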
The input label for each pair can be one of the following:

A categorical label that expresses the relationship between the objects in the pair

A score that expresses the strength of the similarity between the two objects
For categorical labels used in classification, the algorithm supports the cross-entropy loss function. For rating- or score-based labels used in regression, the algorithm supports the mean squared error (MSE) loss function. Specify these loss functions with the output_layer hyperparameter when you create the model training job.
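The two loss functions can be written out for a single batch to show what each one measures. This is an illustrative sketch, not the training code. The output_layer values in the mapping (softmax for classification, mean_squared_error for regression) are an assumption here; confirm them in Object2Vec Hyperparameters.

```python
import math

# Assumed output_layer values; confirm them in Object2Vec Hyperparameters.
OUTPUT_LAYER_FOR_TASK = {"classification": "softmax", "regression": "mean_squared_error"}

def cross_entropy(probabilities, true_class):
    """Cross-entropy loss for one example: negative log-probability of the true class."""
    return -math.log(probabilities[true_class])

def mean_squared_error(predictions, targets):
    """Mean squared error between predicted and true similarity scores."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

print(OUTPUT_LAYER_FOR_TASK["regression"])
print(cross_entropy([0.25, 0.75], true_class=1))  # about 0.288
print(mean_squared_error([0.5, 1.0], [1.0, 1.0]))  # 0.125
```

Cross-entropy penalizes a low predicted probability for the correct category, while MSE penalizes the squared distance between a predicted and a true score, which is why each suits its label type.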
EC2 Instance Recommendation for the Object2Vec Algorithm
The type of Amazon Elastic Compute Cloud (Amazon EC2) instance that you use depends on whether you are training or running inference.
When training a model using the Object2Vec algorithm on a CPU, start with an ml.m5.2xlarge instance. For training on a GPU, start with an ml.p2.xlarge instance. If the training takes too long on this instance, you can use a larger instance. Currently, the Object2Vec algorithm can train only on a single machine. However, it does offer support for multiple GPUs. Object2Vec supports P2, P3, G4dn, and G5 GPU instance families for training and inference.
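The training recommendations above can be captured in a small helper that builds a ResourceConfig-style dictionary (the InstanceType, InstanceCount, and VolumeSizeInGB keys of the CreateTrainingJob API). The helper name and the 30 GB default volume size are hypothetical choices for illustration. It pins InstanceCount to 1 because Object2Vec trains on a single machine, and it rejects GPU instances outside the P2, P3, G4dn, and G5 families.

```python
SUPPORTED_GPU_FAMILIES = {"p2", "p3", "g4dn", "g5"}

def object2vec_training_resources(use_gpu, instance_type=None, volume_size_gb=30):
    """Build a ResourceConfig-style dict for an Object2Vec training job (hypothetical helper).

    Defaults to the recommended starting instances: ml.p2.xlarge on GPU,
    ml.m5.2xlarge on CPU. The 30 GB volume size is an arbitrary assumption.
    """
    if instance_type is None:
        instance_type = "ml.p2.xlarge" if use_gpu else "ml.m5.2xlarge"
    family = instance_type.split(".")[1]  # e.g. "ml.g4dn.xlarge" -> "g4dn"
    if use_gpu and family not in SUPPORTED_GPU_FAMILIES:
        raise ValueError(f"{instance_type} is not in a supported GPU instance family")
    return {
        "InstanceType": instance_type,
        "InstanceCount": 1,  # Object2Vec trains on a single machine only
        "VolumeSizeInGB": volume_size_gb,
    }

print(object2vec_training_resources(use_gpu=True))
```

To scale up when training is too slow, pass a larger single instance (for example, a multi-GPU P3 instance) rather than increasing the instance count.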
For inference with a trained Object2Vec model that has a deep neural network, we recommend using an ml.p3.2xlarge GPU instance. Because GPU memory is scarce, you can set the INFERENCE_PREFERRED_MODE environment variable to specify which inference network is loaded into GPU memory: the classification or regression network (see GPU optimization: Classification or Regression) or the encoder embeddings network (see GPU optimization: Encoder Embeddings).
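As a sketch of where INFERENCE_PREFERRED_MODE goes, the helper below builds a container definition in the shape used by the CreateModel API (Image, ModelDataUrl, and Environment keys). The helper name, the image URI, and the S3 path are hypothetical placeholders. The accepted values classification and embedding are an assumption; confirm them in the GPU optimization topics before use.

```python
def object2vec_container(image_uri, model_data_url, preferred_mode):
    """Build a CreateModel-style container dict that sets INFERENCE_PREFERRED_MODE.

    The accepted values are assumed to be "classification" and "embedding";
    confirm them in the GPU optimization topics before use.
    """
    allowed = {"classification", "embedding"}  # assumption, see docstring
    if preferred_mode not in allowed:
        raise ValueError(f"preferred_mode must be one of {sorted(allowed)}")
    return {
        "Image": image_uri,
        "ModelDataUrl": model_data_url,
        "Environment": {"INFERENCE_PREFERRED_MODE": preferred_mode},
    }

# Hypothetical placeholders: substitute your own image URI and model artifact path.
container = object2vec_container(
    image_uri="<object2vec-image-uri>",
    model_data_url="s3://amzn-s3-demo-bucket/object2vec/model.tar.gz",
    preferred_mode="embedding",
)
print(container["Environment"])  # {'INFERENCE_PREFERRED_MODE': 'embedding'}
```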
Object2Vec Sample Notebooks
Note
To run the notebooks on a notebook instance, see Example Notebooks. To run the notebooks on Studio, see Create or Open an Amazon SageMaker Studio Classic Notebook.