Updating existing bulk data
If you previously created a dataset import job for a dataset, you can use another import job to add to or replace the existing bulk data. By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. You can instead append the new records to existing data by changing the job's import mode. To append data to an Interactions dataset with a dataset import job, you must have at least 1000 new interaction records.
Filter updates for bulk records
Within 15 minutes of completing a bulk import, Amazon Personalize updates any filters you created in the dataset group with your new item and user data. This update allows Amazon Personalize to use the most recent data when filtering recommendations for your users.
How new bulk records influence recommendations
If you have already created a recommender or custom solution version, new records influence recommendations as follows:
- For bulk interactions, you must wait for your recommender to update (for Domain dataset groups) or create a new custom solution version.
- For bulk item data, if you created the recommender with the Top picks for you or Recommended for you use case, or trained the solution version with User-Personalization and deployed it in a campaign, Amazon Personalize automatically updates the model every two hours. After each update, the new items might be included in recommendations with exploration. For information about exploration, see Automatic updates.
  For any other use case or recipe, you must retrain the model for the new items to be included in recommendations.
- For new users without interactions data, recommendations are initially for popular items only. If you have metadata about the user in a Users dataset and you choose a use case or recipe that uses metadata, such as Top picks for you or Recommended for you (use cases), or User-Personalization or Personalized-Ranking (recipes), these popular items might be more relevant for the user. To get more relevant recommendations for a new user, you can import more interactions data for the user.
Topics
- Import modes
- Updating bulk records (console)
- Updating bulk records (AWS CLI)
- Updating bulk records (AWS SDKs)
Import modes
To configure how Amazon Personalize adds your new records to your dataset, you specify an import mode for your dataset import job:
- To overwrite all existing bulk data in your dataset, choose Replace existing data in the Amazon Personalize console or specify FULL in the CreateDatasetImportJob API operation. This doesn't replace data you imported individually, including events recorded in real time.
- To append the records to the existing data in your dataset, choose Add to existing data or specify INCREMENTAL in the CreateDatasetImportJob API operation. Amazon Personalize replaces any record with the same ID with the new one. To append data to an Interactions dataset with a dataset import job, you must have at least 1000 new interaction records.
If you haven't imported bulk records, the option is not available in the console and you can only specify FULL in the CreateDatasetImportJob API operation. The default is a full replacement.
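Conceptually, an incremental import behaves like a merge keyed on record ID: new records are appended, and a new record with an existing ID replaces the old one. The following plain-Python sketch illustrates those semantics only; it is not how Amazon Personalize stores data, and the `id` field and sample records are illustrative.

```python
def incremental_import(existing, new_records):
    """Illustrate INCREMENTAL import semantics: new records are
    appended, and a new record with the same ID replaces the
    existing record. Records are plain dicts with an "id" key
    (standing in for an ID column such as ITEM_ID)."""
    merged = {record["id"]: record for record in existing}
    for record in new_records:
        # Same ID -> the existing record is replaced by the new one.
        merged[record["id"]] = record
    return list(merged.values())

existing = [{"id": "item1", "price": 10}, {"id": "item2", "price": 20}]
new = [{"id": "item2", "price": 25}, {"id": "item3", "price": 30}]
result = incremental_import(existing, new)
# item2 now carries the new price, and item3 is appended.
```

A FULL import, by contrast, would correspond to discarding `existing` entirely and keeping only `new`.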
Updating bulk records (console)
Important
By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. You can change this by specifying the job's import mode.
To update bulk data with the Amazon Personalize console, create a dataset import job for the dataset and specify an import mode.
To update bulk records (console)
- Open the Amazon Personalize console at https://console.aws.amazon.com/personalize/home and sign in to your account.
- On the Dataset groups page, choose your dataset group.
- From the navigation pane, choose Datasets.
- On the Datasets page, choose the dataset you want to update.
- In Dataset import jobs, choose Create dataset import job.
- In Import job details, for Dataset import job name, specify a name for your import job.
- For Import mode, choose how to update the dataset: either Replace existing data or Add to existing data. For more information, see Import modes.
- In Input source, for S3 Location, specify where your data file is stored in Amazon S3. Use the following syntax:
  s3://<name of your S3 bucket>/<folder path>/<CSV file name>
  If your CSV files are in a folder in your S3 bucket and you want to upload multiple CSV files to a dataset with one dataset import job, use this syntax without the CSV file name.
- In IAM role, choose either to create a new role or use an existing one. If you completed the prerequisites, choose Use an existing service role and specify the role that you created in Creating an IAM role for Amazon Personalize.
- For Tags, optionally add any tags. For more information about tagging Amazon Personalize resources, see Tagging Amazon Personalize resources.
- Choose Finish. The data import job starts and the Dataset overview page is displayed. The dataset import is complete when the status is ACTIVE.
Updating bulk records (AWS CLI)
Important
By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. You can change this by specifying the job's import mode.
To update bulk data, use the create-dataset-import-job command. For the import-mode, specify FULL to replace existing data or INCREMENTAL to add to it. For more information, see Import modes.
The following code shows how to create a dataset import job that incrementally imports data into a dataset.

aws personalize create-dataset-import-job \
  --job-name dataset import job name \
  --dataset-arn dataset arn \
  --data-source dataLocation=s3://bucketname/filename \
  --role-arn roleArn \
  --import-mode INCREMENTAL
Updating bulk records (AWS SDKs)
Important
By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. You can change this by specifying the job's import mode.
To update bulk data, create a dataset import job and specify an import mode. The following code shows how to update bulk data in Amazon Personalize with the SDK for Python (Boto3) or SDK for Java 2.x.
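As a sketch with the SDK for Python (Boto3), an incremental import might look like the following. The job name, ARNs, and S3 path are placeholder values; the Personalize client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
def create_incremental_import_job(personalize, job_name, dataset_arn,
                                  s3_location, role_arn):
    """Create a dataset import job that appends to existing bulk data.

    `personalize` is a Boto3 Personalize client, for example
    boto3.client("personalize"). Specify importMode="FULL" instead
    to replace the existing bulk data in the dataset.
    """
    response = personalize.create_dataset_import_job(
        jobName=job_name,
        datasetArn=dataset_arn,
        dataSource={"dataLocation": s3_location},
        roleArn=role_arn,
        importMode="INCREMENTAL",
    )
    return response["datasetImportJobArn"]
```

After starting the job, you can poll the DescribeDatasetImportJob operation with the returned ARN and wait for the job's status to reach ACTIVE before relying on the new data.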