Updating existing bulk data
If you previously created a dataset import job for a dataset, you can use another import job to add to or replace the existing bulk data. By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. You can instead append the new records to existing data by changing the job's import mode.
To append data to an Item interactions dataset or Action interactions dataset with a dataset import job, you must have at minimum 1000 new item interaction or action interaction records.
If you already created a recommender or deployed a custom solution version with a campaign, how new bulk records influence recommendations depends on the domain use case or recipe that you use. For more information, see How new data influences real-time recommendations.
Filter updates for bulk records
Within 20 minutes of completing a bulk import, Amazon Personalize updates any filters you created in the dataset group with your new bulk data. This update allows Amazon Personalize to use the most recent data when filtering recommendations for your users.
Import modes
To configure how Amazon Personalize adds your new records to your dataset, you specify an import mode for your dataset import job:
- To overwrite all existing bulk data in your dataset, choose Replace existing data in the Amazon Personalize console or specify FULL in the CreateDatasetImportJob API operation. This doesn't replace data you imported individually, including events recorded in real time.
- To append the records to the existing data in your dataset, choose Add to existing data or specify INCREMENTAL in the CreateDatasetImportJob API operation. Amazon Personalize replaces any record with the same ID with the new one.
  To append data to an Item interactions dataset or Action interactions dataset with a dataset import job, you must have at minimum 1000 new item interaction or action interaction records.
If you haven't imported bulk records, the Add to existing data option is not available in the console and you can only specify FULL in the CreateDatasetImportJob API operation. The default is a full replacement.
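The rules above can be sketched as a small helper that picks an import mode before you create the job. This is only an illustration under stated assumptions: the function name `choose_import_mode`, its parameters, and the dataset type labels are hypothetical and not part of the Amazon Personalize API. Only the FULL and INCREMENTAL values and the 1000-record minimum come from this section.

```python
# Minimum number of new interaction records needed to append, per the text above.
MIN_INCREMENTAL_INTERACTIONS = 1000

# Hypothetical labels used only in this sketch to mark interactions datasets.
INTERACTION_DATASET_TYPES = {"ITEM_INTERACTIONS", "ACTION_INTERACTIONS"}


def choose_import_mode(append, has_existing_bulk_data, dataset_type, new_record_count):
    """Return "FULL" or "INCREMENTAL" for a dataset import job.

    append: True if the caller wants to add to existing bulk data.
    has_existing_bulk_data: True if a bulk import already ran for the dataset.
    dataset_type: a label from this sketch, for example "ITEM_INTERACTIONS".
    new_record_count: number of records in the new file(s).
    """
    if not append:
        # Replacing existing bulk data is also the default behavior.
        return "FULL"
    if not has_existing_bulk_data:
        # Nothing to append to yet, so only a full import applies.
        return "FULL"
    if (dataset_type in INTERACTION_DATASET_TYPES
            and new_record_count < MIN_INCREMENTAL_INTERACTIONS):
        raise ValueError(
            "Appending to an interactions dataset needs at least "
            f"{MIN_INCREMENTAL_INTERACTIONS} new records, got {new_record_count}."
        )
    return "INCREMENTAL"
```

The helper raises an error rather than silently falling back to FULL when there are too few interaction records, because a silent fallback would replace the existing bulk data.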
Updating bulk records (console)
Important
By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. You can change this by specifying the job's import mode.
To update bulk data with the Amazon Personalize console, create a dataset import job for the dataset and specify an import mode.
To update bulk records (console)
- Open the Amazon Personalize console at https://console.aws.amazon.com/personalize/home and sign in to your account.
- On the Dataset groups page, choose your dataset group.
- From the navigation pane, choose Datasets.
- On the Datasets page, choose the dataset you want to update.
- In Dataset import jobs, choose Create dataset import job.
- In Import job details, for Dataset import job name, specify a name for your import job.
- For Import mode, choose how to update the dataset. Choose either Replace existing data or Add to existing data. For more information, see Import modes.
- In Input source, for S3 Location, specify where your data file is stored in Amazon S3. Use the following syntax:
  s3://<name of your S3 bucket>/<folder path>/<CSV file name>
  If your CSV files are in a folder in your S3 bucket and you want to upload multiple CSV files to a dataset with one dataset import job, use this syntax without the CSV file name.
- In IAM role, choose to either create a new role or use an existing one. If you completed the prerequisites, choose Use an existing service role and specify the role that you created in Creating an IAM role for Amazon Personalize.
- For Tags, optionally add any tags. For more information about tagging Amazon Personalize resources, see Tagging Amazon Personalize resources.
- Choose Finish. The data import job starts and the Dataset overview page is displayed. The dataset import is complete when the status is ACTIVE.
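The S3 Location syntax from the steps above can be illustrated with a short helper. This is a sketch only: `build_s3_location` and its parameter names are hypothetical, and the bucket, folder, and file names in the usage comments are placeholders.

```python
def build_s3_location(bucket, folder_path="", csv_file_name=None):
    """Build an S3 location of the form s3://<bucket>/<folder path>/<CSV file name>.

    Leave csv_file_name as None to point at the folder itself, which is the
    form described above for importing multiple CSV files with one job.
    """
    parts = [bucket.strip("/")]
    folder = folder_path.strip("/")
    if folder:
        parts.append(folder)
    location = "s3://" + "/".join(parts)
    if csv_file_name is None:
        # Folder form: the same syntax without the CSV file name.
        return location + "/"
    return location + "/" + csv_file_name


# Single file:
#   build_s3_location("my-bucket", "data/interactions", "interactions.csv")
# All CSV files in a folder:
#   build_s3_location("my-bucket", "data/interactions")
```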
Updating bulk records (AWS CLI)
Important
By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. You can change this by specifying the job's import mode.
To update bulk data, use the create-dataset-import-job command. For the import-mode, specify FULL to replace existing data or INCREMENTAL to add to it. For more information, see Import modes.
The following code shows how to create a dataset import job that incrementally imports data into a dataset.
aws personalize create-dataset-import-job \
  --job-name <dataset import job name> \
  --dataset-arn <dataset arn> \
  --data-source dataLocation=s3://<bucketname>/<filename> \
  --role-arn <roleArn> \
  --import-mode INCREMENTAL
Updating bulk records (AWS SDKs)
Important
By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. You can change this by specifying the job's import mode.
To update bulk data, create a dataset import job and specify an import mode. The following code shows how to update bulk data in Amazon Personalize with the SDK for Python (Boto3) or SDK for Java 2.x.
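For the SDK for Python (Boto3), a minimal sketch might look like the following. It assumes you pass in a client created with `boto3.client("personalize")`. The function names `start_incremental_import` and `wait_until_active`, the polling interval, and the poll limit are choices made for this illustration, not part of the SDK. The ARNs and S3 location in the usage comments are placeholders.

```python
import time


def start_incremental_import(personalize, job_name, dataset_arn, data_location, role_arn):
    """Create a dataset import job that appends to existing bulk data.

    personalize is expected to be a Boto3 Amazon Personalize client.
    Returns the ARN of the new dataset import job.
    """
    response = personalize.create_dataset_import_job(
        jobName=job_name,
        datasetArn=dataset_arn,
        dataSource={"dataLocation": data_location},
        roleArn=role_arn,
        importMode="INCREMENTAL",  # use "FULL" to replace existing bulk data
    )
    return response["datasetImportJobArn"]


def wait_until_active(personalize, job_arn, poll_seconds=60, max_polls=120):
    """Poll the import job until its status is ACTIVE.

    Raises RuntimeError if the job fails and TimeoutError if it is still
    not ACTIVE after max_polls checks.
    """
    for _ in range(max_polls):
        job = personalize.describe_dataset_import_job(datasetImportJobArn=job_arn)
        status = job["datasetImportJob"]["status"]
        if status == "ACTIVE":
            return status
        if status == "CREATE FAILED":
            reason = job["datasetImportJob"].get("failureReason", "unknown reason")
            raise RuntimeError(f"Dataset import job failed: {reason}")
        time.sleep(poll_seconds)
    raise TimeoutError("Dataset import job did not become ACTIVE in time.")


# Usage (requires boto3 and AWS credentials; all values are placeholders):
#   import boto3
#   personalize = boto3.client("personalize")
#   arn = start_incremental_import(
#       personalize, "my-import-job", "dataset arn",
#       "s3://bucketname/filename", "roleArn")
#   wait_until_active(personalize, arn)
```

Passing the client in as an argument keeps the functions easy to exercise with a stand-in object before running them against a real account.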