Updating existing bulk data

If you previously created a dataset import job for a dataset, you can use another import job to add to or replace the existing bulk data. By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. You can instead append the new records to existing data by changing the job's import mode.

To append data to an Item interactions dataset or Action interactions dataset with a dataset import job, you must have a minimum of 1,000 new item interaction or action interaction records.

If you already created a recommender or deployed a custom solution version with a campaign, how new bulk records influence recommendations depends on the domain use case or recipe that you use. For more information, see How new data influences real-time recommendations.

Filter updates for bulk records

Within 20 minutes of completing a bulk import, Amazon Personalize updates any filters you created in the dataset group with your new bulk data. This update allows Amazon Personalize to use the most recent data when filtering recommendations for your users.

Import modes

To configure how Amazon Personalize adds your new records to your dataset, you specify an import mode for your dataset import job:

  • To overwrite all existing bulk data in your dataset, choose Replace existing data in the Amazon Personalize console or specify FULL in the CreateDatasetImportJob API operation. This doesn't replace data you imported individually, including events recorded in real time.

  • To append the records to the existing data in your dataset, choose Add to existing data or specify INCREMENTAL in the CreateDatasetImportJob API operation. Amazon Personalize replaces any record with the same ID with the new one.

    To append data to an Item interactions dataset or Action interactions dataset with a dataset import job, you must have a minimum of 1,000 new item interaction or action interaction records.

If you haven't imported bulk records, the Add to existing data option is not available in the console and you can only specify FULL in the CreateDatasetImportJob API operation. The default is a full replacement.
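
If you're not sure whether a dataset already contains bulk data, you can list its past dataset import jobs. The following is a minimal sketch with the SDK for Python (Boto3); the dataset ARN is a placeholder:

import boto3

personalize = boto3.client('personalize')

# List previous bulk import jobs for the dataset (placeholder ARN).
response = personalize.list_dataset_import_jobs(
    datasetArn = 'dataset_arn'
)

# If no bulk import has completed, only a full import (FULL) is available.
has_bulk_data = any(
    job['status'] == 'ACTIVE'
    for job in response['datasetImportJobs']
)
import_mode = 'INCREMENTAL' if has_bulk_data else 'FULL'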

Updating bulk records (console)

Important

By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. You can change this by specifying the job's import mode.

To update bulk data with the Amazon Personalize console, create a dataset import job for the dataset and specify an import mode.

To update bulk records (console)
  1. Open the Amazon Personalize console at https://console.aws.amazon.com/personalize/home and sign in to your account.

  2. On the Dataset groups page, choose your dataset group.

  3. From the navigation pane, choose Datasets.

  4. On the Datasets page, choose the dataset you want to update.

  5. In Dataset import jobs, choose Create dataset import job.

  6. In Import job details, for Dataset import job name, specify a name for your import job.

  7. For Import mode, choose how to update the dataset. Choose either Replace existing data or Add to existing data. For more information, see Import modes.

  8. In Input source, for S3 Location, specify where your data file is stored in Amazon S3. Use the following syntax:

    s3://<name of your S3 bucket>/<folder path>/<CSV file name>

    If your CSV files are in a folder in your S3 bucket and you want to upload multiple CSV files to a dataset with one dataset import job, use this syntax without the CSV file name, as shown in the example paths after this procedure.

  9. In IAM role, choose to either create a new role or use an existing one. If you completed the prerequisites, choose Use an existing service role and specify the role that you created in Creating an IAM role for Amazon Personalize.

  10. For Tags, optionally add any tags. For more information about tagging Amazon Personalize resources, see Tagging Amazon Personalize resources.

  11. Choose Finish. The dataset import job starts and the Dataset overview page is displayed. The dataset import is complete when the status is ACTIVE.
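
For step 8, the following example paths show a single-file import and a whole-folder import (the bucket and folder names here are placeholders):

    s3://amzn-s3-demo-bucket/import-data/interactions.csv
    s3://amzn-s3-demo-bucket/import-data/

The first location imports one CSV file; the second imports every CSV file in the import-data folder.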

Updating bulk records (AWS CLI)

Important

By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. You can change this by specifying the job's import mode.

To update bulk data, use the create-dataset-import-job command. For the import-mode, specify FULL to replace existing data or INCREMENTAL to add to it. For more information, see Import modes.

The following code shows how to create a dataset import job that incrementally imports data into a dataset.

aws personalize create-dataset-import-job \
  --job-name dataset import job name \
  --dataset-arn dataset arn \
  --data-source dataLocation=s3://bucketname/filename \
  --role-arn roleArn \
  --import-mode INCREMENTAL
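
To check the progress of the job, use the describe-dataset-import-job command with the dataset import job ARN returned in the response. The import is complete when the status is ACTIVE.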

Updating bulk records (AWS SDKs)

Important

By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. You can change this by specifying the job's import mode.

To update bulk data, create a dataset import job and specify an import mode. The following code shows how to update bulk data in Amazon Personalize with the SDK for Python (Boto3) or SDK for Java 2.x.

SDK for Python (Boto3)

To update bulk data, use the create_dataset_import_job method. For the importMode, specify FULL to replace existing data or INCREMENTAL to add to it. For more information, see Import modes.

import boto3

personalize = boto3.client('personalize')

response = personalize.create_dataset_import_job(
    jobName = 'YourImportJob',
    datasetArn = 'dataset_arn',
    dataSource = {'dataLocation':'s3://bucket/file.csv'},
    roleArn = 'role_arn',
    importMode = 'INCREMENTAL'
)
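
The create_dataset_import_job call returns before the import completes. The following is a minimal sketch of how you might wait for the job to finish, mirroring the polling loop in the Java example that follows; it assumes the response from the previous example is still in scope.

import time

import boto3

personalize = boto3.client('personalize')

# ARN of the job created by create_dataset_import_job above.
job_arn = response['datasetImportJobArn']

while True:
    job = personalize.describe_dataset_import_job(
        datasetImportJobArn = job_arn
    )['datasetImportJob']
    print('Dataset import job status: ' + job['status'])
    # The import is complete when the status is ACTIVE.
    if job['status'] in ('ACTIVE', 'CREATE FAILED'):
        break
    time.sleep(60)
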
SDK for Java 2.x

To update bulk data, use the following createPersonalizeDatasetImportJob method. For the importMode, specify ImportMode.FULL to replace existing data or ImportMode.INCREMENTAL to add to it. For more information, see Import modes.

import java.time.Instant;

import software.amazon.awssdk.services.personalize.PersonalizeClient;
import software.amazon.awssdk.services.personalize.model.CreateDatasetImportJobRequest;
import software.amazon.awssdk.services.personalize.model.DataSource;
import software.amazon.awssdk.services.personalize.model.DatasetImportJob;
import software.amazon.awssdk.services.personalize.model.DescribeDatasetImportJobRequest;
import software.amazon.awssdk.services.personalize.model.ImportMode;
import software.amazon.awssdk.services.personalize.model.PersonalizeException;

public static String createPersonalizeDatasetImportJob(PersonalizeClient personalizeClient,
                                                       String jobName,
                                                       String datasetArn,
                                                       String s3BucketPath,
                                                       String roleArn,
                                                       ImportMode importMode) {
    long waitInMilliseconds = 60 * 1000;
    String status;
    String datasetImportJobArn;

    try {
        DataSource importDataSource = DataSource.builder()
                .dataLocation(s3BucketPath)
                .build();

        CreateDatasetImportJobRequest createDatasetImportJobRequest = CreateDatasetImportJobRequest.builder()
                .datasetArn(datasetArn)
                .dataSource(importDataSource)
                .jobName(jobName)
                .roleArn(roleArn)
                .importMode(importMode)
                .build();

        datasetImportJobArn = personalizeClient.createDatasetImportJob(createDatasetImportJobRequest)
                .datasetImportJobArn();

        DescribeDatasetImportJobRequest describeDatasetImportJobRequest = DescribeDatasetImportJobRequest.builder()
                .datasetImportJobArn(datasetImportJobArn)
                .build();

        long maxTime = Instant.now().getEpochSecond() + 3 * 60 * 60;

        // Poll the job once a minute until it completes, fails, or times out.
        while (Instant.now().getEpochSecond() < maxTime) {
            DatasetImportJob datasetImportJob = personalizeClient
                    .describeDatasetImportJob(describeDatasetImportJobRequest)
                    .datasetImportJob();

            status = datasetImportJob.status();
            System.out.println("Dataset import job status: " + status);

            if (status.equals("ACTIVE") || status.equals("CREATE FAILED")) {
                break;
            }
            try {
                Thread.sleep(waitInMilliseconds);
            } catch (InterruptedException e) {
                System.out.println(e.getMessage());
            }
        }
        return datasetImportJobArn;
    } catch (PersonalizeException e) {
        System.out.println(e.awsErrorDetails().errorMessage());
    }
    return "";
}

SDK for JavaScript v3

To update bulk data, use the CreateDatasetImportJobCommand. For the importMode, specify "FULL" to replace existing data or "INCREMENTAL" to add to it. For more information, see Import modes.

// Get service clients and commands using ES6 syntax.
import {
  CreateDatasetImportJobCommand,
  PersonalizeClient,
} from "@aws-sdk/client-personalize";

// Create the Amazon Personalize client.
const personalizeClient = new PersonalizeClient({ region: "REGION" });

// Set the dataset import job parameters.
export const datasetImportJobParam = {
  datasetArn: 'DATASET_ARN', /* required */
  dataSource: {
    dataLocation: 's3://<name of your S3 bucket>/<folderName>/<CSVfilename>.csv' /* required */
  },
  jobName: 'NAME', /* required */
  roleArn: 'ROLE_ARN', /* required */
  importMode: "INCREMENTAL" /* optional, default is FULL */
};

export const run = async () => {
  try {
    const response = await personalizeClient.send(
      new CreateDatasetImportJobCommand(datasetImportJobParam)
    );
    console.log("Success", response);
    return response; // For unit tests.
  } catch (err) {
    console.log("Error", err);
  }
};
run();