Storing event data using batch import

With the batch import feature, you can quickly and easily upload large historical event datasets to Amazon Fraud Detector using the console, the API, or an AWS SDK. To use batch import, create an input file in CSV format that contains all your event data, upload the CSV file to an Amazon S3 bucket, and start an import job. Amazon Fraud Detector first validates the data based on the event type, and then automatically imports the entire dataset. After the data is imported, it's ready to be used for training new models or for retraining existing models.

Input and output files

The input CSV file must contain headers that match the variables defined in the associated event type, plus four mandatory variables. See Preparing event data for storage for more information. The maximum size of the input data file is 1 gigabyte (GB), or about 3 million events; the exact number varies with your event size. If the import job is successful, the output file is empty. If the import is unsuccessful, the output file contains the error logs.
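As an illustration, here is a minimal Python sketch that writes a CSV file with the expected layout. The event type and its variables (email_address, ip_address) are hypothetical examples; your headers must match the variables defined in your own event type. This sketch assumes the four mandatory variables are EVENT_ID, EVENT_TIMESTAMP, ENTITY_TYPE, and ENTITY_ID, as described in Preparing event data for storage.

```python
import csv

# The first four headers are the mandatory variables; the rest are
# hypothetical event variables that must match your event type definition.
headers = [
    "EVENT_ID", "EVENT_TIMESTAMP", "ENTITY_TYPE", "ENTITY_ID",
    "email_address", "ip_address",
]

# One sample event; timestamps use the ISO 8601 format.
rows = [
    ["reg-0001", "2020-07-13T23:18:21Z", "customer", "cust-0001",
     "user@example.com", "192.0.2.10"],
]

with open("registration_events.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(rows)
```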

Upload CSV file to Amazon S3 for batch import

After you create a CSV file with your data, upload the file to your Amazon Simple Storage Service (Amazon S3) bucket.

To upload event data to an Amazon S3 bucket

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  2. Choose Create bucket.

    The Create bucket wizard opens.

  3. In Bucket name, enter a DNS-compliant name for your bucket.

    The bucket name must:

    • Be unique across all of Amazon S3.

    • Be between 3 and 63 characters long.

    • Not contain uppercase characters.

    • Start with a lowercase letter or number.

    After you create the bucket, you can't change its name. For information about naming buckets, see Bucket naming rules in the Amazon Simple Storage Service User Guide.

    Important

    Avoid including sensitive information, such as account numbers, in the bucket name. The bucket name is visible in the URLs that point to the objects in the bucket.

  4. In Region, choose the AWS Region where you want the bucket to reside. You must choose the same Region in which you use Amazon Fraud Detector: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Asia Pacific (Singapore), or Asia Pacific (Sydney).

  5. In Bucket settings for Block Public Access, choose the Block Public Access settings that you want to apply to the bucket.

    We recommend that you leave all settings enabled. For more information about blocking public access, see Blocking public access to your Amazon S3 storage in the Amazon Simple Storage Service User Guide.

  6. Choose Create bucket.

  7. Upload your training data file to your Amazon S3 bucket. Note the Amazon S3 location path for your training file (for example, s3://bucketname/object.csv).
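The console steps above can also be sketched with the AWS SDK for Python (Boto3). The bucket name, Region, and file name below are hypothetical placeholders; replace them with your own values. The actual AWS calls are shown in the commented usage line because they require valid credentials.

```python
def s3_uri(bucket: str, key: str) -> str:
    """Build the S3 location path to pass to the import job."""
    return f"s3://{bucket}/{key}"

def upload_training_file(bucket: str, region: str, path: str) -> str:
    """Create the bucket if needed, then upload the CSV training file."""
    import boto3  # imported here so s3_uri stays usable without boto3 installed

    s3 = boto3.client("s3", region_name=region)
    create_args = {"Bucket": bucket}
    if region != "us-east-1":  # us-east-1 rejects a LocationConstraint
        create_args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3.create_bucket(**create_args)
    s3.upload_file(path, bucket, path)
    return s3_uri(bucket, path)

# Example usage (requires AWS credentials; not run here):
# location = upload_training_file("my-fraud-detector-data", "us-east-1",
#                                 "registration_events.csv")
```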

Batch import event data

You can import large event datasets in the Amazon Fraud Detector console, using the CreateBatchImportJob API, or using an AWS SDK. Before you proceed, make sure that you have followed the instructions to prepare your dataset as a CSV file, and that you have uploaded the CSV file to an Amazon S3 bucket.

Using Amazon Fraud Detector console

To batch import event data in console

  1. Open the AWS Management Console, sign in to your account, and navigate to Amazon Fraud Detector.

  2. In the left navigation pane, choose Events.

  3. Choose your event type.

  4. Select the Stored events tab.

  5. In the Stored events details pane, make sure that Event ingestion is ON.

  6. In the Import events data pane, choose New Import.

  7. In the New events import page, provide the following information:

    • For IAM role for data, select the IAM role that you created for the Amazon S3 bucket that holds the CSV file you are planning to import.

    • For Input data location, enter the S3 location where you have your CSV file.

    • If you want to store your import results in a separate location, choose Separate data location for inputs and results, and then provide a valid Amazon S3 bucket location.

  8. Choose Start.

  9. The Status column in the Import events data pane displays the status of your import job.

  10. Choose the Job ID of an import job to view its details.

Note

We recommend waiting 10 minutes after you’ve finished importing event data into Amazon Fraud Detector to ensure that it is fully ingested by the system.
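If you start the job programmatically, you can poll its status with the GetBatchImportJobs API instead of watching the console. The terminal status values used below (COMPLETE, FAILED, CANCELED) are assumptions based on the job lifecycle described in this section; check the API reference for the exact values.

```python
import time

# Assumed terminal statuses for a batch import job.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "CANCELED"}

def is_terminal(status: str) -> bool:
    """Return True once the import job has finished, failed, or been canceled."""
    return status in TERMINAL_STATUSES

def wait_for_import(job_id: str, poll_seconds: int = 30) -> str:
    """Poll GetBatchImportJobs until the job reaches a terminal status."""
    import boto3  # imported here so is_terminal stays usable without boto3

    fraudDetector = boto3.client("frauddetector")
    while True:
        job = fraudDetector.get_batch_import_jobs(jobId=job_id)["batchImports"][0]
        if is_terminal(job["status"]):
            return job["status"]
        time.sleep(poll_seconds)

# Example usage (requires AWS credentials; not run here):
# final_status = wait_for_import("sample_batch_import")
```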

Batch import event data using the AWS SDK for Python (Boto3)

The following example shows a sample request for the CreateBatchImportJob API. A batch import job must include a jobId, inputPath, outputPath, eventTypeName, and iamRoleArn. The jobId can’t be the same as the ID of a past job, unless that job is in the CREATE_FAILED state. The inputPath and outputPath must be valid S3 paths. You can omit the file name in the outputPath, but you still need to provide a valid S3 bucket location. The eventTypeName and iamRoleArn must exist.

import boto3

fraudDetector = boto3.client('frauddetector')

fraudDetector.create_batch_import_job(
    jobId = 'sample_batch_import',
    inputPath = 's3://bucket_name/input_file_name.csv',
    outputPath = 's3://bucket_name/',
    eventTypeName = 'sample_registration',
    iamRoleArn = 'arn:aws:iam::************:role/service-role/AmazonFraudDetector-DataAccessRole-*************'
)

Canceling a batch import job

You can cancel an in-progress batch import job at any time in the Amazon Fraud Detector console, using the CancelBatchImportJob API, or using an AWS SDK.

To cancel a batch import job in the console

  1. Open the AWS Management Console, sign in to your account, and navigate to Amazon Fraud Detector.

  2. In the left navigation pane, choose Events.

  3. Choose your event type.

  4. Select the Stored events tab.

  5. In the Import events data pane, choose the Job ID of the in-progress import job that you want to cancel.

  6. On the event job page, choose Actions, and then select Cancel events import.

  7. Choose Stop events import to cancel the batch import job.

Canceling batch import job using the AWS SDK for Python (Boto3)

The following example shows a sample request for the CancelBatchImportJob API. The request must include the job ID of an in-progress batch import job.

import boto3

fraudDetector = boto3.client('frauddetector')

fraudDetector.cancel_batch_import_job(
    jobId = 'sample_batch'
)