
Preparing event data for storage

You can store your historical event data in Amazon Fraud Detector. Event data is stored at the Event Type resource level, so all events of the same event type are stored together under that Event Type. The stored events can later be used to train a new model or retrain an existing model. When you train a model using the stored event data, you can optionally specify a time range of events to limit the size of your training dataset.

Amazon Fraud Detector provides a sample dataset that you can optionally use to train your Transaction Fraud Insights model. To use the sample dataset, go to Amazon Fraud Detector samples, download transaction_data_100k_full, and then follow the instructions to store the sample dataset by using either the SendEvent API operation or batch import.

Create a CSV file

Amazon Fraud Detector imports data only from files that are in the comma-separated values (CSV) format. The first row of your CSV file must contain column headers that exactly match the variables defined in the associated event type plus four mandatory variables: EVENT_ID, EVENT_TIMESTAMP, ENTITY_ID, and ENTITY_TYPE. You can also optionally include EVENT_LABEL and LABEL_TIMESTAMP (LABEL_TIMESTAMP is required if EVENT_LABEL is included).
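The header rule above can be sketched in Python. This is a minimal illustration, not an official tool: the helper names build_header and write_events_csv, the event variables ip_address and email_address, and the label value legit are all hypothetical. Placing the event variables before the metadata columns is an arbitrary choice in this sketch.

```python
import csv
import io

# Mandatory metadata columns named in the text above (uppercase).
MANDATORY_COLUMNS = ["EVENT_ID", "EVENT_TIMESTAMP", "ENTITY_ID", "ENTITY_TYPE"]
# Optional label columns; the text says to include both or neither.
LABEL_COLUMNS = ["EVENT_LABEL", "LABEL_TIMESTAMP"]


def build_header(event_variables, include_labels=True):
    """Return the header row: event type variables, then metadata columns."""
    header = list(event_variables) + MANDATORY_COLUMNS
    if include_labels:
        header += LABEL_COLUMNS
    return header


def write_events_csv(rows, event_variables, include_labels=True):
    """Serialize event dicts to CSV text.

    Raises ValueError if a row is missing a column or has an extra key.
    """
    header = build_header(event_variables, include_labels)
    buffer = io.StringIO()
    # DictWriter raises ValueError on keys that are not in the header.
    writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        missing = [name for name in header if name not in row]
        if missing:
            raise ValueError(f"row is missing columns: {missing}")
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical event type with two variables.
example_row = {
    "ip_address": "192.0.2.10",
    "email_address": "user@example.com",
    "EVENT_ID": "txn-0001",
    "EVENT_TIMESTAMP": "2019-11-30T13:01:01Z",
    "ENTITY_ID": "customer_42",
    "ENTITY_TYPE": "customer",
    "EVENT_LABEL": "legit",
    "LABEL_TIMESTAMP": "2019-12-01T09:00:00Z",
}
csv_text = write_events_csv([example_row], ["ip_address", "email_address"])
```

Rejecting rows with missing or unexpected keys up front keeps the file's columns aligned with the header, which is the property the import relies on.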

Define mandatory variables

Mandatory variables are treated as event metadata, and their names must be specified in uppercase. Event metadata is automatically included for model training. The following table lists the mandatory variables, a description of each variable, and the required format for each variable.

Name Description Requirements

EVENT_ID

An identifier for the event. For example, if your event is an online transaction, the EVENT_ID might be the transaction reference number that was provided to your customer.

  • The EVENT_ID is required for batch import jobs.

  • It must be unique for that event.

  • It should represent information that’s meaningful to your business.

  • It must satisfy the regular expression pattern ^[0-9a-z_-]+$.

  • We don’t recommend that you append a timestamp to the EVENT_ID. Doing so might cause issues when you update the event, because you must provide the exact same EVENT_ID to update it.

EVENT_TIMESTAMP

The timestamp of when the event occurred. The timestamp must be in ISO 8601 standard in UTC.

  • The EVENT_TIMESTAMP is required for batch import jobs.

  • It must be specified in one of the following formats:

    • %yyyy-%mm-%ddT%hh:%mm:%ssZ (ISO 8601 standard in UTC only with no milliseconds)

      Example: 2019-11-30T13:01:01Z

    • %yyyy/%mm/%dd %hh:%mm:%ss (AM/PM)

      Examples: 2019/11/30 1:01:01 PM, or 2019/11/30 13:01:01

    • %mm/%dd/%yyyy %hh:%mm:%ss

      Examples: 11/30/2019 1:01:01 PM, 11/30/2019 13:01:01

    • %mm/%dd/%yy %hh:%mm:%ss

      Examples: 11/30/19 1:01:01 PM, 11/30/19 13:01:01

  • Amazon Fraud Detector makes the following assumptions when parsing date/timestamp formats for event timestamps:

    • If you are using the ISO 8601 standard, it must be an exact match of the preceding specification.

    • If you are using one of the other formats, there is additional flexibility:

      • For months and days, you can provide single or double digits. For example, 1/12/2019 is a valid date.

      • You do not need to include hh:mm:ss if you do not have them (that is, you can simply provide a date). You can also provide a subset of just the hour and minutes (for example, hh:mm). Just providing hour is not supported. Milliseconds are also not supported.

      • If you provide AM/PM labels, a 12-hour clock is assumed. If there is no AM/PM information, a 24-hour clock is assumed.

      • You can use “/” or “-” as delimiters for the date elements. “:” is assumed for the timestamp elements.

ENTITY_ID

An identifier for the entity performing the event.

  • ENTITY_ID is required for batch import jobs.

  • It must follow the regular expression pattern: ^[0-9A-Za-z_.@+-]+$.

  • If the entity ID isn’t available at the time of evaluation, specify the entity ID as unknown.

ENTITY_TYPE

The type of entity that performs the event, such as a merchant or a customer.

  • ENTITY_TYPE is required for batch import jobs.

EVENT_LABEL

Classifies the event as fraudulent or legitimate.

  • EVENT_LABEL is required if LABEL_TIMESTAMP is included.

LABEL_TIMESTAMP

The timestamp when the event label was last populated or updated.

  • LABEL_TIMESTAMP is required if EVENT_LABEL is included.

  • It must follow the same timestamp format as EVENT_TIMESTAMP.
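The ID patterns and the label pairing rule in the table above can be checked before you build an import file. The following is a minimal sketch under stated assumptions: the function find_metadata_problems and the message strings are hypothetical, and the checks cover only the rules listed in the table, not every validation the service might perform.

```python
import re

# Patterns copied from the requirements in the table above.
EVENT_ID_PATTERN = re.compile(r"^[0-9a-z_-]+$")
ENTITY_ID_PATTERN = re.compile(r"^[0-9A-Za-z_.@+-]+$")


def find_metadata_problems(row):
    """Return a list of human-readable problems for one event row (a dict)."""
    problems = []
    # fullmatch rejects a trailing newline, which a bare "$" would allow.
    if not EVENT_ID_PATTERN.fullmatch(row.get("EVENT_ID") or ""):
        problems.append("EVENT_ID does not match ^[0-9a-z_-]+$")
    if not ENTITY_ID_PATTERN.fullmatch(row.get("ENTITY_ID") or ""):
        problems.append("ENTITY_ID does not match ^[0-9A-Za-z_.@+-]+$")
    for name in ("EVENT_TIMESTAMP", "ENTITY_TYPE"):
        if not row.get(name):
            problems.append(f"{name} is missing")
    # EVENT_LABEL and LABEL_TIMESTAMP must be provided together or not at all.
    if bool(row.get("EVENT_LABEL")) != bool(row.get("LABEL_TIMESTAMP")):
        problems.append("EVENT_LABEL and LABEL_TIMESTAMP must be provided together")
    return problems


good_row = {
    "EVENT_ID": "802454d3-f7d8-482d-97e8-c4b6db9a0428",
    "EVENT_TIMESTAMP": "2019-11-30T13:01:01Z",
    "ENTITY_ID": "unknown",  # placeholder the table suggests when no ID exists
    "ENTITY_TYPE": "customer",
}
print(find_metadata_problems(good_row))
```

Note that the EVENT_ID pattern allows only lowercase letters, so an ID such as TXN-1 fails this check even though the same characters are valid in an ENTITY_ID.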

Event Timestamp formats for SendEvent API

If you are storing event data using the SendEvent API operation, you must ensure that your event timestamp is in the required format. Amazon Fraud Detector supports the following date/timestamp formats:

  • %yyyy-%mm-%ddT%hh:%mm:%ssZ (ISO 8601 standard in UTC only with no milliseconds)

    Example: 2019-11-30T13:01:01Z

  • %yyyy/%mm/%dd %hh:%mm:%ss (AM/PM)

    Examples: 2019/11/30 1:01:01 PM, or 2019/11/30 13:01:01

  • %mm/%dd/%yyyy %hh:%mm:%ss

    Examples: 11/30/2019 1:01:01 PM, 11/30/2019 13:01:01

  • %mm/%dd/%yy %hh:%mm:%ss

    Examples: 11/30/19 1:01:01 PM, 11/30/19 13:01:01

Amazon Fraud Detector makes the following assumptions when parsing date/timestamp formats for event timestamps:

  • If you are using the ISO 8601 standard, it must be an exact match of the preceding specification.

  • If you are using one of the other formats, there is additional flexibility:

    • For months and days, you can provide single or double digits. For example, 1/12/2019 is a valid date.

    • You do not need to include hh:mm:ss if you do not have them (that is, you can simply provide a date). You can also provide a subset of just the hour and minutes (for example, hh:mm). Just providing hour is not supported. Milliseconds are also not supported.

    • If you provide AM/PM labels, a 12-hour clock is assumed. If there is no AM/PM information, a 24-hour clock is assumed.

    • You can use “/” or “-” as delimiters for the date elements. “:” is assumed for the timestamp elements.
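The parsing assumptions above can be approximated locally so that timestamps are checked before they are sent. The following is a minimal sketch, not the service's parser: the function parse_event_timestamp is hypothetical, it returns a naive datetime without attaching a time zone, and it relies on Python's strptime recognizing AM and PM, which holds in English and C locales.

```python
import re
from datetime import datetime

# Strict ISO 8601 form: UTC only, no milliseconds, two-digit fields.
ISO_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")

# Date layouts from the list above, written with "/" as the delimiter.
DATE_FORMATS = ("%Y/%m/%d", "%m/%d/%Y", "%m/%d/%y")
# Date only, 24-hour time, or 12-hour time with an AM/PM label.
TIME_FORMATS = ("", " %H:%M:%S", " %H:%M", " %I:%M:%S %p", " %I:%M %p")


def parse_event_timestamp(text):
    """Parse a timestamp in one of the listed formats; raise ValueError otherwise."""
    text = text.strip()
    if ISO_PATTERN.fullmatch(text):
        return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    # "-" is accepted as a date delimiter, so normalize it to "/".
    date_part, _, rest = text.partition(" ")
    candidate = date_part.replace("-", "/") + (" " + rest if rest else "")
    for date_format in DATE_FORMATS:
        for time_format in TIME_FORMATS:
            try:
                return datetime.strptime(candidate, date_format + time_format)
            except ValueError:
                continue
    raise ValueError(f"unsupported timestamp format: {text!r}")


for sample in ("2019-11-30T13:01:01Z", "2019/11/30 1:01:01 PM", "11/30/19 13:01", "1/12/2019"):
    print(sample, "->", parse_event_timestamp(sample).isoformat())
```

Because strptime accepts single-digit months, days, and hours, the ISO 8601 form is matched against a strict regular expression first. Hour-only times and milliseconds match none of the layouts, so they raise ValueError.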

Validating stored data

When uploading events via the SendEvent or GetEventPrediction API operation, Amazon Fraud Detector validates the following:

  • The EventIngestion setting for that event type is ENABLED.

  • The event timestamp of an existing event is unchanged. Event timestamps cannot be updated, so an event with a repeated event ID and a different EVENT_TIMESTAMP is treated as an error.

  • Variable names and values match their expected format. For more information, see Create a variable.

  • Required variables are populated with a value.

  • No event timestamp is older than 18 months or in the future.
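The timestamp age rule above can be checked on the client side before you send an event. The following is a minimal sketch with hypothetical helper names. How the service measures 18 months is not stated in this guide, so the calendar-month arithmetic here is an assumption, and both arguments are expected to be timezone-aware datetimes.

```python
import calendar
from datetime import datetime, timezone


def subtract_months(moment, months):
    """Return moment shifted back by whole calendar months, clamping the day."""
    total = moment.year * 12 + (moment.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp so that, for example, the 31st maps onto a 28-day February.
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def is_within_storage_window(event_time, now=None, max_age_months=18):
    """True if event_time is not in the future and not older than the limit."""
    now = now or datetime.now(timezone.utc)
    return subtract_months(now, max_age_months) <= event_time <= now


reference = datetime(2024, 8, 31, tzinfo=timezone.utc)
recent = datetime(2023, 3, 1, tzinfo=timezone.utc)
print(is_within_storage_window(recent, now=reference))
```

Passing an explicit now value makes the check reproducible in tests; omit it to compare against the current UTC time.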

Updating event labels

You might need to add or update fraud labels for events that are already stored in Amazon Fraud Detector, such as when you perform an offline fraud investigation for an event and want to close the machine learning feedback loop. To update the label for an event that is already stored in Amazon Fraud Detector, use the UpdateEventLabel API operation. The following shows an example UpdateEventLabel API call.

import boto3

fraudDetector = boto3.client('frauddetector')

# Assign or overwrite the label on an event that is already stored.
fraudDetector.update_event_label(
    eventId='802454d3-f7d8-482d-97e8-c4b6db9a0428',
    eventTypeName='sample_registration',
    assignedLabel='fraud',
    labelTimestamp='2020-07-13T23:18:21Z'
)