AWS Database Migration Service
User Guide (Version API Version 2016-01-01)

Data Validation Task Settings

You can ensure that your data was migrated accurately from the source to the target. If you enable validation for a task, AWS DMS begins comparing the source and target data immediately after a full load is performed for a table. For more information about task data validation, its requirements, the scope of its database support, and the metrics it reports, see Validating AWS DMS Tasks.

The data validation settings and their values include the following:

  • EnableValidation – Enables data validation when set to true. Otherwise, validation is disabled for the task. The default value is false.

  • FailureMaxCount – Specifies the maximum number of records that can fail validation before validation is suspended for the task. The default value is 10,000. If you want the validation to continue regardless of the number of records that fail validation, set this value higher than the number of records in the source.

  • HandleCollationDiff – When this option is set to true, the validation accounts for column collation differences in PostgreSQL endpoints when identifying source and target records to compare. Otherwise, any such differences in column collation are ignored for validation. In PostgreSQL endpoints, column collations can dictate the order of rows, which is important for data validation. Setting HandleCollationDiff to true resolves those collation differences automatically and prevents false positives in data validation. The default value is false.

  • RecordFailureDelayLimitInMinutes – Specifies the delay, in minutes, before AWS DMS reports any validation failure details. Normally, AWS DMS uses the task latency to estimate how long changes take to reach the target, which prevents false positives. This setting overrides that measured delay and enables you to set a higher delay before any validation metrics are reported. The default value is 0.

  • TableFailureMaxCount – Specifies the maximum number of tables that can fail validation before validation is suspended for the task. The default value is 1,000. If you want the validation to continue regardless of the number of tables that fail validation, set this value higher than the number of tables in the source.

  • ThreadCount – Specifies the number of execution threads that AWS DMS uses during validation. Each thread selects not-yet-validated data from the source and target to compare and validate. The default value is 5. If you set ThreadCount to a higher number, AWS DMS can complete the validation faster. However, AWS DMS then runs more simultaneous queries, consuming more resources on the source and the target.

  • ValidationOnly – When this option is set to true, running the task previews the data validation without performing any migration or replication of data.

    To be able to set this option, in the AWS DMS console set the task Migration type to Migrate existing data and replicate ongoing changes or Replicate data changes only. Alternatively, in the AWS DMS API set the migration type to full-load-and-cdc.

    By taking this approach, you can see the validation results and resolve any failures prior to actually moving the data. This option might be more efficient than waiting to resolve failures after all the source data has been migrated to the target. The default value is false.
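The settings and defaults listed above can be collected in one place so that a task overrides only what it needs. The following Python is a minimal sketch under that assumption. The names VALIDATION_DEFAULTS, build_validation_settings, and counts_that_never_suspend are hypothetical helpers for illustration and are not part of AWS DMS or any AWS SDK.

```python
# Default values exactly as stated in the list of validation settings.
VALIDATION_DEFAULTS = {
    "EnableValidation": False,
    "FailureMaxCount": 10000,
    "HandleCollationDiff": False,
    "RecordFailureDelayLimitInMinutes": 0,
    "TableFailureMaxCount": 1000,
    "ThreadCount": 5,
    "ValidationOnly": False,
}


def build_validation_settings(**overrides):
    """Return the defaults with the given overrides applied.

    Hypothetical helper: rejects unknown setting names and values whose
    type differs from the default (for example, 1 instead of True).
    """
    settings = dict(VALIDATION_DEFAULTS)
    for name, value in overrides.items():
        if name not in VALIDATION_DEFAULTS:
            raise KeyError(f"unknown validation setting: {name}")
        expected = type(VALIDATION_DEFAULTS[name])
        if type(value) is not expected:
            raise TypeError(f"{name} expects {expected.__name__}")
        settings[name] = value
    return settings


def counts_that_never_suspend(source_record_count, source_table_count):
    """Return failure limits set higher than the source record and table counts.

    With these limits, validation continues regardless of how many
    records or tables fail. The counts are supplied by the caller.
    """
    return {
        "FailureMaxCount": source_record_count + 1,
        "TableFailureMaxCount": source_table_count + 1,
    }


# Example: enable validation and keep it running for a source with
# 2,000,000 records in 40 tables (illustrative numbers only).
settings = build_validation_settings(
    EnableValidation=True,
    **counts_that_never_suspend(2_000_000, 40),
)
```

Settings that are not overridden, such as ThreadCount, keep the default values from the list.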

For example, the following JSON enables data validation with twice the default number of threads (10 instead of 5). It also accounts for differences in record order caused by column collation differences in PostgreSQL endpoints. In addition, it sets a validation reporting delay of 30 minutes to allow additional time to process any validation failures.

"ValidationSettings": {
    "EnableValidation": true,
    "ThreadCount": 10,
    "HandleCollationDiff": true,
    "RecordFailureDelayLimitInMinutes": 30
}
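The JSON above is a fragment: "ValidationSettings" is one member of a larger task settings object, so it is not valid JSON on its own. The following Python is a minimal sketch that wraps the fragment in braces, parses it with the standard json module, and serializes it again as a complete document. The variable names are illustrative, and a real task settings document might carry other sections that are omitted here.

```python
import json

# The example fragment, copied as text.
fragment = (
    '"ValidationSettings": { "EnableValidation": true, "ThreadCount": 10, '
    '"HandleCollationDiff": true, "RecordFailureDelayLimitInMinutes": 30 }'
)

# Wrap the fragment in braces so that it parses as a JSON object.
task_settings = json.loads("{" + fragment + "}")
validation = task_settings["ValidationSettings"]

# JSON true/false become Python True/False; numbers become int.
enabled = validation["EnableValidation"]
thread_count = validation["ThreadCount"]

# Serialize back to a complete, indented JSON document.
task_settings_json = json.dumps(task_settings, indent=4)
```

Parsing the settings before using them catches syntax problems, such as a missing brace or a trailing comma, before the task is created or modified.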

Note

For an Oracle endpoint, AWS DMS uses DBMS_CRYPTO to validate BLOBs. If your Oracle endpoint uses BLOBs, grant the execute permission for DBMS_CRYPTO to the user account that is used to access the Oracle endpoint. Do this by running the following statement.

grant execute on sys.dbms_crypto to <dms_endpoint_user>;