File Transfer checksums - Nimble Studio File Transfer

File Transfer checksums

File Transfer performs checksums in the background for your uploads to verify the integrity of the files on disk against the files in the S3 bucket. Checksums are calculated for each file you upload, and the checksum values are stored in the File Transfer database.

The following explains File Transfer's native checksum process:

  1. Checksums are calculated for files that you upload.

  2. If the upload file doesn't exist in the S3 bucket, then the checksum is added to the File Transfer database, and the file is uploaded to the Amazon S3 bucket.

  3. If the upload file already exists in the S3 bucket, then the upload file's checksum is checked against the checksum in the File Transfer database.

    1. If the checksums match, then the file is not uploaded because it is identical to the file in the S3 bucket.

    2. If the checksums don't match, the upload file has been modified and it is uploaded to the S3 bucket. The new checksum is added to the File Transfer database.
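The decision flow in the steps above can be sketched in a few lines of Python. This is only an illustration, not File Transfer's implementation: the `should_upload` function, the in-memory `checksum_db` dictionary, and the `bucket_keys` set are hypothetical stand-ins for the File Transfer database and the S3 bucket listing, and SHA-256 is an assumed hash algorithm because this page does not name the one File Transfer uses.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks. SHA-256 is an assumption, not a documented choice."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def should_upload(path: Path, key: str, bucket_keys: set[str],
                  checksum_db: dict[str, str]) -> bool:
    """Return True if the file needs to be uploaded, updating checksum_db as a side effect."""
    checksum = file_checksum(path)           # step 1: always compute the checksum
    if key not in bucket_keys:               # step 2: the file is new to the bucket
        checksum_db[key] = checksum
        return True
    if checksum_db.get(key) == checksum:     # step 3.1: identical, skip the upload
        return False
    checksum_db[key] = checksum              # step 3.2: modified, upload again
    return True
```

The caller is expected to perform the actual upload and add `key` to `bucket_keys` whenever `should_upload` returns `True`.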

If you want to skip the native checksum process within File Transfer, add a Media Hash List (MHL) to the folder that contains the file you want to upload, or to any of its parent folders. If you provide your own MHLs, File Transfer verifies file hashes against the MHL. A single MHL in the root of your local File Transfer folder can recursively reference files within subfolders. We recommend a single MHL file that contains checksums for most, if not all, of the files in the folder, rather than a separate MHL file for each file.
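To make the folder lookup concrete, the following sketch searches for an MHL file in a file's own folder and then in each parent folder up to the local File Transfer folder. The `find_mhl` function and its parameters are hypothetical, and picking the nearest MHL first (and the alphabetically first one within a folder) is an assumption; this page does not state how File Transfer chooses when several MHL files apply.

```python
from pathlib import Path
from typing import Optional


def find_mhl(file_path: Path, transfer_root: Path) -> Optional[Path]:
    """Return the nearest .mhl file in file_path's folder or a parent, up to transfer_root."""
    root = transfer_root.resolve()
    folder = file_path.resolve().parent
    while True:
        candidates = sorted(folder.glob("*.mhl"))
        if candidates:
            return candidates[0]             # assumed tie-break: first name alphabetically
        # Stop at the transfer root, or at the filesystem root as a safety net.
        if folder == root or folder == folder.parent:
            return None
        folder = folder.parent
```

With a single MHL in the root of the transfer folder, every file in every subfolder resolves to that same MHL, which matches the recommended layout.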

The following are some important concepts to understand about File Transfer checksums.

Native checksums

Checksums are calculated for files that you upload and compared against the checksums in the File Transfer database. If the checksums don't match, File Transfer uploads the file again. A mismatch occurs if you have changed the file since the original upload. The first time a file is uploaded, there is no existing file in Amazon S3 for File Transfer to compare against, so the file is uploaded and its checksum is stored. Because checksums are computed concurrently, the number of logical CPU cores on the host affects checksum performance.

MHL checksums

If you want to skip the native checksum process of File Transfer, supply a Media Hash List (MHL) file in the upload directory. The MHL file is used to verify the integrity of the files as they move from one location to another.

File Transfer treats the MHL as the authoritative source and appends the checksum value to the uploaded object's metadata. The MHL file must contain one of the following fields: <md5>HEXVALUE</md5>, <xxhash64>HEXVALUE</xxhash64>, or <xxhash64be>HEXVALUE</xxhash64be>. To learn more about the MHL specification, see About Media Hash List.
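As a rough illustration of how these fields can be read, the sketch below extracts the first supported hash field for each file in an MHL document and checks an MD5 value. It assumes the common MHL layout of a <hashlist> root with one <hash> element per file, each holding a <file> path; confirm this against the MHL specification. The `read_mhl_hashes` and `md5_matches` helpers are hypothetical names, and only MD5 is verified here because xxHash is not part of the Python standard library.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hash fields that this page says File Transfer accepts.
SUPPORTED_FIELDS = ("md5", "xxhash64", "xxhash64be")


def read_mhl_hashes(mhl_text: str) -> dict[str, tuple[str, str]]:
    """Map each file path to (field name, lowercase hex value) for its first supported field."""
    hashes = {}
    for entry in ET.fromstring(mhl_text).iter("hash"):
        file_name = entry.findtext("file")
        for field in SUPPORTED_FIELDS:
            value = entry.findtext(field)
            if file_name and value:
                hashes[file_name] = (field, value.strip().lower())
                break                        # entries without a supported field are skipped
    return hashes


def md5_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the MD5 of data against a hex value taken from the MHL."""
    return hashlib.md5(data).hexdigest() == expected_hex.strip().lower()
```

Entries that carry only an unsupported field are left out of the result, so a caller can detect them by comparing the returned keys with the files it intends to upload.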

Configurable checksums

By default, File Transfer uses one less than the total logical core count to concurrently compute checksums. This value is the maximum threshold.

For example, if your host machine has 12 logical cores, then the maximum threshold is 11. The minimum threshold is always 1, regardless of the number of cores in the machine. By default, 1 checksum runs at a time. A safeguard ensures that the number of active checksums never exceeds the maximum threshold.
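The threshold arithmetic can be written out as a short sketch. The function names are hypothetical, and clamping an out-of-range request into the allowed range is an assumption about how the safeguard behaves; File Transfer might instead reject or ignore such a value.

```python
import os
from typing import Optional


def max_threshold(logical_cores: Optional[int] = None) -> int:
    """One less than the logical core count, but never below the minimum of 1."""
    cores = logical_cores if logical_cores is not None else (os.cpu_count() or 1)
    return max(1, cores - 1)


def effective_active_checksums(requested: int, logical_cores: Optional[int] = None) -> int:
    """Clamp a requested value into the range 1..max_threshold (assumed safeguard behavior)."""
    return min(max(1, requested), max_threshold(logical_cores))


print(max_threshold(12))                    # 11, as in the 12-core example
print(effective_active_checksums(20, 12))   # 11, capped at the maximum threshold
print(effective_active_checksums(0, 12))    # 1, raised to the minimum threshold
```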

You can adjust the number of checksums that run at the same time by modifying the max_active_checksums configuration property. For example, you might lower this value to reduce the resources that File Transfer uses, which frees CPU resources for other processes.
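To show what a concurrency limit like max_active_checksums means in practice, the following sketch hashes several files with a bounded worker pool. It is not File Transfer's code: the `checksum_all` function is hypothetical, SHA-256 is an assumed algorithm, and the parameter only borrows the property's name for clarity.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash one file in 1 MiB chunks (SHA-256 is an assumed algorithm)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_all(paths: list[Path], max_active_checksums: int = 1) -> dict[Path, str]:
    """Hash files with at most max_active_checksums running at the same time."""
    workers = max(1, max_active_checksums)   # never drop below one worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its own digest.
        return dict(zip(paths, pool.map(sha256_of, paths)))
```

A lower value leaves more CPU time for other processes at the cost of slower hashing; a higher value does the opposite until the available cores are saturated.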