Amazon Simple Storage Service (Amazon S3) provides the ability to specify a checksum when you upload an object. When you specify a checksum, it is stored with the object and can be validated when the object is downloaded.
Checksums provide an additional layer of data integrity when you transfer files. With checksums, you can verify data consistency by confirming that the received file matches the original file. For more information about checksums with Amazon S3, including the supported algorithms, see the Amazon Simple Storage Service User Guide.
You have the flexibility to choose the algorithm that best fits your needs and let the SDK calculate the checksum. Alternatively, you can provide a pre-computed checksum value by using one of the supported algorithms.
Note
Beginning with version 1.4.0 of the AWS SDK for Kotlin, the SDK provides default integrity protections by automatically calculating a CRC32 checksum for uploads. The SDK calculates this checksum if you provide neither a precalculated checksum value nor an algorithm that the SDK should use to calculate a checksum.
The SDK also provides global settings for data integrity protections that you can set externally, which you can read about in the AWS SDKs and Tools Reference Guide.
We discuss checksums in two request phases: uploading an object and downloading an object.
Upload an object
You upload objects to Amazon S3 with the SDK for Kotlin by using the putObject function. Set the checksumAlgorithm property of the request to enable checksum computation.
The following code snippet shows a request to upload an object with a CRC32 checksum. When the SDK sends the request, it calculates the CRC32 checksum and uploads the object. Amazon S3 stores the checksum with the object.
val request = PutObjectRequest {
    bucket = "amzn-s3-demo-bucket"
    key = "key"
    checksumAlgorithm = ChecksumAlgorithm.Crc32
}
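To make the idea of a CRC32 checksum concrete, the following minimal sketch computes one locally with the JDK's `java.util.zip.CRC32` class. The helper names `crc32Of` and `crc32Base64` are hypothetical and are not part of the SDK. The Base64 form assumes that the checksum is represented as the Base64 encoding of the four big-endian CRC bytes, which is how the Amazon S3 API reference describes CRC32 checksum values.

```kotlin
import java.nio.ByteBuffer
import java.util.Base64
import java.util.zip.CRC32

// Hypothetical helper (not an SDK function): returns the CRC32 of the data
// as an unsigned 32-bit value stored in a Long.
fun crc32Of(data: ByteArray): Long {
    val crc = CRC32()
    crc.update(data)
    return crc.value
}

// Hypothetical helper. Assumption: the checksum travels as the Base64
// encoding of the four big-endian CRC bytes.
fun crc32Base64(data: ByteArray): String {
    // ByteBuffer is big-endian by default; toInt() keeps the low 32 bits.
    val bytes = ByteBuffer.allocate(4).putInt(crc32Of(data).toInt()).array()
    return Base64.getEncoder().encodeToString(bytes)
}
```

You would not normally need this code, because the SDK calculates the checksum for you. It can be useful when you want to compare a locally computed value with the one that Amazon S3 stores.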
If you don't provide a checksum algorithm with the request, the checksum behavior varies depending on the version of the SDK that you use as shown in the following table.
Checksum behavior when no checksum algorithm is provided
Kotlin SDK version | Checksum behavior |
---|---|
Earlier than 1.4.0 | The SDK doesn't automatically calculate a CRC-based checksum and provide it in the request. |
1.4.0 or later | The SDK uses the CRC32 algorithm to calculate the checksum and provides it in the request. |
Use a pre-calculated checksum value
A pre-calculated checksum value provided with the request disables automatic computation by the SDK and uses the provided value instead.
The following example shows a request with a pre-calculated SHA256 checksum.
val request = PutObjectRequest {
    bucket = "amzn-s3-demo-bucket"
    key = "key"
    body = ByteStream.fromFile(File("file_to_upload.txt"))
    checksumAlgorithm = ChecksumAlgorithm.Sha256
    checksumSha256 = "cfb6d06da6e6f51c22ae3e549e33959dbb754db75a93665b8b579605464ce299"
}
If Amazon S3 determines the checksum value is incorrect for the specified algorithm, the service returns an error response.
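A pre-calculated value has to come from somewhere, so the following sketch shows one way to compute a SHA-256 checksum with the JDK's `MessageDigest` class before building the request. The `sha256Base64` helpers are hypothetical and are not part of the SDK. The sketch assumes that the value is supplied as the Base64-encoded digest, as the Amazon S3 API reference describes for checksum fields; confirm the expected encoding before relying on it.

```kotlin
import java.io.File
import java.security.MessageDigest
import java.util.Base64

// Hypothetical helper: Base64-encoded SHA-256 digest of an in-memory payload.
fun sha256Base64(bytes: ByteArray): String {
    val digest = MessageDigest.getInstance("SHA-256").digest(bytes)
    return Base64.getEncoder().encodeToString(digest)
}

// Hypothetical helper: the same digest for a file, read in chunks so that
// large files are not loaded into memory all at once.
fun sha256Base64(file: File): String {
    val md = MessageDigest.getInstance("SHA-256")
    file.inputStream().use { input ->
        val buffer = ByteArray(8192)
        while (true) {
            val read = input.read(buffer)
            if (read < 0) break
            md.update(buffer, 0, read)
        }
    }
    return Base64.getEncoder().encodeToString(md.digest())
}
```

Under these assumptions, the result of `sha256Base64(File("file_to_upload.txt"))` is the kind of value you would assign to `checksumSha256`.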
Multipart uploads
You can also use checksums with multipart uploads.
You must specify the checksum algorithm in the CreateMultipartUpload request and in each UploadPart request. As a final step, you must specify the checksum of each part in the CompleteMultipartUpload request. The following example shows how to create a multipart upload with the checksum algorithm specified.
val multipartUpload = s3.createMultipartUpload {
    bucket = "amzn-s3-demo-bucket"
    key = "key"
    checksumAlgorithm = ChecksumAlgorithm.Sha1
}

val partFilesToUpload = listOf("data-part1.csv", "data-part2.csv", "data-part3.csv")

val completedParts = partFilesToUpload
    .mapIndexed { i, fileName ->
        val uploadPartResponse = s3.uploadPart {
            bucket = "amzn-s3-demo-bucket"
            key = "key"
            body = ByteStream.fromFile(File(fileName))
            uploadId = multipartUpload.uploadId
            partNumber = i + 1 // Part numbers begin at 1.
            checksumAlgorithm = ChecksumAlgorithm.Sha1
        }

        // Carry each part's ETag and checksum forward to the completion request.
        CompletedPart {
            eTag = uploadPartResponse.eTag
            partNumber = i + 1
            checksumSha1 = uploadPartResponse.checksumSha1
        }
    }

s3.completeMultipartUpload {
    uploadId = multipartUpload.uploadId
    bucket = "amzn-s3-demo-bucket"
    key = "key"
    multipartUpload {
        parts = completedParts
    }
}
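When you troubleshoot a multipart upload, it can help to compute each part's checksum locally and compare it with the `checksumSha1` value returned for that part. The following sketch does this with the JDK only. `LocalPartChecksum` and `localPartChecksums` are hypothetical names, not SDK types, and the sketch assumes that part checksums are Base64-encoded SHA-1 digests.

```kotlin
import java.security.MessageDigest
import java.util.Base64

// Hypothetical holder for a locally computed part checksum (not an SDK type).
data class LocalPartChecksum(val partNumber: Int, val checksumSha1: String)

// Computes a Base64-encoded SHA-1 digest for each part, in upload order.
fun localPartChecksums(parts: List<ByteArray>): List<LocalPartChecksum> =
    parts.mapIndexed { i, bytes ->
        val digest = MessageDigest.getInstance("SHA-1").digest(bytes)
        LocalPartChecksum(
            partNumber = i + 1, // Part numbers begin at 1, as in the upload loop.
            checksumSha1 = Base64.getEncoder().encodeToString(digest),
        )
    }
```

This version takes the parts as in-memory byte arrays to keep the sketch short; for large part files, a chunked read would be a better fit.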
Download an object
When you use the getObject function to download an object, the SDK validates the checksum when the checksumMode property of the builder for the GetObjectRequest is set to ChecksumMode.Enabled.
The request in the following snippet directs the SDK to validate the checksum in the response by calculating the checksum and comparing the values.
val request = GetObjectRequest {
    bucket = "amzn-s3-demo-bucket"
    key = "key"
    checksumMode = ChecksumMode.Enabled
}
Note
If the object wasn't uploaded with a checksum, no validation takes place.
If you use SDK version 1.4.0 or later, the SDK automatically checks the integrity of getObject requests without adding checksumMode = ChecksumMode.Enabled to the request.
Asynchronous validation
Because the SDK for Kotlin uses streaming responses when it downloads an object from Amazon S3, the SDK calculates the checksum as you consume the object. Therefore, you must fully consume the object for the checksum to be validated.
The following example shows how to validate a checksum by fully consuming the response.
val request = GetObjectRequest {
    bucket = "amzn-s3-demo-bucket"
    key = "key"
    checksumMode = ChecksumMode.Enabled
}

s3Client.getObject(request) { response ->
    println(response.body?.decodeToString()) // Fully consume the object.
    // The checksum is valid.
}
By contrast, the code in the following example doesn't use the object in any way, so the checksum is not validated.
s3Client.getObject(request) {
    println("Got the object.")
}
If the checksum calculated by the SDK does not match the expected checksum sent with the response, the SDK throws a ChecksumMismatchException.
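The reason validation depends on consuming the response can be illustrated with a small sketch that uses only the JDK. This is not the SDK's implementation: `readAndValidate` is a hypothetical function, CRC32 is chosen only for brevity, and an `IllegalStateException` stands in for the SDK's ChecksumMismatchException. The point is that the running checksum advances only as bytes are read, so the comparison can happen only after the stream has been read to the end.

```kotlin
import java.io.InputStream
import java.util.zip.CRC32
import java.util.zip.CheckedInputStream

// Illustrative only; hypothetical function, not the SDK's implementation.
fun readAndValidate(source: InputStream, expectedCrc32: Long): ByteArray {
    val checked = CheckedInputStream(source, CRC32())
    // The checksum is updated only as bytes are read from the wrapped stream.
    val data = checked.use { it.readBytes() }
    val actual = checked.checksum.value
    // Stand-in for ChecksumMismatchException: check() throws IllegalStateException.
    check(actual == expectedCrc32) {
        "Checksum mismatch: expected $expectedCrc32 but computed $actual"
    }
    return data
}
```

If the caller never reads from the stream, the running checksum stays at its initial value and there is nothing meaningful to compare, which mirrors the unvalidated case described above.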