Data integrity protection with checksums
Amazon Simple Storage Service (Amazon S3) provides the ability to specify a checksum when you upload an object. When you specify a checksum, it is stored with the object and can be validated when the object is downloaded.
Checksums provide an additional layer of data integrity when you transfer files. With checksums, you can verify data consistency by confirming that the received file matches the original file. For more information about checksums with Amazon S3, including the supported algorithms, see the Amazon Simple Storage Service User Guide.
You have the flexibility to choose the algorithm that best fits your needs and let the SDK calculate the checksum. Alternatively, you can provide a pre-computed checksum value by using one of the supported algorithms.
Note
Beginning with version 2.30.0 of the AWS SDK for Java 2.x, the SDK provides default integrity protections by automatically calculating a CRC32 checksum for uploads. The SDK calculates this checksum if you don't provide a precalculated checksum value and don't specify an algorithm that the SDK should use to calculate a checksum.
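To illustrate what a CRC32 checksum value looks like when it travels with a request, the following minimal sketch computes one with the JDK's java.util.zip.CRC32 class and encodes it the way Amazon S3 checksum values are represented: the 4-byte big-endian result, Base64-encoded. The class and method names are hypothetical. The SDK performs this calculation for you, so you would not normally write this code yourself.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.CRC32;

public class Crc32ChecksumSketch {

    // Computes a CRC32 checksum and encodes it as the 4-byte big-endian
    // value, Base64-encoded (hypothetical helper, not an SDK method).
    static String crc32Base64(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        // getValue() returns the 32-bit result in the low bits of a long;
        // ByteBuffer writes ints in big-endian order by default.
        byte[] fourBytes = ByteBuffer.allocate(4).putInt((int) crc.getValue()).array();
        return Base64.getEncoder().encodeToString(fourBytes);
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(crc32Base64(data)); // prints y/Q5Jg==
    }
}
```

The input "123456789" is the standard CRC32 check string, whose checksum is 0xCBF43926.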
The SDK also provides global settings for data integrity protections that you can set externally, which you can read about in the AWS SDKs and Tools Reference Guide.
We discuss checksums in two request phases: uploading an object and downloading an object.
Upload an object
When you upload an object with the putObject method and provide a checksum algorithm, the SDK computes the checksum for the specified algorithm.
The following code snippet shows a request to upload an object with a SHA256 checksum. When the SDK sends the request, it calculates the SHA256 checksum and uploads the object. Amazon S3 validates the integrity of the content by calculating the checksum and comparing it to the checksum provided by the SDK. Amazon S3 then stores the checksum with the object.
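import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.model.ChecksumAlgorithm;

// s3Client, bucketName, and key are fields of the enclosing class.
public void putObjectWithChecksum() {
    s3Client.putObject(b -> b
            .bucket(bucketName)
            .key(key)
            .checksumAlgorithm(ChecksumAlgorithm.SHA256), // The SDK computes the SHA256 checksum.
        RequestBody.fromString("This is a test"));
}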
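The validation that Amazon S3 performs can be pictured as a recompute-and-compare step. The following sketch uses only java.security.MessageDigest and java.util.Base64 and assumes the client sends the Base64-encoded SHA-256 digest of the body. The class and method names are hypothetical, and this is a model of the comparison, not the service's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class Sha256ValidationSketch {

    // Returns the Base64-encoded SHA-256 digest of the payload.
    static String sha256Base64(byte[] payload) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(digest.digest(payload));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Recomputes the checksum over the received bytes and compares it
    // with the value that the client sent alongside them.
    static boolean matches(byte[] received, String checksumFromClient) {
        return sha256Base64(received).equals(checksumFromClient);
    }

    public static void main(String[] args) {
        byte[] body = "This is a test".getBytes(StandardCharsets.UTF_8);
        String sent = sha256Base64(body);                // value the client computes
        System.out.println(matches(body, sent));         // true
        byte[] corrupted = "This is a tesT".getBytes(StandardCharsets.UTF_8);
        System.out.println(matches(corrupted, sent));    // false
    }
}
```

In this model, a false result corresponds to the case where Amazon S3 rejects the upload with an error response.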
If you don't provide a checksum algorithm with the request, the checksum behavior varies depending on the version of the SDK that you use, as shown in the following table.
Checksum behavior when no checksum algorithm is provided
Java SDK version | Checksum behavior |
---|---|
earlier than 2.30.0 | The SDK doesn't automatically calculate a CRC-based checksum and provide it in the request. |
2.30.0 or later | The SDK uses the CRC32 algorithm to calculate a checksum and provides it in the request. |
Use a pre-calculated checksum value
When you provide a pre-calculated checksum value with the request, the SDK doesn't compute a checksum automatically and uses the provided value instead.
The following example shows a request with a pre-calculated SHA256 checksum.
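import java.nio.file.Paths;
import software.amazon.awssdk.core.sync.RequestBody;

public void putObjectWithPrecalculatedChecksum(String filePath) {
    // calculateChecksum is a helper method (not shown) that returns the
    // Base64-encoded digest of the file for the named algorithm.
    String checksum = calculateChecksum(filePath, "SHA-256");
    s3Client.putObject(b -> b
            .bucket(bucketName)
            .key(key)
            .checksumSHA256(checksum), // Pre-calculated value; the SDK doesn't compute one.
        RequestBody.fromFile(Paths.get(filePath)));
}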
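The calculateChecksum method used in the previous example isn't defined on this page. The following is one possible sketch of such a helper, assuming it accepts a JDK MessageDigest algorithm name such as "SHA-256" and returns the Base64-encoded digest that checksumSHA256 expects. The class name is hypothetical. The helper streams the file through the digest, so large files are not loaded into memory at once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class ChecksumHelperSketch {

    // Hypothetical helper matching the call calculateChecksum(filePath, "SHA-256").
    static String calculateChecksum(String filePath, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(Paths.get(filePath))) {
            byte[] buffer = new byte[8192];
            int n;
            // Feed the file to the digest chunk by chunk.
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        // Amazon S3 checksum values are Base64-encoded, not hex.
        return Base64.getEncoder().encodeToString(digest.digest());
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("checksum", ".txt");
        Files.write(tmp, "This is a test".getBytes(StandardCharsets.UTF_8));
        System.out.println(calculateChecksum(tmp.toString(), "SHA-256"));
        Files.delete(tmp);
    }
}
```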
If Amazon S3 determines the checksum value is incorrect for the specified algorithm, the service returns an error response.
Multipart uploads
You can also use checksums with multipart uploads.
The SDK for Java 2.x provides two options to use checksums with multipart uploads. The first option uses the S3TransferManager.
The following transfer manager example specifies the SHA1 algorithm for the upload.
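import java.nio.file.Paths;
import software.amazon.awssdk.services.s3.model.ChecksumAlgorithm;
import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.FileUpload;
import software.amazon.awssdk.transfer.s3.model.UploadFileRequest;

public void multipartUploadWithChecksumTm(String filePath) {
    // try-with-resources closes the transfer manager even if the upload fails.
    try (S3TransferManager transferManager = S3TransferManager.create()) {
        UploadFileRequest uploadFileRequest = UploadFileRequest.builder()
                .putObjectRequest(b -> b
                        .bucket(bucketName)
                        .key(key)
                        .checksumAlgorithm(ChecksumAlgorithm.SHA1))
                .source(Paths.get(filePath))
                .build();
        FileUpload fileUpload = transferManager.uploadFile(uploadFileRequest);
        // Block until the upload completes.
        fileUpload.completionFuture().join();
    }
}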
If you don't provide a checksum algorithm when you use the transfer manager for uploads, the SDK automatically calculates a checksum based on the CRC32 algorithm. The SDK performs this calculation in all versions of the SDK.
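The second option uses the S3Client API (or the S3AsyncClient API) to perform the multipart upload. In the following example, the checksum algorithm is specified when the upload is initiated and on each part request, and the checksum that Amazon S3 returns for each part is provided when the upload is completed.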
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.model.ChecksumAlgorithm;
import software.amazon.awssdk.services.s3.model.CompletedMultipartUpload;
import software.amazon.awssdk.services.s3.model.CompletedPart;
import software.amazon.awssdk.services.s3.model.CreateMultipartUploadResponse;
import software.amazon.awssdk.services.s3.model.UploadPartRequest;
import software.amazon.awssdk.services.s3.model.UploadPartResponse;

public void multipartUploadWithChecksumS3Client(String filePath) {
    ChecksumAlgorithm algorithm = ChecksumAlgorithm.CRC32;

    // Initiate the multipart upload.
    CreateMultipartUploadResponse createMultipartUploadResponse = s3Client.createMultipartUpload(b -> b
            .bucket(bucketName)
            .key(key)
            .checksumAlgorithm(algorithm)); // Checksum specified on initiation.
    String uploadId = createMultipartUploadResponse.uploadId();

    // Upload the parts of the file.
    int partNumber = 1;
    List<CompletedPart> completedParts = new ArrayList<>();
    ByteBuffer bb = ByteBuffer.allocate(1024 * 1024 * 5); // 5 MB byte buffer

    try (RandomAccessFile file = new RandomAccessFile(filePath, "r")) {
        long fileSize = file.length();
        long position = 0;
        while (position < fileSize) {
            file.seek(position);
            long read = file.getChannel().read(bb);

            bb.flip(); // Swap position and limit before reading from the buffer.
            UploadPartRequest uploadPartRequest = UploadPartRequest.builder()
                    .bucket(bucketName)
                    .key(key)
                    .uploadId(uploadId)
                    .checksumAlgorithm(algorithm) // Checksum specified on each part.
                    .partNumber(partNumber)
                    .build();

            UploadPartResponse partResponse = s3Client.uploadPart(
                    uploadPartRequest,
                    RequestBody.fromByteBuffer(bb));

            CompletedPart part = CompletedPart.builder()
                    .partNumber(partNumber)
                    .checksumCRC32(partResponse.checksumCRC32()) // Provide the calculated checksum.
                    .eTag(partResponse.eTag())
                    .build();
            completedParts.add(part);

            bb.clear();
            position += read;
            partNumber++;
        }
    } catch (IOException e) {
        // Stop instead of completing the upload with missing parts.
        throw new UncheckedIOException(e);
    }

    // Complete the multipart upload.
    s3Client.completeMultipartUpload(b -> b
            .bucket(bucketName)
            .key(key)
            .uploadId(uploadId)
            .multipartUpload(CompletedMultipartUpload.builder().parts(completedParts).build()));
}
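When a checksum is collected for every part, Amazon S3 can derive an object-level value from the part checksums. Assuming the composite ("checksum of checksums") type described in the Amazon Simple Storage Service User Guide, that value is the checksum of the concatenated binary part checksums, followed by a hyphen and the part count. The following JDK-only sketch computes such a value for CRC32 to show how the part values combine; the class and method names are hypothetical, and the exact composite rules should be confirmed in the User Guide.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;
import java.util.zip.CRC32;

public class CompositeChecksumSketch {

    // Returns the 4-byte big-endian CRC32 of the given bytes.
    static byte[] crc32Bytes(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return ByteBuffer.allocate(4).putInt((int) crc.getValue()).array();
    }

    // Assumed composite rule: CRC32 over the concatenated binary part
    // checksums, Base64-encoded, then "-" and the number of parts.
    static String compositeCrc32(List<byte[]> parts) {
        ByteBuffer concatenated = ByteBuffer.allocate(4 * parts.size());
        for (byte[] part : parts) {
            concatenated.put(crc32Bytes(part));
        }
        String encoded = Base64.getEncoder().encodeToString(crc32Bytes(concatenated.array()));
        return encoded + "-" + parts.size();
    }

    public static void main(String[] args) {
        List<byte[]> parts = List.of(
                "first part".getBytes(StandardCharsets.UTF_8),
                "second part".getBytes(StandardCharsets.UTF_8));
        System.out.println(compositeCrc32(parts)); // an 8-character Base64 value ending in -2
    }
}
```

Because the composite value is computed from part checksums, it generally differs from a CRC32 calculated over the whole file in one pass.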
Download an object
When you use the getObject method to download an object, the SDK validates the checksum if you call the checksumMode method of the GetObjectRequest builder with ChecksumMode.ENABLED.
The request in the following snippet directs the SDK to validate the checksum in the response by calculating the checksum and comparing the values.
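import java.io.IOException;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.model.ChecksumMode;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

public GetObjectResponse getObjectWithChecksum() throws IOException {
    // try-with-resources closes the response stream and releases the connection.
    try (ResponseInputStream<GetObjectResponse> object = s3Client.getObject(b -> b
            .bucket(bucketName)
            .key(key)
            .checksumMode(ChecksumMode.ENABLED))) {
        // Read the body; the checksum is validated as the content is consumed.
        object.readAllBytes();
        return object.response();
    }
}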
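Checksum validation on download works as the body is read: a checksum is updated over the incoming bytes and compared with the stored value when the stream ends. The following sketch models that flow with the JDK's java.util.zip.CheckedInputStream, assuming a CRC32 checksum stored as a Base64 string. It is an illustration, not the SDK's internal implementation, and the class and method names are hypothetical. Passing null for the stored checksum models an object that was uploaded without one, in which case no validation takes place.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class DownloadValidationSketch {

    // Reads the whole body while CheckedInputStream updates a CRC32, then
    // compares the result with the stored checksum (skipped when it is null).
    static byte[] readAndValidate(InputStream body, String storedCrc32Base64) throws IOException {
        CheckedInputStream checked = new CheckedInputStream(body, new CRC32());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = checked.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        if (storedCrc32Base64 != null) {
            byte[] actual = ByteBuffer.allocate(4)
                    .putInt((int) checked.getChecksum().getValue()).array();
            String actualBase64 = Base64.getEncoder().encodeToString(actual);
            if (!actualBase64.equals(storedCrc32Base64)) {
                throw new IOException("Checksum mismatch: expected " + storedCrc32Base64
                        + " but calculated " + actualBase64);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        byte[] ok = readAndValidate(new ByteArrayInputStream(data), "y/Q5Jg==");
        System.out.println(ok.length); // prints 9
    }
}
```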
If the object wasn't uploaded with a checksum, no validation takes place.