Amazon S3 checksums with AWS SDK for Java - AWS SDK for Java 2.x

Amazon Simple Storage Service (Amazon S3) provides the ability to specify a checksum when you upload an object. When you specify a checksum, it is stored with the object and can be validated when the object is downloaded.

Checksums provide an additional layer of data integrity when you transfer files. With checksums, you can verify data consistency by confirming that the received file matches the original file. For more information about checksums with Amazon S3, see the Amazon Simple Storage Service User Guide.

Amazon S3 currently supports four checksum algorithms: SHA-1, SHA-256, CRC-32, and CRC-32C. You can choose the algorithm that best fits your needs and let the SDK calculate the checksum. Alternatively, you can specify your own pre-computed checksum value by using one of the four supported algorithms.

We discuss checksums in two request phases: uploading an object and downloading an object.

Upload an object

You upload objects to Amazon S3 by using the putObject method of the S3Client. Use the checksumAlgorithm method of the builder for the PutObjectRequest to enable checksum computation and specify the algorithm. Valid values for the algorithm are CRC32, CRC32C, SHA1, and SHA256.

The following code snippet shows a request to upload an object with a CRC-32 checksum. When the SDK sends the request, it calculates the CRC-32 checksum and uploads the object. Amazon S3 stores the checksum with the object.

    public void putObjectWithChecksum() {
        s3Client.putObject(b -> b
                .bucket(bucketName)
                .key(key)
                .checksumAlgorithm(ChecksumAlgorithm.CRC32),
            RequestBody.fromString("This is a test"));
    }

If the checksum that the SDK calculates doesn't match the checksum that Amazon S3 calculates when it receives the request, an error is returned.
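To make the comparison concrete, the following is a minimal local sketch of how a CRC-32 checksum value can be computed with the JDK alone. It assumes the checksum is represented as the Base64 encoding of the 4-byte, big-endian CRC-32 value. The class and method names (Crc32ChecksumSketch, crc32Base64) are hypothetical and are not part of the SDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.CRC32;

// Hypothetical helper class for illustration only; not part of the AWS SDK.
class Crc32ChecksumSketch {

    // Computes the CRC-32 of the data and returns it as Base64 text.
    // Assumption: the value is encoded as 4 big-endian bytes before Base64 encoding.
    static String crc32Base64(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        // getValue() returns the 32-bit checksum in the low bits of a long.
        // ByteBuffer uses big-endian byte order by default.
        byte[] bytes = ByteBuffer.allocate(4).putInt((int) crc.getValue()).array();
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        byte[] body = "This is a test".getBytes(StandardCharsets.UTF_8);
        String senderChecksum = crc32Base64(body);
        // A receiver recomputes the checksum over the bytes it received
        // and compares it with the value that came with the request.
        String receiverChecksum = crc32Base64(body);
        System.out.println("checksum: " + senderChecksum);
        System.out.println("match: " + senderChecksum.equals(receiverChecksum));
    }
}
```

If even one byte of the body changes in transit, the recomputed value differs and the comparison fails, which is the condition under which an error is returned.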

Use a pre-calculated checksum value

If you provide a pre-calculated checksum value with the request, the SDK skips its automatic computation and sends the provided value instead.

The following example shows a request with a pre-calculated SHA-256 checksum.

    public void putObjectWithPrecalculatedChecksum(String filePath) {
        String checksum = calculateChecksum(filePath, "SHA-256");

        s3Client.putObject((b -> b
                .bucket(bucketName)
                .key(key)
                .checksumSHA256(checksum)),
            RequestBody.fromFile(Paths.get(filePath)));
    }

If Amazon S3 determines the checksum value is incorrect for the specified algorithm, the service returns an error response.
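The calculateChecksum helper called in the example above is not shown in this section. The following is one possible sketch of such a helper, not the guide's own implementation. It assumes the checksum is passed as the Base64 encoding of the raw digest bytes, and it wraps checked exceptions so that it can be called from a method without a throws clause. The class name PrecalculatedChecksumSketch is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper class for illustration only; not part of the AWS SDK.
class PrecalculatedChecksumSketch {

    // Streams the file through a MessageDigest (for example "SHA-256" or "SHA-1")
    // and returns the digest as Base64 text.
    // Assumption: the checksum value is the Base64 encoding of the raw digest bytes.
    static String calculateChecksum(String filePath, String algorithm) {
        try (InputStream in = Files.newInputStream(Paths.get(filePath))) {
            MessageDigest digest = MessageDigest.getInstance(algorithm);
            byte[] buffer = new byte[8192];
            int read;
            // Read in chunks so that large files are not loaded into memory at once.
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            return Base64.getEncoder().encodeToString(digest.digest());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }
}
```

Because the digest is computed over the file contents before the upload starts, the file must not change between the checksum calculation and the putObject call.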

Multipart uploads

You can also use checksums with multipart uploads. The SDK for Java 2.x provides two options for doing so. The first option uses the S3TransferManager.

The following transfer manager example specifies the SHA1 algorithm for the upload.

    public void multipartUploadWithChecksumTm(String filePath) {
        S3TransferManager transferManager = S3TransferManager.create();
        UploadFileRequest uploadFileRequest = UploadFileRequest.builder()
                .putObjectRequest(b -> b
                        .bucket(bucketName)
                        .key(key)
                        .checksumAlgorithm(ChecksumAlgorithm.SHA1))
                .source(Paths.get(filePath))
                .build();
        FileUpload fileUpload = transferManager.uploadFile(uploadFileRequest);
        fileUpload.completionFuture().join();
        transferManager.close();
    }

The second option uses the S3Client API (or the S3AsyncClient API) to perform the multipart upload. If you specify a checksum with this approach, you must specify the algorithm when you initiate the upload. You must also specify the algorithm on each part request and, when you complete the upload, provide the checksum that was calculated for each uploaded part.

    public void multipartUploadWithChecksumS3Client(String filePath) {
        ChecksumAlgorithm algorithm = ChecksumAlgorithm.CRC32;

        // Initiate the multipart upload.
        CreateMultipartUploadResponse createMultipartUploadResponse = s3Client.createMultipartUpload(b -> b
                .bucket(bucketName)
                .key(key)
                .checksumAlgorithm(algorithm)); // Checksum specified on initiation.
        String uploadId = createMultipartUploadResponse.uploadId();

        // Upload the parts of the file.
        int partNumber = 1;
        List<CompletedPart> completedParts = new ArrayList<>();
        ByteBuffer bb = ByteBuffer.allocate(1024 * 1024 * 5); // 5 MB byte buffer

        try (RandomAccessFile file = new RandomAccessFile(filePath, "r")) {
            long fileSize = file.length();
            long position = 0;
            while (position < fileSize) {
                file.seek(position);
                long read = file.getChannel().read(bb);

                // Set the limit to the current position and reset the position to 0
                // so that the bytes just written can be read from the buffer.
                bb.flip();
                UploadPartRequest uploadPartRequest = UploadPartRequest.builder()
                        .bucket(bucketName)
                        .key(key)
                        .uploadId(uploadId)
                        .checksumAlgorithm(algorithm) // Checksum specified on each part.
                        .partNumber(partNumber)
                        .build();

                UploadPartResponse partResponse = s3Client.uploadPart(
                        uploadPartRequest,
                        RequestBody.fromByteBuffer(bb));

                CompletedPart part = CompletedPart.builder()
                        .partNumber(partNumber)
                        .checksumCRC32(partResponse.checksumCRC32()) // Provide the calculated checksum.
                        .eTag(partResponse.eTag())
                        .build();
                completedParts.add(part);

                bb.clear();
                position += read;
                partNumber++;
            }
        } catch (IOException e) {
            System.err.println(e.getMessage());
        }

        // Complete the multipart upload.
        s3Client.completeMultipartUpload(b -> b
                .bucket(bucketName)
                .key(key)
                .uploadId(uploadId)
                .multipartUpload(CompletedMultipartUpload.builder().parts(completedParts).build()));
    }
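In the example above, each part's checksum is taken from the UploadPartResponse. As a rough illustration of what a per-part checksum covers, the following JDK-only sketch splits a byte array into fixed-size parts and computes a CRC-32 value for each part independently. The names PartChecksumSketch and partChecksums are hypothetical, and the Base64 encoding of the 4-byte big-endian value is an assumption about how the checksum is represented.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical helper class for illustration only; not part of the AWS SDK.
class PartChecksumSketch {

    // Splits the data into parts of at most partSize bytes and returns
    // one Base64-encoded CRC-32 value per part, in part order.
    static List<String> partChecksums(byte[] data, int partSize) {
        List<String> checksums = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += partSize) {
            // The last part may be shorter than partSize.
            int length = Math.min(partSize, data.length - offset);
            CRC32 crc = new CRC32(); // A fresh checksum for every part.
            crc.update(data, offset, length);
            byte[] bytes = ByteBuffer.allocate(4).putInt((int) crc.getValue()).array();
            checksums.add(Base64.getEncoder().encodeToString(bytes));
        }
        return checksums;
    }
}
```

Each checksum covers only the bytes of its own part, which is why the algorithm is specified on every part request and a value is recorded for every completed part.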

The code for the complete examples and tests is in the GitHub code examples repository.

Download an object

When you use the getObject method to download an object, the SDK automatically validates the checksum when the checksumMode method of the builder for the GetObjectRequest is set to ChecksumMode.ENABLED.

The request in the following snippet directs the SDK to validate the checksum in the response by calculating the checksum and comparing the values.

    public GetObjectResponse getObjectWithChecksum() {
        return s3Client.getObject(b -> b
                        .bucket(bucketName)
                        .key(key)
                        .checksumMode(ChecksumMode.ENABLED))
                .response();
    }

If the object wasn't uploaded with a checksum, no validation takes place.

An object in Amazon S3 can have multiple checksums, but only one checksum is validated on download. The following precedence, based on the efficiency of the checksum algorithm, determines which checksum the SDK validates:

  1. CRC-32C

  2. CRC-32

  3. SHA-1

  4. SHA-256

For example, if a response contains both CRC-32 and SHA-256 checksums, only the CRC-32 checksum is validated.
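The precedence rule can be sketched as a small selection function. This is an illustration of the ordering described above, not the SDK's internal code. The enum Algorithm and the method algorithmToValidate are hypothetical names.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical class for illustration only; not part of the AWS SDK.
class ChecksumPrecedenceSketch {

    // Constants are declared in precedence order, highest first.
    enum Algorithm { CRC32C, CRC32, SHA1, SHA256 }

    // Returns the highest-precedence algorithm among those present,
    // or an empty Optional when the object has no checksum at all.
    static Optional<Algorithm> algorithmToValidate(Set<Algorithm> present) {
        return Arrays.stream(Algorithm.values()) // values() preserves declaration order
                .filter(present::contains)
                .findFirst();
    }

    public static void main(String[] args) {
        Set<Algorithm> present = EnumSet.of(Algorithm.SHA256, Algorithm.CRC32);
        System.out.println(algorithmToValidate(present)); // prints "Optional[CRC32]"
    }
}
```

An empty result corresponds to the case where the object was uploaded without a checksum and no validation takes place.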