Managing data consistency - Amazon Redshift

Managing data consistency

Amazon S3 provides eventual consistency for some operations, so it is possible that new data will not be available immediately after the upload, which could result in an incomplete data load or loading stale data. COPY operations where the cluster and the bucket are in different regions are eventually consistent. All regions provide read-after-write consistency for uploads of new objects with unique object keys. For more information about data consistency, see Amazon S3 Data Consistency Model in the Amazon Simple Storage Service Developer Guide.

To ensure that your application loads the correct data, we recommend the following practices:

  • Create new object keys.

    Amazon S3 provides eventual consistency in all regions for overwrite operations. Creating new file names, or object keys, in Amazon S3 for each data load operation provides strong consistency in all regions.

  • Use a manifest file with your COPY operation.

    The manifest explicitly names the files to be loaded. Using a manifest file enforces strong consistency.

The rest of this section explains these steps in detail.

Creating new object keys

Because of potential data consistency issues, we strongly recommend creating new files with unique Amazon S3 object keys for each data load operation. If you overwrite existing files with new data, and then issue a COPY command immediately following the upload, it is possible for the COPY operation to begin loading from the old files before all of the new data is available. For more information about eventual consistency, see Amazon S3 Data Consistency Model in the Amazon S3 Developer Guide.

Using a manifest file

You can explicitly specify which files to load by using a manifest file. When you use a manifest file, COPY enforces strong consistency by searching secondary servers if it does not find a listed file on the primary server. The manifest file can be configured with an optional mandatory flag. If mandatory is true and the file is not found, COPY returns an error.

For more information about using a manifest file, see the copy_from_s3_manifest_file option for the COPY command and Example: COPY from Amazon S3 using a manifest in the COPY examples.

Because Amazon S3 provides eventual consistency for overwrites in all regions, it is possible to load stale data if you overwrite existing objects with new data. As a best practice, never overwrite existing files with new data.