|« PreviousNext »|
|Did this page help you? Yes | No | Tell us about it...|
You can use the COPY command to load a table with data from a single Amazon DynamoDB table.
The COPY command leverages the Amazon Redshift massively parallel processing (MPP) architecture to read and load data in parallel from Amazon DynamoDB tables. You can take maximum advantage of parallelism by setting distribution keys on your Amazon Redshift tables. For more information, see Choosing a data distribution method.
When the COPY command reads data from the Amazon DynamoDB table, the resulting data transfer is part of that table's provisioned throughput.
To avoid consuming excessive amounts of provisioned read throughput, we recommend that you not load data from Amazon DynamoDB tables that are in production environments. If you do load data from production tables, we recommend that you set the READRATIO option much lower than the average percentage of unused provisioned throughput. A low READRATIO setting will help minimize throttling issues. To use the entire provisioned throughput of an Amazon DynamoDB table, set READRATIO to 100.
The COPY command matches attribute names in the items retrieved from the Amazon DynamoDB table to column names in an existing Amazon Redshift table by using the following rules:
Amazon Redshift table columns are case-insensitively matched to Amazon DynamoDB item attributes. If an item in the DynamoDB table contains multiple attributes that differ only in case, such as Price and PRICE, the COPY command will fail.
Amazon Redshift table columns that do not match an attribute in the Amazon DynamoDB table are loaded as either NULL or empty, depending on the value specified with the EMPTYASNULL option in the COPY command.
Amazon DynamoDB attributes that do not match a column in the Amazon Redshift table are discarded. Attributes are read before they are matched, and so even discarded attributes consume part of that table's provisioned throughput.
Only Amazon DynamoDB attributes with scalar STRING and NUMBER data types are supported. The Amazon DynamoDB BINARY and SET data types are not supported. If a COPY command tries to load an attribute with an unsupported data type, the command will fail. If the attribute does not match an Amazon Redshift table column, COPY does not attempt to load it, and it does not raise an error.
The COPY command uses the following syntax to load data from an Amazon DynamoDB table:
copy <redshift_tablename> from 'dynamodb://<dynamodb_table_name>' credentials 'aws_access_key_id=<your-access-key-id>;aws_secret_access_key=<your-secret-access-key>' readratio '<integer>' [options ];
The values for <your-access-key-id> and <your-secret-access-key> are the AWS credentials needed to access the Amazon DynamoDB table. If these credentials correspond to an IAM user, that IAM user must have permission to SCAN and DESCRIBE the Amazon DynamoDB table that is being loaded.
You can limit the access users have to your data by using temporary security credentials. Temporary security credentials provide enhanced security because they have short life spans and cannot be reused after they expire. A user who has these temporary security credentials can access your resources only until the credentials expire. For more information, see Temporary Security Credentials. To load data using temporary access credentials, use the following syntax:
copy <redshift_tablename> from 'dynamodb://<dynamodb_table_name>' credentials 'aws_access_key_id=<temporary-access-key-id>;aws_secret_access_key=<temporary-secret-access-key>;token=<temporary-token>' [<options>];
The temporary security credentials must be valid for the entire duration of the COPY statement. If the temporary security credentials expire during the load process, the COPY will fail and the transaction will be rolled back. For example, if temporary security credentials expire after 15 minutes and the COPY requires one hour, the COPY will fail before it completes.
If you want to validate your data without actually loading the table, use the NOLOAD option with the COPY command.
The following example loads the FAVORITEMOVIES table with data from the Amazon DynamoDB table my-favorite-movies-table. The read activity can consume up to 50% of the provisioned throughput.
copy favoritemovies from 'dynamodb://my-favorite-movies-table' credentials 'aws_access_key_id=<your-access-key-id>;aws_secret_access_key=<your-secret-access-key>' readratio 50;
To maximize throughput, the COPY command loads data from an Amazon DynamoDB table in parallel across the compute nodes in the cluster.
By default, the COPY command applies automatic compression whenever you specify an empty target table with no compression encoding. The automatic compression analysis initially samples a large number of rows from the Amazon DynamoDB table. The sample size is based on the value of the COMPROWS parameter. The default is 100,000 rows per slice.
After sampling, the sample rows are discarded and the entire table is loaded. As a result, many rows are read twice. For more information about how automatic compression works, see Loading tables with automatic compression.
When the COPY command reads data from the Amazon DynamoDB table, including the rows used for sampling, the resulting data transfer is part of that table's provisioned throughput.
If your data includes non-ASCII multibyte characters (such as Chinese or Cyrillic characters), you must load the data to VARCHAR columns. The VARCHAR data type supports three-byte UTF-8 characters, but the CHAR data type only accepts single-byte ASCII characters. You cannot load four-byte or longer characters into Amazon Redshift tables. For more information about CHAR and VARCHAR, see Data types.