Tutorial: Loading data into Amazon Keyspaces using DSBulk - Amazon Keyspaces (for Apache Cassandra)

This step-by-step tutorial guides you through migrating data from Apache Cassandra to Amazon Keyspaces using the DataStax Bulk Loader (DSBulk), which is available on GitHub. DSBulk is useful for uploading datasets to Amazon Keyspaces for academic or test purposes. For more information about migrating production workloads, see Offline migration process: Apache Cassandra to Amazon Keyspaces. In this tutorial, you complete the following steps.

Prerequisites – Set up an AWS account with credentials, create a JKS trust store file for the certificate that Amazon Keyspaces uses for TLS connections, configure cqlsh, download and install DSBulk, and configure an application.conf file.

  1. Create source CSV and target table – Prepare a CSV file as the source data and create the target keyspace and table in Amazon Keyspaces.

  2. Prepare the data – Randomize the order of the rows in the CSV file so that writes are spread evenly across partitions, and analyze the data to determine the average and maximum row sizes.

  3. Set throughput capacity – Calculate the required write capacity units (WCUs) based on the data size and desired load time, and configure the table's provisioned capacity.

  4. Configure DSBulk settings – Create a DSBulk configuration file with settings such as authentication, SSL/TLS, consistency level, and connection pool size.

  5. Run the DSBulk load command – Upload the data from the CSV file to the Amazon Keyspaces table, and monitor the progress of the load.
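Step 2 (preparing the data) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tutorial's own tooling: the helper names `shuffle_csv` and `row_size_stats` are hypothetical, the whole CSV file is assumed to fit in memory, and row size is approximated as the UTF-8 byte length of each comma-joined CSV row, which only roughly tracks how Amazon Keyspaces sizes a stored row.

```python
from __future__ import annotations

import csv
import random
from pathlib import Path
from typing import Optional


def shuffle_csv(src: Path, dst: Path, seed: Optional[int] = None) -> int:
    """Write the data rows of `src` to `dst` in random order, keeping the header first.

    Returns the number of data rows written. (Hypothetical helper.)
    """
    with src.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)  # assumes the file fits in memory
    random.Random(seed).shuffle(rows)
    with dst.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


def row_size_stats(path: Path) -> tuple[float, int]:
    """Approximate (average, maximum) row size in bytes from the CSV data rows."""
    with path.open(newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        # Approximation: byte length of the comma-joined fields, ignoring CSV quoting.
        sizes = [len(",".join(row).encode("utf-8")) for row in reader]
    return sum(sizes) / len(sizes), max(sizes)
```

Shuffling with a fixed `seed` makes the output reproducible across runs; leaving it as `None` gives a different order each time.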
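The capacity calculation in step 3 is simple arithmetic. The sketch below assumes that one WCU covers one write per second of a row up to 1 KB (treated here as 1,024 bytes), that larger rows consume one additional WCU per started KB, and that the average row size is an acceptable stand-in for individual row sizes. The helper `required_wcus` is hypothetical, and the numbers in the example are illustrative rather than taken from the tutorial's dataset.

```python
import math


def required_wcus(row_count: int, avg_row_bytes: float, load_seconds: float) -> int:
    """Estimate the provisioned WCUs needed to write `row_count` rows in `load_seconds`.

    Assumption: each started 1 KB (1,024 bytes) of a row costs one WCU per write,
    approximated here using the average row size for every row.
    """
    if load_seconds <= 0:
        raise ValueError("load_seconds must be positive")
    wcus_per_row = max(1, math.ceil(avg_row_bytes / 1024))
    return math.ceil(row_count * wcus_per_row / load_seconds)


if __name__ == "__main__":
    # Hypothetical example: 1,000,000 rows averaging 1,500 bytes, loaded in one hour.
    print(required_wcus(1_000_000, 1_500, 3_600))  # → 556
```

Because each 1,500-byte row rounds up to 2 KB, it costs 2 WCUs, so the example needs 2,000,000 WCUs spread over 3,600 seconds, or about 556 WCUs per second.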
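For step 4, the following Python sketch renders a minimal configuration file in the HOCON format that DSBulk and the DataStax Java driver read. It assumes authentication with service-specific credentials through the driver's `PlainTextAuthProvider` (the SigV4 authentication plugin is an alternative), the regional endpoint `cassandra.<region>.amazonaws.com` on port 9142, and the JKS trust store created in the prerequisites. The helper name `render_keyspaces_conf`, the default pool size of 3, and all argument values are placeholders to replace with your own.

```python
from __future__ import annotations


def render_keyspaces_conf(
    region: str,
    username: str,
    password: str,
    truststore_path: str,
    truststore_password: str,
    pool_size: int = 3,
) -> str:
    """Return HOCON text with Java driver settings for Amazon Keyspaces (hypothetical helper)."""
    # Amazon Keyspaces accepts writes only at LOCAL_QUORUM, and its local
    # data center name is the AWS Region name.
    return f"""datastax-java-driver {{
  basic.contact-points = ["cassandra.{region}.amazonaws.com:9142"]
  basic.load-balancing-policy.local-datacenter = "{region}"
  basic.request.consistency = "LOCAL_QUORUM"
  advanced.auth-provider {{
    class = PlainTextAuthProvider
    username = "{username}"
    password = "{password}"
  }}
  advanced.ssl-engine-factory {{
    class = DefaultSslEngineFactory
    truststore-path = "{truststore_path}"
    truststore-password = "{truststore_password}"
  }}
  advanced.connection.pool.local.size = {pool_size}
}}
"""


if __name__ == "__main__":
    # Placeholder values; substitute your own Region, credentials, and trust store.
    print(render_keyspaces_conf(
        region="us-east-1",
        username="ServiceUserName",
        password="ServicePassword",
        truststore_path="./cassandra_truststore.jks",
        truststore_password="my_password",
    ))
```

Writing the rendered text to a file gives you a configuration file that DSBulk can load with its `-f` option. Keep real credentials out of version control.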
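For step 5, the sketch below assembles a `dsbulk load` invocation as an argument list that can be passed to `subprocess.run`. The options used (`-f`, `--connector.csv.url`, `-header`, `--batch.mode`, `--executor.maxPerSecond`, `-k`, `-t`) are DSBulk settings, but the helper name, file names, keyspace, table, and rate are hypothetical. Capping `--executor.maxPerSecond` keeps the write rate within what the table's provisioned WCUs can absorb. For rows larger than 1 KB, divide the WCUs by the WCUs consumed per row to get the rate.

```python
from __future__ import annotations

import shlex


def dsbulk_load_command(
    conf_path: str,
    csv_path: str,
    keyspace: str,
    table: str,
    max_per_second: int,
) -> list[str]:
    """Build a `dsbulk load` command line as an argument list (hypothetical helper)."""
    return [
        "dsbulk", "load",
        "-f", conf_path,                  # configuration file with driver settings
        "--connector.csv.url", csv_path,  # shuffled source CSV file
        "-header", "true",                # first CSV line holds the column names
        "--batch.mode", "DISABLED",       # send each row as an individual write
        "--executor.maxPerSecond", str(max_per_second),  # throttle the write rate
        "-k", keyspace,
        "-t", table,
    ]


if __name__ == "__main__":
    cmd = dsbulk_load_command(
        "dsbulk_keyspaces.conf", "keyspace.table.csv", "my_keyspace", "my_table", 500
    )
    print(shlex.join(cmd))
    # To run the load (requires DSBulk on PATH and a valid configuration):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing a list instead of a single shell string avoids quoting problems with paths that contain spaces.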