Migrating to Amazon Keyspaces
Amazon Keyspaces (for Apache Cassandra) is a scalable, highly available, and managed Apache Cassandra–compatible database service. You can migrate your data to Amazon Keyspaces from Cassandra databases running on premises or on Amazon Elastic Compute Cloud (Amazon EC2) by using the steps in this section.
We recommend that you follow these best practices to ensure that your migration is successful:
- Break the migration down into smaller components.
Consider the following units of migration and their potential footprint in terms of raw data size. Migrating smaller amounts of data in one or more phases may help simplify your migration.
By cluster – Migrate all of your Cassandra data at once. This approach may be fine for smaller clusters.
By keyspace or table – Break up your migration into groups of keyspaces or tables. This approach can help you migrate data in phases based on your requirements for each workload.
By data – Consider migrating data for a specific group of users or products, to bring the size of data down even more.
- Prioritize what data to migrate first based on simplicity.
Consider whether you have data that could be migrated first more easily, for example, data that does not change during specific times, data from nightly batch jobs, data not used during offline hours, or data from internal apps.
- Use specific tooling.
Get started quickly with loading data into Amazon Keyspaces by using the cqlsh COPY FROM command. cqlsh is included with Apache Cassandra and is best suited for loading small datasets or test data. For step-by-step instructions, see Tutorial: Loading data into Amazon Keyspaces using cqlsh.
For production workloads with large datasets, you can use the DataStax Bulk Loader for Apache Cassandra to load data into Amazon Keyspaces using the dsbulk command. DSBulk provides more robust import capabilities and is available from the GitHub repository. For step-by-step instructions, see Tutorial: Loading data into Amazon Keyspaces using DSBulk.
To learn how to use the Apache Cassandra Spark connector to write data to Amazon Keyspaces, see Tutorial: Integrating Amazon Keyspaces with Apache Spark.
For complex migrations, consider using an extract, transform, and load (ETL) tool. You can use Amazon EMR to quickly and effectively perform data transformation workloads. For more information, see the Amazon EMR Management Guide.
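As a rough illustration of the practices above, the following Python sketch groups tables into migration phases by raw data size and prints a cqlsh COPY FROM statement for each table. The table names, sizes, CSV paths, and the 10 GB per-phase budget are hypothetical, and the smallest-first ordering is only one possible way to decide what to migrate first. The helper names plan_phases and copy_from_statement are made up for this example and are not part of any Amazon Keyspaces tooling.

```python
from typing import Dict, List


def plan_phases(table_sizes_gb: Dict[str, float], budget_gb: float) -> List[List[str]]:
    """Group tables into migration phases of at most budget_gb each.

    Tables are taken smallest first (an assumption for this sketch).
    A table larger than the budget is placed in a phase of its own.
    """
    phases: List[List[str]] = []
    current: List[str] = []
    used = 0.0
    # Sort by size, then by name so the result is deterministic.
    for table, size in sorted(table_sizes_gb.items(), key=lambda kv: (kv[1], kv[0])):
        if current and used + size > budget_gb:
            phases.append(current)
            current, used = [], 0.0
        current.append(table)
        used += size
    if current:
        phases.append(current)
    return phases


def copy_from_statement(table: str, csv_path: str) -> str:
    """Build a cqlsh COPY FROM statement for a keyspace-qualified table."""
    return f"COPY {table} FROM '{csv_path}' WITH HEADER=true;"


if __name__ == "__main__":
    # Hypothetical tables and raw data sizes in GB.
    sizes = {
        "shop.users": 2.0,
        "shop.orders": 8.0,
        "shop.events": 40.0,
        "internal.audit": 0.5,
    }
    for number, phase in enumerate(plan_phases(sizes, budget_gb=10.0), start=1):
        print(f"-- phase {number}")
        for table in phase:
            print(copy_from_statement(table, f"./{table}.csv"))
```

The printed statements are meant to be run from a cqlsh session connected to Amazon Keyspaces. A table that exceeds the budget, such as shop.events in this example, is a candidate for splitting further by data or for loading with DSBulk instead.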