Integrating Amazon Keyspaces with Apache Spark

Apache Spark is an open-source engine for large-scale data analytics. With Spark, you can run analytics on data stored in Amazon Keyspaces more efficiently. You can also use Amazon Keyspaces to give applications consistent, single-digit-millisecond read access to analytics data produced by Spark. The open-source Spark Cassandra Connector simplifies reading and writing data between Amazon Keyspaces and Spark.

Amazon Keyspaces support for the Spark Cassandra Connector streamlines running Cassandra workloads in Spark-based analytics pipelines by using a fully managed and serverless database service. With Amazon Keyspaces, you don’t need to worry about Spark competing for the same underlying infrastructure resources as your tables. Amazon Keyspaces tables scale up and down automatically based on your application traffic.

The following tutorial walks you through the steps and best practices for reading data from and writing data to Amazon Keyspaces using the Spark Cassandra Connector. It first demonstrates how to migrate data to Amazon Keyspaces by loading data from a file with Spark and writing it to an Amazon Keyspaces table through the connector. It then shows how to read the data back from Amazon Keyspaces with the same connector. This is the pattern you follow to run Cassandra workloads in Spark-based analytics pipelines.
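The load, write, and read-back flow described above can be sketched in PySpark as follows. This is a minimal illustration, not the tutorial's own code. It assumes a `SparkSession` that was started with the Spark Cassandra Connector on its classpath and already configured to connect to Amazon Keyspaces (endpoint, TLS, and authentication). The helper names and any keyspace, table, or file names you pass in are hypothetical.

```python
# Sketch only: the helper names below are illustrative, not part of any library.
# Data source name registered by the Spark Cassandra Connector.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def connector_options(keyspace, table):
    """Build the option map that identifies a table to the connector."""
    return {"keyspace": keyspace, "table": table}


def load_csv(spark, path):
    """Read a CSV file that has a header row into a DataFrame."""
    return spark.read.option("header", "true").csv(path)


def write_table(df, keyspace, table):
    """Append the rows of df to an existing table through the connector."""
    (df.write
       .format(CASSANDRA_FORMAT)
       .options(**connector_options(keyspace, table))
       .mode("append")
       .save())


def read_table(spark, keyspace, table):
    """Load a table back into a DataFrame through the connector."""
    return (spark.read
                 .format(CASSANDRA_FORMAT)
                 .options(**connector_options(keyspace, table))
                 .load())
```

With a configured session named `spark`, a migration would call `write_table(load_csv(spark, path), keyspace, table)` and then `read_table(spark, keyspace, table)` to verify the result. The target table is assumed to exist already, with columns that match the CSV header.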
