Working with queries in Amazon Keyspaces - Amazon Keyspaces (for Apache Cassandra)

Working with queries in Amazon Keyspaces

This section gives an introduction into working with queries in Amazon Keyspaces (for Apache Cassandra). The CQL statements available to query, transform, and manage data are SELECT, INSERT, UPDATE, and DELETE. The following topics outline some of the more complex options available when working with queries. For the complete language syntax with examples, see DML statements (data manipulation language) in Amazon Keyspaces.

Using the IN operator with the SELECT Statement in Amazon Keyspaces

SELECT IN

You can query data from tables using the SELECT statement, which reads one or more columns for one or more rows in a table and returns a result-set containing the rows matching the request. A SELECT statement contains a select_clause that determines which columns to read and to return in the result-set. The clause can contain instructions to transform the data before returning it. The optional WHERE clause specifies which rows must be queried and is composed of relations on the columns that are part of the primary key. Amazon Keyspaces supports the IN keyword in the WHERE clause. This section uses examples to show how Amazon Keyspaces processes SELECT statements with the IN keyword.

This examples demonstrates how Amazon Keyspaces breaks down the SELECT statement with the IN keyword into subqueries. In this example we use a table with the name my_keyspace.customers. The table has one primary key column department_id, two clustering columns sales_region_id and sales_representative_id, and one column that contains the name of the customer in the customer_name column.

SELECT * FROM my_keyspace.customers; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 0 | 0 | 0 | a 0 | 0 | 1 | b 0 | 1 | 0 | c 0 | 1 | 1 | d 1 | 0 | 0 | e 1 | 0 | 1 | f 1 | 1 | 0 | g 1 | 1 | 1 | h

Using this table, you can run the following SELECT statement to find the customers in the departments and sales regions that you are interested in with the IN keyword in the WHERE clause. The following statement is an example of this.

SELECT * FROM my_keyspace.customers WHERE department_id IN (0, 1) AND sales_region_id IN (0, 1);

Amazon Keyspaces divides this statement into four subqueries as shown in the following output.

SELECT * FROM my_keyspace.customers WHERE department_id = 0 AND sales_region_id = 0; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 0 | 0 | 0 | a 0 | 0 | 1 | b SELECT * FROM my_keyspace.customers WHERE department_id = 0 AND sales_region_id = 1; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 0 | 1 | 0 | c 0 | 1 | 1 | d SELECT * FROM my_keyspace.customers WHERE department_id = 1 AND sales_region_id = 0; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 1 | 0 | 0 | e 1 | 0 | 1 | f SELECT * FROM my_keyspace.customers WHERE department_id = 1 AND sales_region_id = 1; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 1 | 1 | 0 | g 1 | 1 | 1 | h

When the IN keyword is used, Amazon Keyspaces automatically paginates the results in any of the following cases:

  • After every 10th subquery is processed.

  • After processing 1MB of logical IO.

  • If you configured a PAGE SIZE, Amazon Keyspaces paginates after reading the number of queries for processing based on the set PAGE SIZE.

  • When you use the LIMIT keyword to reduce the number of rows returned, Amazon Keyspaces paginates after reading the number of queries for processing based on the set LIMIT.

The following table is used to illustrate this with an example.

For more information about pagination, see Paginating results in Amazon Keyspaces.

SELECT * FROM my_keyspace.customers; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 2 | 0 | 0 | g 2 | 1 | 1 | h 2 | 2 | 2 | i 0 | 0 | 0 | a 0 | 1 | 1 | b 0 | 2 | 2 | c 1 | 0 | 0 | d 1 | 1 | 1 | e 1 | 2 | 2 | f 3 | 0 | 0 | j 3 | 1 | 1 | k 3 | 2 | 2 | l

You can run the following statement on this table to see how pagination works.

SELECT * FROM my_keyspace.customers WHERE department_id IN (0, 1, 2, 3) AND sales_region_id IN (0, 1, 2) AND sales_representative_id IN (0, 1);

Amazon Keyspaces processes this statement as 24 subqueries, because the cardinality of the Cartesian product of all the IN terms contained in this query is 24.

department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 0 | 0 | 0 | a 0 | 1 | 1 | b 1 | 0 | 0 | d 1 | 1 | 1 | e ---MORE--- department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 2 | 0 | 0 | g 2 | 1 | 1 | h 3 | 0 | 0 | j ---MORE--- department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 3 | 1 | 1 | k

This example shows how you can use the ORDER BY clause in a SELECT statement with the IN keyword.

SELECT * FROM my_keyspace.customers WHERE department_id IN (3, 2, 1) ORDER BY sales_region_id DESC; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 3 | 2 | 2 | l 3 | 1 | 1 | k 3 | 0 | 0 | j 2 | 2 | 2 | i 2 | 1 | 1 | h 2 | 0 | 0 | g 1 | 2 | 2 | f 1 | 1 | 1 | e 1 | 0 | 0 | d

Subqueries are processed in the order in which the partition key and clustering key columns are presented in the query. In the example below, subqueries for partition key value ”2“ are processed first, followed by subqueries for partition key value ”3“ and ”1“. Results of a given subquery are ordered according to the query's ordering clause, if present, or the table's clustering order defined during table creation.

SELECT * FROM my_keyspace.customers WHERE department_id IN (2, 3, 1) ORDER BY sales_region_id DESC; department_id | sales_region_id | sales_representative_id | customer_name ---------------+-----------------+-------------------------+-------------- 2 | 2 | 2 | i 2 | 1 | 1 | h 2 | 0 | 0 | g 3 | 2 | 2 | l 3 | 1 | 1 | k 3 | 0 | 0 | j 1 | 2 | 2 | f 1 | 1 | 1 | e 1 | 0 | 0 | d

Ordering results in Amazon Keyspaces

The ORDER BY clause specifies the sort order of the results returned in a SELECT statement. The statement takes a list of column names as arguments and for each column you can specify the sort order for the data. You can only specify clustering columns in ordering clauses, non-clustering columns are not allowed.

The two available sort order options for the returned results are ASC for ascending and DESC for descending sort order.

SELECT * FROM my_keyspace.my_table ORDER BY (col1 ASC, col2 DESC, col3 ASC); col1 | col2 | col3 ------+------+------ 0 | 6 | a 1 | 5 | b 2 | 4 | c 3 | 3 | d 4 | 2 | e 5 | 1 | f 6 | 0 | g
SELECT * FROM my_keyspace.my_table ORDER BY (col1 DESC, col2 ASC, col3 DESC); col1 | col2 | col3 ------+------+------ 6 | 0 | g 5 | 1 | f 4 | 2 | e 3 | 3 | d 2 | 4 | c 1 | 5 | b 0 | 6 | a

If you don't specify the sort order in the query statement, the default ordering of the clustering column is used.

The possible sort orders you can use in an ordering clause depend on the sort order assigned to each clustering column at table creation. Query results can only be sorted in the order defined for all clustering columns at table creation or the inverse of the defined sort order. Other possible combinations are not allowed.

For example, if the table's CLUSTERING ORDER is (col1 ASC, col2 DESC, col3 ASC), then the valid parameters for ORDER BY are either (col1 ASC, col2 DESC, col3 ASC) or (col1 DESC, col2 ASC, col3 DESC). For more information on CLUSTERING ORDER, see table_options under CREATE TABLE.

Paginating results in Amazon Keyspaces

Amazon Keyspaces automatically paginates the results from SELECT statements when the data read to process the SELECT statement exceeds 1 MB. With pagination, the SELECT statement results are divided into "pages" of data that are 1 MB in size (or less). An application can process the first page of results, then the second page, and so on. Clients should always check for pagination tokens when processing SELECT queries that return multiple rows.

If a client supplies a PAGE SIZE that requires reading more than 1 MB of data, Amazon Keyspaces breaks up the results automatically into multiple pages based on the 1 MB data-read increments.

For example, if the average size of a row is 100 KB and you specify a PAGE SIZE of 20, Amazon Keyspaces paginates data automatically after it reads 10 rows (1000 KB of data read).

Because Amazon Keyspaces paginates results based on the number of rows that it reads to process a request and not the number of rows returned in the result set, some pages may not contain any rows if you are running filtered queries.

For example, if you set PAGE SIZE to 10 and Keyspaces evaluates 30 rows to process your SELECT query, Amazon Keyspaces will return three pages. If only a subset of the rows matched your query, some pages may have less than 10 rows. For an example how the PAGE SIZE of LIMIT queries can affect read capacity, see Limit queries.