ElastiCache (Memcached) - Performance at Scale with Amazon ElastiCache


The primary goal of caching is typically to offload reads from your database or other primary data source. In most apps, you have hot spots of data that are regularly queried, but only updated periodically. Think of the front page of a blog or news site, or the top 100 leaderboard in an online game. In this type of case, your app can receive dozens, hundreds, or even thousands of requests for the same data before it's updated again.

Having your caching layer handle these queries has several advantages. First, it's considerably cheaper to add an in-memory cache than to scale up to a larger database cluster. Second, an in-memory cache is easier to scale out, because distributing a cache horizontally is much simpler than horizontally scaling a relational database.

Last, a caching layer provides a request buffer in the event of a sudden spike in usage. If your app or game ends up on the front page of Reddit or the App Store, it's not unheard of to see a spike that is 10–100 times your normal application load. Even if you auto-scale your application instances, a 10x request spike will likely cause problems with your database.

Let's focus on ElastiCache (Memcached) first, because it is the best fit for a caching-focused solution. We'll revisit Redis later in the paper, and weigh its advantages and disadvantages.

Architecture with ElastiCache (Memcached)

When you deploy an ElastiCache Memcached cluster, it sits in your application as a separate tier alongside your database. As mentioned previously, Amazon ElastiCache does not directly communicate with your database tier, or indeed have any particular knowledge of your database. A simplified deployment for a web application looks similar to the following diagram.


A simplified deployment for a web application

In this architecture diagram, the Amazon EC2 application instances are in an Auto Scaling group, located behind an Elastic Load Balancing (ELB) load balancer, which distributes requests among the instances. As requests come into a given EC2 instance, that instance is responsible for communicating with ElastiCache and the database tier. For development purposes, you can begin with a single ElastiCache node to test your application, and then scale to additional cluster nodes by modifying the ElastiCache cluster. As you add cache nodes, the EC2 application instances distribute cache keys across the ElastiCache nodes. The most common practice is client-side sharding, which distributes keys across cache nodes and is discussed later in this paper.
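Client-side sharding is typically implemented with consistent hashing, so that adding or removing a node remaps only a fraction of the keys. Memcached client libraries generally handle this for you; the following is a minimal sketch of the idea, with placeholder node names that are purely illustrative:

```ruby
require "digest"

# Hypothetical node endpoints -- placeholders, not real cluster addresses.
NODES = ["cache-node-1", "cache-node-2", "cache-node-3"]

# A minimal hash ring: each node is hashed to several points on the ring,
# and a key is served by the first node at or clockwise from the key's hash.
def build_ring(nodes, replicas: 100)
  ring = {}
  nodes.each do |node|
    replicas.times do |i|
      point = Digest::MD5.hexdigest("#{node}:#{i}").to_i(16)
      ring[point] = node
    end
  end
  ring.sort.to_h   # order ring points ascending
end

def node_for(ring, key)
  point = Digest::MD5.hexdigest(key).to_i(16)
  ring.each { |p, node| return node if p >= point }
  ring.values.first   # wrap around the ring
end

ring = build_ring(NODES)
puts node_for(ring, "user:4")
```

Because each node owns many small arcs of the ring, removing one node redistributes only that node's keys instead of reshuffling everything, which is what makes horizontal scaling of the cache tier practical.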


EC2 application instances in an Auto Scaling group

When you launch an ElastiCache cluster, you can choose the Availability Zones where the cluster lives.

For best performance, configure your cluster to use the same Availability Zones as your application servers.

To launch an ElastiCache cluster in a specific Availability Zone, make sure to specify the Preferred Zone(s) option during cache cluster creation. ElastiCache launches your cache nodes in the Availability Zones that you specify. AWS recommends that you select Spread Nodes Across Zones, which tells ElastiCache to distribute cache nodes across these zones as evenly as possible. This distribution mitigates the impact of an Availability Zone disruption on your ElastiCache nodes. The trade-off is that some requests from your application to ElastiCache will go to a node in a different Availability Zone, which slightly increases latency. For more details, refer to Creating a cluster in the Amazon ElastiCache (Memcached) User Guide.

As mentioned earlier, ElastiCache can be coupled with a wide variety of databases. Here is an example architecture that uses Amazon DynamoDB instead of Amazon RDS and MySQL:


Example architecture using Amazon DynamoDB instead of Amazon RDS and MySQL

This combination of DynamoDB and ElastiCache is very popular with mobile and game companies, because DynamoDB allows for higher write throughput at lower cost than traditional relational databases. In addition, DynamoDB uses a key-value access pattern similar to ElastiCache, which also simplifies the programming model. Instead of using relational SQL for the primary database but then key-value patterns for the cache, both the primary database and cache can be programmed similarly. In this architecture pattern, DynamoDB remains the source of truth for data, but application reads are offloaded to ElastiCache for a speed boost.
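The read path in this pattern is the cache-aside (lazy loading) strategy: check the cache first, and on a miss read from the source of truth and populate the cache. A minimal sketch, using plain Ruby hashes as stand-ins for the Memcached client and the DynamoDB table (all names and data here are illustrative):

```ruby
# Stand-ins for a Memcached client and a DynamoDB table (illustrative only).
CACHE = {}
DB    = { "user:4" => { "name" => "Alice" } }  # hypothetical source of truth

def fetch_user(key)
  cached = CACHE[key]
  return cached unless cached.nil?   # cache hit: skip the database entirely

  record = DB[key]                   # cache miss: read from the source of truth
  CACHE[key] = record unless record.nil?
  record
end

fetch_user("user:4")  # first call misses, reads the database, populates the cache
fetch_user("user:4")  # second call is served from the cache
```

A real implementation would also set a TTL on each cached item so that stale entries eventually expire.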

Selecting the right cache node size

ElastiCache supports a variety of cache node types. We recommend choosing a cache node from the M5 or R5 families, because the newest node types support the latest-generation CPUs and networking capabilities. These instance families can deliver up to 25 Gbps of aggregate network bandwidth with enhanced networking based on the Elastic Network Adapter (ENA) and over 600 GiB of memory. The R5 node types provide 5% more memory per vCPU and a 10% price per GiB improvement over R4 node types. In addition, R5 node types deliver a ~20% CPU performance improvement over R4 node types.

If you don't know how much capacity you need, AWS recommends starting with one cache.m5.large node. Use the ElastiCache metrics published to CloudWatch to monitor memory usage, CPU utilization, and the cache hit rate. If your cluster does not achieve the desired hit rate, or you notice that keys are being evicted too often, choose another node type with more CPU and memory capacity. For production and large workloads, the R5 nodes typically provide the best balance of performance and memory cost.
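The hit rate itself is derived from two counters; for Memcached, ElastiCache publishes them to CloudWatch as the GetHits and GetMisses metrics. A quick sketch of the calculation, with made-up sample numbers:

```ruby
# Hit rate is hits divided by total get operations. The numbers below are
# made-up sample values standing in for summed GetHits/GetMisses datapoints.
get_hits   = 96_500.0
get_misses =  3_500.0

hit_rate = get_hits / (get_hits + get_misses)
puts format("cache hit rate: %.1f%%", hit_rate * 100)  # prints "cache hit rate: 96.5%"
```

A sustained drop in this ratio, or a climbing Evictions metric, is the usual signal to move to a larger node type or add nodes.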

You can get an approximate estimate of the amount of cache memory you'll need by multiplying the size of items you want to cache by the number of items you want to keep cached at once. Unfortunately, calculating the size of your cached items can be trickier than it sounds. You can arrive at a slight overestimate by serializing your cached items and then counting characters. Here's an example that flattens a Ruby object to JSON, counts the number of characters, and then multiplies by 2 because there are typically 2 bytes per character:

irb(main):010:0> user = User.find(4)
irb(main):011:0> user.to_json.size * 2
=> 580

In addition to the size of your data, Memcached adds approximately 50–60 bytes of internal bookkeeping data to each element. The cache key also consumes space, up to 250 characters at two bytes each. In this example, it's probably safest to overestimate a little and guess 1–2 KB per cached object. Keep in mind that this approach is just for illustration purposes. Your cached objects can be much larger if you are caching rendered page fragments or if you use a serialization library that expands strings.
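Putting these numbers together, a rough capacity estimate is (item size + per-item overhead + key size) multiplied by the number of items. A sketch using the figures above (the key length and item count are assumptions chosen purely for illustration):

```ruby
item_bytes     = 580       # serialized object size from the example above
overhead_bytes = 60        # approximate per-item Memcached bookkeeping
key_bytes      = 40 * 2    # e.g., a 40-character key at 2 bytes per character

per_item_bytes = item_bytes + overhead_bytes + key_bytes  # 720 bytes per item
cached_items   = 1_000_000                                # items cached at once (assumed)

total_gib = per_item_bytes * cached_items / (1024.0**3)
puts format("~%.2f GiB of cache memory needed", total_gib)  # prints "~0.67 GiB of cache memory needed"
```

As the text notes, treat this as a starting point: real objects (such as rendered page fragments) are often much larger, so round up generously.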

Because Amazon ElastiCache is a pay-as-you-go service, make your best guess at the node instance size, and then adjust after getting some real-world data. Make sure that your application is set up for consistent hashing, which will enable you to add additional Memcached nodes to scale your in-memory layer horizontally. For additional tips, refer to Choosing your node size in the Amazon ElastiCache for Memcached User Guide.

Security groups and VPC

Like other AWS services, ElastiCache supports security groups. You can use security groups to define rules that limit access to your instances based on IP address and port. ElastiCache supports both subnet security groups in Amazon Virtual Private Cloud (Amazon VPC) and classic Amazon EC2 security groups. We strongly recommend that you deploy ElastiCache and your application in Amazon VPC, unless you have a specific need otherwise (such as for an existing application). Amazon VPC offers several advantages, including fine-grained access rules and control over private IP addressing. For an overview of how ElastiCache integrates with Amazon VPC, see Understanding ElastiCache and Amazon VPCs in the Amazon ElastiCache (Memcached) User Guide.

When launching your ElastiCache cluster in a VPC, launch it in a private subnet with no public connectivity for best security. Memcached does not have any serious authentication or encryption capabilities, but Redis does support encryption. Following is a simplified version of the previous architecture diagram that includes an example VPC subnet design.


Example VPC subnet design

To keep your cache nodes as secure as possible, only allow access to your cache cluster from your application tier, as shown in the preceding diagram. ElastiCache does not need connectivity to or from your database tier, because your database does not interact directly with ElastiCache. Only application instances that make calls to your cache cluster need connectivity to it.

The way ElastiCache manages connectivity in Amazon VPC is through standard VPC subnets and security groups. To securely launch an ElastiCache cluster in Amazon VPC, follow these steps:

  1. Create VPC private subnet(s) that will house your ElastiCache cluster, in the same VPC as the rest of your application. Each VPC subnet maps to a single Availability Zone, so create a private VPC subnet for each Availability Zone where you have application instances. Alternatively, you can reuse another private VPC subnet that you already have. For more information, refer to VPCs and subnets in the Amazon Virtual Private Cloud User Guide.

  2. Create a VPC security group for your new cache cluster. Make sure it is also in the same VPC as the preceding subnet. For more details, refer to Amazon VPCs and ElastiCache security.

  3. Create a single access rule for this security group, allowing inbound access on port 11211 for Memcached or on port 6379 for Redis.

  4. Create an ElastiCache subnet group that contains the VPC private subnets that you created in step 1. This subnet group is how ElastiCache knows which VPC subnets to use when launching the cluster. For instructions, refer to Creating a subnet group in the Amazon ElastiCache for Memcached User Guide.

  5. When you launch your ElastiCache cluster, make sure to place it in the correct VPC, and choose the correct ElastiCache subnet group. For instructions, see Creating a cluster in the Amazon ElastiCache for Memcached User Guide.
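Steps 4 and 5 can also be scripted with the AWS CLI. The following is a sketch only: it assumes the private subnets and security group from steps 1 through 3 already exist, and every name and ID shown is a placeholder.

```shell
# Step 4: group the private subnets so ElastiCache knows where to launch nodes
# (subnet IDs are placeholders for the subnets created in step 1)
aws elasticache create-cache-subnet-group \
    --cache-subnet-group-name my-cache-subnets \
    --cache-subnet-group-description "Private subnets for ElastiCache" \
    --subnet-ids subnet-aaaa1111 subnet-bbbb2222

# Step 5: launch the Memcached cluster into that subnet group, attaching the
# security group from step 2 (IDs are placeholders)
aws elasticache create-cache-cluster \
    --cache-cluster-id my-memcached \
    --engine memcached \
    --cache-node-type cache.m5.large \
    --num-cache-nodes 2 \
    --cache-subnet-group-name my-cache-subnets \
    --security-group-ids sg-cccc3333
```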

A correct VPC security group for your cache cluster should look like the following. Notice the single inbound rule allowing access to the cluster from the application tier:


VPC security group for your cache cluster

To test connectivity from an application instance to your cache cluster in VPC, you can use Netcat, a Linux command-line utility. Choose one of your cache cluster nodes, and attempt to connect to the node on either port 11211 (Memcached) or port 6379 (Redis):

$ nc -z -w5 my-cache-2b.z2vq55.001.usw2.cache.amazonaws.com 11211
$ echo $?
0

If the connection is successful, Netcat will exit with status 0. If Netcat appears to hang, or exits with a nonzero status, check your VPC security group and subnet settings.