MongoDB on AWS
Quick Start Reference Deployment Guide

Architecture

AWS CloudFormation provides an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion.

Deploying this Quick Start for a new VPC with default parameters builds the following MongoDB environment in the AWS Cloud.


Figure 1: Quick Start architecture for MongoDB on AWS

  

The following AWS components are deployed and configured as part of this reference deployment:

  • A VPC configured with public and private subnets across three Availability Zones.*

  • In the public subnets, NAT gateways to allow outbound Internet connectivity for resources (MongoDB instances) in the private subnets. (For more information, see the Amazon VPC Quick Start.)*

  • In the public subnets, bastion hosts in an Auto Scaling group with Elastic IP addresses to allow inbound Secure Shell (SSH) access. One bastion host is deployed by default, but this number is configurable. (For more information, see the Linux bastion host Quick Start.)*

  • An AWS Identity and Access Management (IAM) instance role with fine-grained permissions for access to AWS services necessary for the deployment process.

  • Security groups to enable communication within the VPC and to restrict access to only necessary protocols and ports.

  • In the private subnets, a customizable MongoDB cluster with the option of running standalone or in replica sets, along with customizable Amazon EBS storage. The Quick Start launches each member of the replica set in a different Availability Zone. However, if you choose an AWS Region that doesn’t provide three or more Availability Zones, the Quick Start reuses one of the zones to create the third subnet.

* You can choose to launch the Quick Start for a new VPC or use your existing VPC. The template that deploys the Quick Start into an existing VPC skips the creation of components marked by asterisks and prompts you for your existing configuration.
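The zone-reuse behavior described for the MongoDB subnets can be pictured with a small sketch. This is only an illustration, not the Quick Start template's actual logic: the function name `assign_zones`, the zone names, and the choice of which zone gets reused (here, simply cycling back to the first one) are all assumptions.

```python
from itertools import cycle, islice


def assign_zones(available_zones, subnet_count=3):
    """Pick one Availability Zone per subnet.

    If fewer zones than subnets are available, zones are reused by
    cycling through the list again (an assumed policy for illustration).
    """
    if not available_zones:
        raise ValueError("at least one Availability Zone is required")
    return list(islice(cycle(available_zones), subnet_count))


# Hypothetical zone names: a Region with four zones, then one with two.
print(assign_zones(["zone-a", "zone-b", "zone-c", "zone-d"]))
print(assign_zones(["zone-a", "zone-b"]))
```

In the second call, two of the three subnets share a zone, so losing that zone would take out two replica set members at once. That is worth keeping in mind when choosing a Region.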

The Quick Start launches all the MongoDB-related nodes in the private subnets, so you reach the nodes by first connecting over SSH to a bastion host. Instead of using a remote access CIDR for each MongoDB instance, the deployment requires the security group ID of the bastion hosts so remote access can be centrally controlled. If you launch the Quick Start for a new VPC, the bastion security group is created for you. If you launch the Quick Start in an existing VPC, you must create a security group for your bastion hosts or use one that already exists.
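As a rough sketch of that access path, the helper below builds an OpenSSH command that hops through a bastion host by using the `-J` (ProxyJump) option, which is available in OpenSSH 7.3 and later. The helper name, the `ec2-user` login, and both IP addresses are assumptions for illustration. The sketch also assumes your key is already loaded in an SSH agent.

```python
import shlex


def ssh_via_bastion(bastion_host, private_host, user="ec2-user"):
    """Build an ssh command that reaches a private node through a bastion.

    The user name and hosts are placeholders; substitute the values
    from your own deployment.
    """
    return ["ssh", "-J", f"{user}@{bastion_host}", f"{user}@{private_host}"]


# Hypothetical addresses: a bastion Elastic IP and a MongoDB node's private IP.
command = ssh_via_bastion("203.0.113.10", "10.0.1.25")
print(shlex.join(command))
```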

MongoDB Constructs

Here are some of the building blocks that are used in this reference deployment.

Replica set. Refers to a group of mongod instances that hold the same data. The purpose of replication is to ensure high availability in case one of the servers goes down. This reference deployment supports replica sets with one or three members. With three members, the reference deployment launches three servers in three different Availability Zones (if the Region supports it). For production clusters, we recommend a three-member replica set (Primary, Secondary0, Secondary1).

By default, clients interact with the primary node for both read and write operations. You can set a read preference that routes read operations to a secondary node, but write operations always go to the primary node and are replicated asynchronously to the secondary nodes. If you read from a secondary node, watch out for stale data, because the secondary node may lag behind the primary node. For more information about how read operations are routed in a replica set, see the MongoDB documentation.
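One way to express a read preference is in the connection string itself. The sketch below assembles a MongoDB URI with the standard `replicaSet`, `readPreference`, and `maxStalenessSeconds` options. The helper name, host addresses, and replica set name are hypothetical, and `maxStalenessSeconds` assumes a MongoDB version that supports it (3.4 or later).

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="primary",
                    max_staleness_seconds=None):
    """Build a mongodb:// URI for a replica set with a read preference."""
    options = {"replicaSet": replica_set, "readPreference": read_preference}
    if max_staleness_seconds is not None:
        # MongoDB rejects a staleness bound combined with primary reads,
        # and requires the bound to be at least 90 seconds.
        if read_preference == "primary":
            raise ValueError("maxStalenessSeconds cannot be used with primary reads")
        if max_staleness_seconds < 90:
            raise ValueError("maxStalenessSeconds must be at least 90")
        options["maxStalenessSeconds"] = max_staleness_seconds
    return f"mongodb://{','.join(hosts)}/?{urlencode(options)}"


# Hypothetical private addresses of the three replica set members.
members = ["10.0.1.10:27017", "10.0.2.10:27017", "10.0.3.10:27017"]
print(replica_set_uri(members, "s0", "secondaryPreferred", 120))
```

Bounding staleness limits how far behind a secondary may be before the driver stops routing reads to it. That addresses the stale-data concern without giving up secondary reads entirely.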

In a development environment, you can start with a single-member replica set and move to three members for production. Figure 2 shows the MongoDB reference deployment with a replication factor of 3.


Figure 2: MongoDB cluster on AWS with three replica sets

  

When the primary instance fails, one of the secondary instances in another Availability Zone is elected as the new primary node, which provides automatic failover.
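Failover depends on an election, and a new primary can be elected only while a majority of voting members is still reachable. The short calculation below shows why a single-member deployment tolerates no failures while a three-member deployment tolerates one. The function names are illustrative only.

```python
def majority(voting_members):
    """Votes needed to elect a primary: more than half of the voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """Members that can fail while a primary can still be elected."""
    return voting_members - majority(voting_members)


for size in (1, 3):
    print(f"{size} member(s): majority={majority(size)}, "
          f"tolerates {fault_tolerance(size)} failure(s)")
```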

Sharding. Refers to distribution of data across multiple nodes. Storing distinct data across multiple nodes provides horizontal scalability for read and write performance. When you have a large data set, a single node could be bottlenecked by CPU or I/O performance. Sharding resolves this bottleneck by reducing the number of operations each shard node handles, and improves overall cluster performance. This Quick Start doesn't provide direct support for sharding. Instead, it provides a parameter (ReplicaShardIndex) to enable joining the launched replica sets to a sharded cluster. See the MongoDB documentation for details.
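To make the idea of distributing distinct data across nodes concrete, here is a deliberately simplified sketch of hash-based routing. It is not MongoDB's actual sharding algorithm, which assigns ranges of shard key values (chunks) to shards. The modulo scheme, function name, and key format are assumptions chosen for brevity.

```python
import hashlib
from collections import Counter


def shard_for(key, shard_count):
    """Map a key to a shard index by hashing it (simplified illustration)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Route 9,000 hypothetical document keys across three shards and count them.
counts = Counter(shard_for(f"user-{i}", 3) for i in range(9000))
for shard, total in sorted(counts.items()):
    print(f"shard {shard}: {total} documents")
```

Because each shard receives roughly a third of the keys, each node handles roughly a third of the operations. This is the bottleneck relief described above.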

Performance Considerations

The reference implementation offers various compute and storage choices. The following table shows some of the compute choices to consider.

Instance Type   vCPU   Memory (GiB)   Workload Type
c3.4xlarge      16     30             Compute-optimized
c3.8xlarge      32     60             Compute-optimized
c4.8xlarge      36     60             Compute-optimized
r3.2xlarge      8      61             Memory-optimized
r3.4xlarge      16     122            Memory-optimized
r3.8xlarge      32     244            Memory-optimized

As a general guideline, consider growing instances horizontally instead of vertically. Horizontal scaling overcomes the limitations of single nodes and avoids single points of failure, and can potentially increase the overall throughput of your cluster.

For storage, you can change the type and size of the volume attached to each node to suit your database requirements. Amazon EBS provides three volume types: General Purpose (SSD) volumes, Provisioned IOPS (SSD) volumes, and Magnetic volumes. These differ in performance characteristics and cost, so you can choose the right storage performance and price depending on the needs of your application. All Amazon EBS volume types offer the same durable snapshot capabilities and are designed for 99.999% availability. This reference deployment supports General Purpose and Provisioned IOPS storage volumes.

The following table shows some of the performance characteristics of each storage type. Depending on your performance requirements, you may want to benchmark your application before deciding on the storage type and Amazon EBS Provisioned IOPS capacity (if chosen).

Volume Type           General Purpose (SSD)   Provisioned IOPS (SSD)
Storage media         SSD-backed              SSD-backed
Maximum volume size   16 TiB                  16 TiB
Maximum IOPS/volume   10,000                  20,000
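As a starting point before benchmarking, the per-volume IOPS limits in the table can be turned into a quick filter. This sketch uses only the maximum figures listed above. The function name is hypothetical, and real volumes may deliver fewer IOPS depending on their size and configuration.

```python
# Per-volume IOPS ceilings taken from the table above.
MAX_IOPS_PER_VOLUME = {
    "General Purpose (SSD)": 10_000,
    "Provisioned IOPS (SSD)": 20_000,
}


def volume_types_supporting(required_iops):
    """Return the volume types whose per-volume ceiling covers the requirement."""
    return [name for name, limit in MAX_IOPS_PER_VOLUME.items()
            if required_iops <= limit]


for target in (5_000, 15_000, 25_000):
    candidates = volume_types_supporting(target) or ["none (exceeds single-volume limits)"]
    print(f"{target} IOPS: {', '.join(candidates)}")
```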