Data protection solution components - VMware vSphere Backups to Amazon S3

Data protection solution components

Distributed data protection solutions

Many data protection solutions are deployed as separate components in a distributed architecture. In the context of this paper, each of these components runs as a process within the guest operating system of either an Amazon Elastic Compute Cloud (Amazon EC2) instance or a vSphere-based virtual machine.

While these solutions can often be deployed on a single server that hosts all components, such configurations are generally limited to protecting small numbers of VMs (100 or fewer). The more VMs that need to be backed up, the more widely these components are distributed.

Note

Fully distributed architectures tend to suit customers with large numbers of VMs or complex heterogeneous environments who want a unified solution.

Command and control server

This component acts as the management plane, which orchestrates all other components.

Typical functions include:

  • Providing an administrative interface that handles job scheduling, reporting, and alerting.

  • Interacting with vCenter to initiate VM snapshots or NBD backup operations.

  • Deploying and updating other components in the system.

Data mover

Typical functions of data movers include:

  • Streaming data from the vSphere environment over the network to the primary backup target (such as a disk or cloud repository).

  • Performing deduplication, compression, or other data efficiency operations before transmitting data to the backup target.
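
The following Python sketch shows one way a data mover might combine fixed-length block deduplication with compression before transmission. The DataMover class, the 4 KiB block size, and the in-memory index (standing in for the backup target's block catalog) are hypothetical. Real solutions differ in chunking, hashing, and index design.

```python
import hashlib
import zlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataMover:
    """Hypothetical data mover: fixed-length deduplication, then compression."""

    block_size: int = 4096  # assumed fixed block size (4 KiB)
    # digest -> compressed block; stands in for the backup target's catalog
    index: Dict[str, bytes] = field(default_factory=dict)
    bytes_read: int = 0
    bytes_sent: int = 0

    def back_up(self, disk_image: bytes) -> List[str]:
        """Send only unseen blocks; return the ordered digests needed to restore."""
        recipe: List[str] = []
        for offset in range(0, len(disk_image), self.block_size):
            block = disk_image[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.bytes_read += len(block)
            if digest not in self.index:
                # New block: compress it and "transmit" it to the target.
                compressed = zlib.compress(block)
                self.index[digest] = compressed
                self.bytes_sent += len(compressed)
            recipe.append(digest)
        return recipe

    def restore(self, recipe: List[str]) -> bytes:
        """Rebuild a disk image from its ordered block digests."""
        return b"".join(zlib.decompress(self.index[d]) for d in recipe)

    def efficiency_ratio(self) -> float:
        """Logical bytes read per byte actually sent (50.0 means 50:1)."""
        return self.bytes_read / self.bytes_sent if self.bytes_sent else float("inf")


if __name__ == "__main__":
    mover = DataMover()
    # Two toy "VMDKs" that share their first 8 KiB (e.g. a common OS image).
    vm_a = b"A" * 8192 + b"unique-a" * 512
    vm_b = b"A" * 8192 + b"unique-b" * 512
    recipe_a = mover.back_up(vm_a)
    recipe_b = mover.back_up(vm_b)
    assert mover.restore(recipe_a) == vm_a and mover.restore(recipe_b) == vm_b
    # Toy data is far more compressible than real VM disks; the ratio is not representative.
    print(f"blocks stored: {len(mover.index)}, ratio {mover.efficiency_ratio():.1f}:1")
```

Both VMs share one index, so the duplicate blocks are sent only once. Six logical blocks result in three stored blocks, and only the new blocks are compressed and transmitted.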

HotAdd backup proxy

This is a special type of data mover that is required when you use the HotAdd transport. It is a VM that resides within the protected vSphere cluster, and it requires direct access to the same shared storage as the VMs it protects.

Cloud repositories - Native backups to Amazon S3

Amazon S3 provides a highly scalable and cost-effective storage solution that is ideal for backups. It is designed for 99.999999999% (eleven nines) durability. Except in single-Availability Zone storage classes such as Amazon S3 One Zone-Infrequent Access, objects stored in an Amazon S3 bucket are automatically copied to multiple devices spanning a minimum of three Availability Zones.

Amazon S3 offers multiple storage classes (eight at the time of writing), ranging from active classes like Amazon S3 Standard to archive classes like Amazon S3 Glacier Deep Archive. For more information about Amazon S3 storage classes, see the storage class details on the AWS website.

Each Amazon S3 storage class has a distinct set of properties designed to optimize cost for a given access pattern or performance requirement. For backup data, the Recovery Time Objective (RTO) is an additional key driver. It is particularly relevant when deciding whether to place data in Amazon S3 Glacier or Amazon S3 Glacier Deep Archive, where objects must be restored before they can be read. Finally, confirm that the data protection solution from your chosen APN technology partner supports the desired storage class.
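
To make the RTO trade-off concrete, the sketch below picks the lowest-cost storage class whose retrieval delay, plus the time to copy the data back, still fits within the RTO. It considers only classes that the backup solution supports. The retrieval figures are approximate upper bounds for the standard retrieval option: 3-5 hours for Amazon S3 Glacier (now S3 Glacier Flexible Retrieval) and within 12 hours for S3 Glacier Deep Archive. Check them against current AWS documentation. The function name and the restriction to three classes are illustrative assumptions.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed upper-bound hours before an object is readable with the standard
# retrieval option, ordered from highest to lowest storage cost per GB.
# Illustrative only: check current Amazon S3 documentation before relying on it.
CANDIDATE_CLASSES: List[Tuple[str, float]] = [
    ("STANDARD", 0.0),       # Amazon S3 Standard: no restore step
    ("GLACIER", 5.0),        # S3 Glacier Flexible Retrieval: 3-5 hours
    ("DEEP_ARCHIVE", 12.0),  # S3 Glacier Deep Archive: within 12 hours
]


def cheapest_class_for_rto(
    rto_hours: float, copy_back_hours: float, supported: Iterable[str]
) -> Optional[str]:
    """Return the cheapest supported class whose restore fits the RTO, or None."""
    supported_set = set(supported)
    choice: Optional[str] = None
    for storage_class, retrieval_hours in CANDIDATE_CLASSES:
        fits = retrieval_hours + copy_back_hours <= rto_hours
        if storage_class in supported_set and fits:
            choice = storage_class  # later entries are cheaper, so keep the last fit
    return choice


if __name__ == "__main__":
    everything = ["STANDARD", "GLACIER", "DEEP_ARCHIVE"]
    print(cheapest_class_for_rto(4, 1, everything))   # too tight for archive classes
    print(cheapest_class_for_rto(24, 2, everything))
    print(cheapest_class_for_rto(24, 2, ["STANDARD", "GLACIER"]))  # no Deep Archive support
```

The sketch ignores retrieval fees and minimum storage duration charges, which also affect the total cost of the archive classes.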

Amazon S3 also supports a wide variety of management features and security controls to help you view, manage and secure the data stored on Amazon S3.

Cloud repository proxy - Assisted backups to Amazon S3

This component front-ends Amazon S3 endpoints for one or more of the following reasons:

  • To present an interface other than the native Amazon S3 API to the backup solution. Virtual Tape Library (VTL) and Network File System (NFS) are the two most common types.

  • To mitigate bandwidth-delay product issues when latency to the Amazon S3 endpoints is high. This is usually accomplished through network protocol manipulation.

  • To cache backup data locally before transmission to Amazon S3. This helps meet RTO targets in environments with limited throughput available to Amazon S3.
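
The arithmetic behind the last two points can be sketched briefly. The bandwidth-delay product (BDP) is link bandwidth multiplied by round-trip time (RTT). A single TCP stream can deliver at most one window of data per round trip. Restore time is data size divided by sustained throughput. The link speed, RTT, and window size below are hypothetical inputs.

```python
def bandwidth_delay_product_bytes(link_mbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep the link busy: bandwidth x RTT."""
    return link_mbps * 1e6 / 8 * rtt_ms / 1000


def single_stream_mbps(window_bytes: float, rtt_ms: float, link_mbps: float) -> float:
    """A TCP stream delivers at most one window per round trip, capped by the link."""
    return min(link_mbps, window_bytes * 8 / (rtt_ms / 1000) / 1e6)


def transfer_hours(data_gb: float, throughput_mbps: float) -> float:
    """Hours to move data_gb decimal gigabytes at a sustained throughput."""
    return data_gb * 8e3 / throughput_mbps / 3600


if __name__ == "__main__":
    link, rtt = 1000.0, 80.0  # hypothetical 1 Gbps link with 80 ms RTT to Amazon S3
    print(f"BDP: {bandwidth_delay_product_bytes(link, rtt) / 1e6:.1f} MB")
    per_stream = single_stream_mbps(64 * 1024, rtt, link)  # 64 KiB window
    print(f"one stream, 64 KiB window: {per_stream:.1f} Mbps")
    print(f"1 TB restore at that rate: {transfer_hours(1000, per_stream):.0f} h")
    print(f"1 TB restore at full line rate: {transfer_hours(1000, link):.1f} h")
```

With these assumed figures, the BDP is 10 MB. A single stream with a 64 KiB window (roughly the maximum without TCP window scaling) is limited to about 6.6 Mbps. At that rate, a 1 TB restore takes about 339 hours, compared with about 2.2 hours at full line rate. A proxy that keeps more data in flight narrows this gap. A local cache avoids the long-haul transfer altogether for recent backups.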

Note

Some solutions use AWS Storage Gateway for this purpose.

Disk repositories - Backing up to block storage

Disk repositories are servers that directly store backed-up VMs on some type of block storage device. Examples include:

  • VMs in a local vSphere environment running Windows or Linux, with a large virtual disk backed by a LUN on a SAN array. They can be self-built or deployed as virtual appliances.

  • Physical appliances (or clusters of appliances) containing direct-attached storage that present a CIFS, NFS, or iSCSI interface to the hosts in a vSphere cluster.

  • Amazon EC2 instances (or clusters of them) with a combination of NVMe-backed instance storage for caching and Amazon Elastic Block Store (Amazon EBS) volumes for storing backup data.

Hyperconverged data protection solutions

Hyperconverged infrastructure (HCI) combines the block storage necessary for a disk repository with the data protection solution itself. These solutions consist of horizontally scaled clusters. Each added node contributes disk storage to the cluster and can perform all of the functions of the components described above.

Note

Hyperconverged solutions tend to fit customers who primarily run workloads on vSphere and are looking to simplify their backup infrastructure.

SaaS-based data protection solutions

Some partners offer their solution as a fully managed service. Few components are deployed in the on-premises environment, and minimal configuration is required from the customer. Although VM backups are stored on Amazon S3, the SaaS provider configures, maintains, and is billed for all of the AWS services involved on the customer's behalf.

Note

Customers seeking the simplest solution with the most rapid time-to-value benefit the most from this type of architecture.

Important features

The following features are commonly available when using a third-party backup vendor with Amazon S3, although support varies by solution.

  • Client-side data efficiency - Data efficiency mechanisms such as deduplication or compression are applied at the source before data is transmitted to the backup target. For instance, a HotAdd backup proxy can eliminate duplicate blocks inside VMDKs before sending them.

    Backup solution vendors sometimes quote efficiency ratios as high as 50:1. Whether such a ratio is achieved in practice depends on a number of variables, including:

    • Redundancy and compressibility of the data in the protected VMs

    • Whether data within the VMs is already deduplicated or compressed (for example, MPEG-encoded media)

    • Solution-specific details, such as fixed-length versus variable-length deduplication

    • Resources, such as vCPUs, dedicated to this task on the HotAdd backup proxy

  • Global deduplication - The ability of a given solution to deduplicate blocks, objects, or files across all customer data, regardless of the backup repository type. For instance, a global namespace can deduplicate backups spread across Amazon S3, Amazon EBS, and on-premises disk repositories.

    Solutions that incorporate this type of feature can greatly reduce the monthly storage spend needed to maintain a given retention and tiering strategy.

  • Consistency of volumes and applications - Raw snapshots of running VMs will capture the point-in-time state of virtual disks, regardless of any incomplete IO operations that might be occurring.

    When a snapshot of a VM starts, VMware Tools (if installed) communicates with the Volume Shadow Copy Service (VSS) on Windows VMs or the vmsync driver on Linux VMs to perform a freeze operation on the attached volumes. This commits in-flight IOs through mechanisms such as flushing in-memory write buffers to disk. When the freeze completes, vSphere is notified and the snapshot is taken. A thaw operation then releases the volumes to resume IO.

    While this protects the volume itself, a volume-level quiescence provider is unaware of application-level operations that might be in progress. A snapshot quiesced only at this level is known as a volume-consistent backup.

    If VSS writers or native vendor-provided drivers are registered, supported applications such as Microsoft SQL Server or Oracle RDBMS are notified. A similar, application-specific quiescence procedure then occurs.

    A snapshot that quiesces applications in this way is known as an application-consistent backup.

    During any such process, VMs are stunned for a short period (normally measured in microseconds). Large VMs with consistently high resource utilization can experience longer stun periods that cause perceptible service interruption, and pauses of several seconds are not uncommon. Some vendors provide native application-integrated drivers designed to eliminate this issue.
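
The freeze, snapshot, and thaw sequence described above can be sketched as nested context managers, so that volumes and applications are always thawed even if the snapshot fails. The function names and the log list are hypothetical, and this is not a VMware or Microsoft API. Quiescing application writers before volumes and thawing them in reverse order (similar to the VSS sequence) is an assumption. Snapshots taken without any quiescence are labeled crash-consistent here, the common term for the raw snapshots described above.

```python
from contextlib import ExitStack, contextmanager
from typing import Callable, Iterator, List


@contextmanager
def frozen(name: str, log: List[str]) -> Iterator[None]:
    """Hypothetical quiescence step: freeze on entry, always thaw on exit."""
    log.append(f"freeze {name}")
    try:
        yield
    finally:
        log.append(f"thaw {name}")  # release IO even if the snapshot fails


def take_snapshot(
    tools_installed: bool,
    app_writers: List[str],
    snapshot: Callable[[], None],
    log: List[str],
) -> str:
    """Run the (assumed) quiescence order around a snapshot; return its consistency."""
    if not tools_installed:
        snapshot()  # no quiescence: point-in-time state only
        return "crash-consistent"
    with ExitStack() as stack:
        # Assumed order: applications flush their own state first, then volumes.
        for writer in app_writers:
            stack.enter_context(frozen(writer, log))
        stack.enter_context(frozen("volumes", log))
        snapshot()
    # ExitStack thaws in reverse order: volumes first, then applications.
    return "application-consistent" if app_writers else "volume-consistent"


if __name__ == "__main__":
    events: List[str] = []
    result = take_snapshot(True, ["sqlserver"], lambda: events.append("snapshot"), events)
    print(result, events)
```

The time between the first freeze and the last thaw approximates the window in which the guest is paused, which is why minimizing work inside it matters for large, busy VMs.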