Using Amazon S3 compatible storage on Snow Family devices with a cluster of Snow devices - AWS Snowball Edge Developer Guide

Using Amazon S3 compatible storage on Snow Family devices with a cluster of Snow devices

A cluster is a collection of three or more Snowball Edge devices used as a single logical unit for local storage and compute purposes. A cluster offers two primary benefits over a standalone Snowball Edge device for local storage and computing:

  • Increased durability – The S3 data stored in a cluster of Snowball Edge devices enjoys increased data durability over a single device. In addition, the data on the cluster remains safe and viable, despite possible hardware outages affecting the cluster. Clusters can withstand the loss of one device in clusters of 3 and 4 devices and up to two devices in clusters of 5 to 16 devices before the data is in danger. You can replace unhealthy nodes to maintain the durability and safety of data stored in the cluster.

  • Increased storage – With Snowball Edge storage optimized devices, you can create a single, 16 node cluster with up to 2.6 PB of usable S3-compatible storage capacity. With Snowball Edge compute optimized devices, you can create a single, 16 node cluster of up to 501 TB of usable S3-compatible storage capacity.

A cluster of Snowball Edge devices is made of leaderless nodes. Any node can write data to and read data from the entire cluster, and all nodes are capable of performing the behind-the-scenes management of the cluster.

Keep the following considerations in mind when planning to use a cluster of Snowball Edge devices:

  • We recommend that you provide a redundant power source for all devices in the cluster to reduce potential performance and stability issues for the cluster.

  • As with standalone local storage and compute jobs, the data stored in a cluster can't be imported into Amazon S3 without ordering additional devices as a part of separate import jobs. If you order additional devices as import jobs, you can transfer the data from the cluster to the import job devices.

  • To get data onto a cluster, use the Amazon S3 API to create buckets on the cluster, and then store and retrieve objects in those buckets. You can also use AWS DataSync to transfer objects between AWS storage services and Amazon S3 compatible storage on Snow Family devices. For more information, see Configuring transfers with S3 compatible storage on Snowball Edge.

  • You can create a job to order a cluster of devices from the AWS Snow Family Management Console, the AWS CLI, or one of the AWS SDKs. For more information, see Getting Started.

  • Each device in the cluster has a node ID. A node ID is a unique identifier for each device in the cluster, like a job ID for a standalone device. You can get node IDs from the AWS Snow Family Management Console, the AWS CLI, the AWS SDKs, and the Snowball Edge client. The Snowball Edge client commands describe-device and describe-cluster return node IDs with other information about devices or the cluster.

  • The lifespan of a cluster is limited by the security certificate granted to the cluster devices when the cluster is provisioned. By default, Snowball Edge devices can be used for up to 360 days before they need to be returned. At the end of that time, the devices stop responding to read/write requests. If you need to keep one or more devices for longer than 360 days, contact AWS Support.

  • When AWS receives a returned device that was part of a cluster, we perform a complete erasure of the device. This erasure follows the National Institute of Standards and Technology (NIST) 800-88 standards.
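The 360-day default lifespan mentioned above lends itself to a simple date calculation. The following is a minimal sketch, not an AWS tool: the function names are hypothetical, and it assumes you know the date on which the 360-day period started, which this guide ties to cluster provisioning without giving an exact rule.

```python
from datetime import date, timedelta

# Default usage period from this guide; the start date you pass in is an assumption.
DEFAULT_LIFESPAN_DAYS = 360


def return_deadline(start: date, lifespan_days: int = DEFAULT_LIFESPAN_DAYS) -> date:
    """Return the last date on which the devices are expected to respond."""
    return start + timedelta(days=lifespan_days)


def days_remaining(start: date, today: date) -> int:
    """Return the days left before the deadline (negative once it has passed)."""
    return (return_deadline(start) - today).days
```

A negative result from days_remaining would suggest that the devices may already have stopped responding to read/write requests.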

Amazon S3 compatible storage on Snow Family devices cluster fault tolerance and storage capacity
Storage capacity is shown in TB.

Cluster size   Fault tolerance         Snowball Edge Compute        Snowball Edge Compute        Snowball Edge storage
                                       Optimized (with AMD EPYC     Optimized (with AMD EPYC     optimized 210 TB
                                       Gen1, HDD, and optional      Gen2 and NVMe) devices       devices
                                       GPU) devices
3              Loss of up to 1 node    83                           38                           438
4              Loss of up to 1 node    125                          57                           657
5              Loss of up to 2 nodes   125                          57                           657
6              Loss of up to 2 nodes   167                          76                           904
7              Loss of up to 2 nodes   209                          95                           1096
8              Loss of up to 2 nodes   250                          114                          1315
9              Loss of up to 2 nodes   292                          133                          1534
10             Loss of up to 2 nodes   334                          152                          1754
11             Loss of up to 2 nodes   370                          165                          1970
12             Loss of up to 2 nodes   376                          171                          1973
13             Loss of up to 2 nodes   418                          190                          2192
14             Loss of up to 2 nodes   459                          209                          2411
15             Loss of up to 2 nodes   495                          225                          2625
16             Loss of up to 2 nodes   501                          228                          2631
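When sizing a cluster, the capacity figures above can be treated as a lookup table. The following sketch copies the values from the table; the dictionary keys and the function name smallest_cluster_for are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Usable S3-compatible capacity in TB per cluster size, copied from the table above.
# The device-type keys are illustrative labels, not AWS identifiers.
CAPACITY_TB = {
    "compute_optimized_gen1_hdd": {
        3: 83, 4: 125, 5: 125, 6: 167, 7: 209, 8: 250, 9: 292,
        10: 334, 11: 370, 12: 376, 13: 418, 14: 459, 15: 495, 16: 501,
    },
    "compute_optimized_gen2_nvme": {
        3: 38, 4: 57, 5: 57, 6: 76, 7: 95, 8: 114, 9: 133,
        10: 152, 11: 165, 12: 171, 13: 190, 14: 209, 15: 225, 16: 228,
    },
    "storage_optimized_210tb": {
        3: 438, 4: 657, 5: 657, 6: 904, 7: 1096, 8: 1315, 9: 1534,
        10: 1754, 11: 1970, 12: 1973, 13: 2192, 14: 2411, 15: 2625, 16: 2631,
    },
}


def smallest_cluster_for(required_tb: float, device_type: str) -> Optional[int]:
    """Return the smallest cluster size whose capacity covers required_tb.

    Returns None when even a 16-node cluster is too small.
    """
    capacities = CAPACITY_TB[device_type]
    for size in sorted(capacities):
        if capacities[size] >= required_tb:
            return size
    return None
```

Because 4-node and 5-node clusters have the same capacity in the table, this lookup returns 4 for any requirement that a 5-node cluster would also meet. The fifth node adds fault tolerance rather than storage.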

After you unlock a cluster, you're ready to store and access data on that cluster. You can use the Amazon S3 compatible endpoint to read from and write data to a cluster.

To read from or write data to a cluster, you must have a read/write quorum with no more than the allowed number of unavailable nodes in your cluster of devices.

Snowball Edge cluster quorums

A quorum represents the minimum number of Snowball Edge devices in a cluster that must be communicating with each other to maintain a read/write quorum.

When all devices in a cluster are healthy, you have a read/write quorum for your cluster. If one or two of those devices go offline, you reduce the operational capacity of the cluster. However, you can still read from and write to the cluster: as long as no more than the allowed number of devices is offline, the cluster still has a read/write quorum. The number of nodes that can go offline while the cluster keeps its quorum depends on the cluster size and is listed in the fault tolerance table above.

Quorum may be lost if a cluster loses more than the number of devices indicated in the table. When quorum is lost, the cluster is offline and the data in the cluster is unavailable. You might be able to fix this, or the data might be permanently lost, depending on the severity of the event. If it is a temporary external power event, and you can power the Snowball Edge devices back on and unlock all the nodes in the cluster, your data is available again.
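The quorum rule described above can be sketched as a small check. This is an illustration only: the function names and the three state labels ("full", "reduced", "at_risk") are hypothetical, and the thresholds come from the fault tolerance table in this guide.

```python
def allowed_unavailable(cluster_size: int) -> int:
    """Return how many nodes may be offline per the fault tolerance table."""
    if not 3 <= cluster_size <= 16:
        raise ValueError("cluster size must be between 3 and 16 nodes")
    # Clusters of 3 or 4 tolerate one lost node; clusters of 5 to 16 tolerate two.
    return 1 if cluster_size <= 4 else 2


def quorum_state(cluster_size: int, unavailable: int) -> str:
    """Classify a cluster as 'full', 'reduced', or 'at_risk' (hypothetical labels)."""
    if not 0 <= unavailable <= cluster_size:
        raise ValueError("unavailable must be between 0 and the cluster size")
    if unavailable == 0:
        return "full"
    if unavailable <= allowed_unavailable(cluster_size):
        return "reduced"  # quorum holds, but operational capacity is reduced
    return "at_risk"      # quorum may be lost; data may be unavailable
```

For example, a 4-node cluster with two unavailable nodes would be classified as "at_risk", while a 5-node cluster with two unavailable nodes would still be "reduced".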

Important

If a minimum quorum of healthy nodes doesn't exist, contact AWS Support.

You can use the describe-cluster command to view the lock state and network reachability of each node. Ensuring that the devices in your cluster are healthy and connected is an administrative responsibility that you take on when you use cluster storage. For more information, see Getting device status.

If you determine that one or more nodes are unhealthy, you can replace nodes in the cluster to maintain quorum and the health and stability of your data. For more information, see Replacing a node in a cluster.
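If you capture the JSON that describe-cluster prints, a short script can flag nodes that need attention. The sketch below assumes the output has the shape shown in the example outputs later in this guide (Devices, DeviceId, UnlockStatus, NetworkReachability, ClusterAssociation). The function names are hypothetical, and treating a missing field as "UNKNOWN" is an assumption of this sketch.

```python
import json
from typing import Dict, List


def node_health(describe_cluster_output: str) -> List[Dict[str, str]]:
    """Flatten describe-cluster JSON into one status row per device."""
    cluster = json.loads(describe_cluster_output)
    rows = []
    for device in cluster.get("Devices", []):
        rows.append({
            "device_id": device.get("DeviceId", "UNKNOWN"),
            # Missing objects are reported as UNKNOWN (an assumption of this sketch).
            "unlock": device.get("UnlockStatus", {}).get("State", "UNKNOWN"),
            "reachability": device.get("NetworkReachability", {}).get("State", "UNKNOWN"),
            "association": device.get("ClusterAssociation", {}).get("State", "UNKNOWN"),
        })
    return rows


def unhealthy_nodes(rows: List[Dict[str, str]]) -> List[str]:
    """Return IDs of devices that are not both UNLOCKED and REACHABLE."""
    return [
        row["device_id"]
        for row in rows
        if row["unlock"] != "UNLOCKED" or row["reachability"] != "REACHABLE"
    ]


# Trimmed sample in the same shape as the example output in this guide.
SAMPLE = json.dumps({
    "ClusterId": "CID12345678-1234-1234-1234-123456789012",
    "Devices": [
        {"DeviceId": "JID12345678-1234-1234-1234-123456789012",
         "UnlockStatus": {"State": "UNLOCKED"},
         "NetworkReachability": {"State": "REACHABLE"}},
        {"DeviceId": "JID12345678-1234-1234-1234-123456789014",
         "NetworkReachability": {"State": "UNREACHABLE"}},
    ],
})

if __name__ == "__main__":
    print(unhealthy_nodes(node_health(SAMPLE)))
```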

Reconnecting an unavailable cluster node

A node, or device within a cluster, can become temporarily unavailable due to an issue like power or network loss without damaging the data on the node. When this happens, it affects the status of your cluster. A node's network reachability and lock status is reported in the Snowball Edge client by using the snowballEdge describe-cluster command.

We recommend that you physically position your cluster so you have access to the front, back, and top of all nodes. This way, you can access power and network cables on the back, shipping labels on the top for node IDs, and LCD screens on the front of the devices for the IP addresses and other administrative information.

When you detect that a node is unavailable, we recommend that you try one of the following procedures, depending on the scenario that caused the node to become unavailable.

To reconnect an unavailable node
  1. Ensure that the node is powered on.

  2. Ensure that the node is connected to the same internal network that the rest of the cluster is connected to.

  3. If you need to power up the node, wait up to 20 minutes for it to finish.

  4. Run the snowballEdge unlock-cluster command or the snowballEdge associate-device command. For an example, see Unlocking Snowball Edge devices.

To reconnect an unavailable node that lost network connectivity, but didn't lose power
  1. Ensure that the node is connected to the same internal network that the rest of the cluster is on.

  2. Run the snowballEdge describe-device command to see when the previously unavailable node is added back to the cluster. For an example, see Getting Device Status.

After you perform the preceding procedures, your nodes should be working normally. You should also have a read/write quorum. If that's not the case, then one or more of your nodes might have a more serious issue and might need to be removed from the cluster.
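Waiting for a reconnected node to rejoin can be automated with a simple polling loop. The following sketch is not part of the Snowball Edge client. It assumes you supply a hypothetical fetch_cluster callable that returns the JSON text printed by describe-cluster, and it reads the NetworkReachability state shown in this guide's example output. The attempt count and delay are arbitrary defaults.

```python
import json
import time
from typing import Callable


def wait_until_reachable(
    fetch_cluster: Callable[[], str],
    device_id: str,
    attempts: int = 10,
    delay_seconds: float = 30.0,
) -> bool:
    """Poll cluster status until device_id reports REACHABLE or attempts run out."""
    for attempt in range(attempts):
        cluster = json.loads(fetch_cluster())
        for device in cluster.get("Devices", []):
            if device.get("DeviceId") != device_id:
                continue
            state = device.get("NetworkReachability", {}).get("State")
            if state == "REACHABLE":
                return True
        # Sleep between polls, but not after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

One way to supply fetch_cluster would be a small wrapper that runs the snowballEdge describe-cluster command and returns its standard output. Passing the fetcher in as a parameter keeps the loop easy to test without a device.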

Replacing a node in a cluster

To replace a node, you first need to order a replacement. You can order a replacement node from the console, the AWS CLI, or one of the AWS SDKs. If you're ordering a replacement node from the console, you can order replacements for any job that hasn't been canceled or completed. Then, you disassociate the unhealthy node from the cluster, connect the replacement node to your network, unlock the cluster including the replacement node, associate the replacement node with the cluster, and restart the Amazon S3 compatible storage on Snow Family devices service.

To order a replacement node from the console
  1. Sign in to the AWS Snow Family Management Console.

  2. Find and choose a job for a node that belongs to the cluster that you created from the Job dashboard.

  3. For Actions, choose Replace node.

    Doing this opens the final step of the job creation wizard, with all settings identical to how the cluster was originally created.

  4. Choose Create job.

Your replacement Snowball Edge is now on its way to you. Use the following procedure to remove the unhealthy node from the cluster.

To remove a node from a cluster
  1. Power off the node to be removed. For more information, see Powering off the Snowball Edge.

  2. Use the describe-cluster command to ensure the unhealthy node is unreachable. This is indicated by the value of UNREACHABLE for the State name of the NetworkReachability object.

    snowballEdge describe-cluster --manifest-file path/to/manifest/file.bin --unlock-code unlock-code --endpoint https://ip-address-of-device-in-cluster
    Example of describe-cluster output
    {
      "ClusterId": "CID12345678-1234-1234-1234-123456789012",
      "Devices": [
        {
          "DeviceId": "JID12345678-1234-1234-1234-123456789012",
          "UnlockStatus": { "State": "UNLOCKED" },
          "ActiveNetworkInterface": { "IpAddress": "10.0.0.0" },
          "ClusterAssociation": { "ClusterId": "CID12345678-1234-1234-1234-123456789012", "State": "ASSOCIATED" },
          "NetworkReachability": { "State": "REACHABLE" },
          "Tags": []
        },
        {
          "DeviceId": "JID12345678-1234-1234-1234-123456789013",
          "UnlockStatus": { "State": "UNLOCKED" },
          "ActiveNetworkInterface": { "IpAddress": "10.0.0.1" },
          "ClusterAssociation": { "ClusterId": "CID12345678-1234-1234-1234-123456789012", "State": "ASSOCIATED" },
          "NetworkReachability": { "State": "REACHABLE" },
          "Tags": []
        },
        {
          "DeviceId": "JID12345678-1234-1234-1234-123456789014",
          "ClusterAssociation": { "ClusterId": "CID12345678-1234-1234-1234-123456789012", "State": "ASSOCIATED" },
          "NetworkReachability": { "State": "UNREACHABLE" }
        }
      ]
    }
  3. Use the describe-service command to ensure the status of the s3-snow service is DEGRADED.

    snowballEdge describe-service --service-id s3-snow --device-ip-addresses snow-device-1-address snow-device-2-address --manifest-file path/to/manifest/file.bin --unlock-code unlock-code --endpoint https://snow-device-ip-address
    Example of output of describe-service command
    {
      "ServiceId": "s3-snow",
      "Autostart": true,
      "Status": { "State": "DEGRADED" },
      "ServiceCapacities": [
        { "Name": "S3 Storage", "Unit": "Byte", "Used": 38768180432, "Available": 82961231819568 }
      ],
      "Endpoints": [
        {
          "Protocol": "https",
          "Port": 443,
          "Host": "10.0.0.10",
          "CertificateAssociation": { "CertificateArn": "arn:aws:snowball-device:::certificate/7Rg2lP9tQaHnW4sC6xUzF1vGyD3jB5kN8MwEiYpT" },
          "Description": "s3-snow bucket API endpoint",
          "DeviceId": "JID-beta-207012320001-24-02-05-17-17-26",
          "Status": { "State": "ACTIVE" }
        },
        {
          "Protocol": "https",
          "Port": 443,
          "Host": "10.0.0.11",
          "CertificateAssociation": { "CertificateArn": "arn:aws:snowball-device:::certificate/7Rg2lP9tQaHnW4sC6xUzF1vGyD3jB5kN8MwEiYpT" },
          "Description": "s3-snow object API endpoint",
          "DeviceId": "JID-beta-207012320001-24-02-05-17-17-26",
          "Status": { "State": "ACTIVE" }
        },
        {
          "Protocol": "https",
          "Port": 443,
          "Host": "10.0.0.12",
          "CertificateAssociation": { "CertificateArn": "arn:aws:snowball-device:::certificate/7Rg2lP9tQaHnW4sC6xUzF1vGyD3jB5kN8MwEiYpT" },
          "Description": "s3-snow bucket API endpoint",
          "DeviceId": "JID-beta-207012240003-24-02-05-17-17-27",
          "Status": { "State": "ACTIVE" }
        },
        {
          "Protocol": "https",
          "Port": 443,
          "Host": "10.0.0.13",
          "CertificateAssociation": { "CertificateArn": "arn:aws:snowball-device:::certificate/7Rg2lP9tQaHnW4sC6xUzF1vGyD3jB5kN8MwEiYpT" },
          "Description": "s3-snow object API endpoint",
          "DeviceId": "JID-beta-207012320001-24-02-05-17-17-27",
          "Status": { "State": "ACTIVE" }
        }
      ]
    }
  4. Use the disassociate-device command to disassociate and remove the unhealthy node from the cluster.

    snowballEdge disassociate-device --device-id device-id --manifest-file path/to/manifest/file.bin --unlock-code unlock-code --endpoint https://ip-address-of-unhealthy-device
    Example output of disassociate-device command
    Disassociating your Snowball Edge device from the cluster. Your Snowball Edge device will be disassociated from the cluster when it is in the "DISASSOCIATED" state. You can use the describe-cluster command to determine the state of your cluster.
  5. Use the describe-cluster command again to ensure the unhealthy node is disassociated from the cluster.

    snowballEdge describe-cluster --manifest-file path/to/manifest/file.bin --unlock-code unlock-code --endpoint https://ip-address-of-healthy-device
    Example of describe-cluster command showing node is disassociated
    {
      "ClusterId": "CID12345678-1234-1234-1234-123456789012",
      "Devices": [
        {
          "DeviceId": "JID12345678-1234-1234-1234-123456789012",
          "UnlockStatus": { "State": "UNLOCKED" },
          "ActiveNetworkInterface": { "IpAddress": "10.0.0.0" },
          "ClusterAssociation": { "ClusterId": "CID12345678-1234-1234-1234-123456789012", "State": "ASSOCIATED" },
          "NetworkReachability": { "State": "REACHABLE" },
          "Tags": []
        },
        {
          "DeviceId": "JID12345678-1234-1234-1234-123456789013",
          "UnlockStatus": { "State": "UNLOCKED" },
          "ActiveNetworkInterface": { "IpAddress": "10.0.0.1" },
          "ClusterAssociation": { "ClusterId": "CID12345678-1234-1234-1234-123456789012", "State": "ASSOCIATED" },
          "NetworkReachability": { "State": "REACHABLE" },
          "Tags": []
        },
        {
          "DeviceId": "JID12345678-1234-1234-1234-123456789014",
          "ClusterAssociation": { "ClusterId": "CID12345678-1234-1234-1234-123456789012", "State": "DISASSOCIATED" }
        }
      ]
    }
  6. Power off and return the unhealthy device to AWS. For more information, see Powering off the Snowball Edge and Returning the Snowball Edge Device.
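The checks in steps 2 and 3 and the command in step 4 can be combined into a small guard before you disassociate a node. This is a sketch under stated assumptions: the function names are hypothetical, the JSON fields are the ones shown in the example outputs above, and the command is only assembled as an argument list using the flags shown in step 4. Nothing here runs the Snowball Edge client.

```python
import json
from typing import List


def ready_to_disassociate(cluster_json: str, service_json: str, device_id: str) -> bool:
    """Return True when the node is UNREACHABLE and s3-snow reports DEGRADED."""
    cluster = json.loads(cluster_json)
    service = json.loads(service_json)
    node_unreachable = any(
        device.get("DeviceId") == device_id
        and device.get("NetworkReachability", {}).get("State") == "UNREACHABLE"
        for device in cluster.get("Devices", [])
    )
    service_degraded = service.get("Status", {}).get("State") == "DEGRADED"
    return node_unreachable and service_degraded


def disassociate_command(
    device_id: str, manifest_file: str, unlock_code: str, endpoint: str
) -> List[str]:
    """Build the disassociate-device argument list with the flags shown in step 4."""
    return [
        "snowballEdge", "disassociate-device",
        "--device-id", device_id,
        "--manifest-file", manifest_file,
        "--unlock-code", unlock_code,
        "--endpoint", endpoint,
    ]
```

Building the command as a list rather than a single string avoids shell quoting problems if the manifest path contains spaces.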

When the replacement device arrives, use the following procedure to add it to the cluster.

To add a replacement device
  1. Position the replacement device for the cluster such that you have access to the front, back, and top of all devices.

  2. Power up the node and ensure that the node is connected to the same internal network as the rest of the cluster. For more information, see Connecting to Your Local Network.

  3. Use the unlock-cluster command and include the IP address of the new node.

    snowballEdge unlock-cluster --manifest-file path/to/manifest/file.bin --unlock-code unlock-code --endpoint https://ip-address-of-cluster-device --device-ip-addresses node-1-ip-address node-2-ip-address new-node-ip-address

    The state of the new node will be DEGRADED until you associate it with the cluster in the next step.

  4. Use the associate-device command to associate the replacement node with the cluster.

    snowballEdge associate-device --device-ip-address new-node-ip-address
    Example of associate-device command output
    Associating your Snowball Edge device with the cluster. Your Snowball Edge device will be associated with the cluster when it is in the ASSOCIATED state. You can use the describe-device command to determine the state of your devices.
  5. Use the describe-cluster command to ensure the new node is associated with the cluster.

    snowballEdge describe-cluster --manifest-file path/to/manifest/file.bin --unlock-code unlock-code --endpoint https://node-ip-address
    Example of describe-cluster command output
    {
      "ClusterId": "CID12345678-1234-1234-1234-123456789012",
      "Devices": [
        {
          "DeviceId": "JID12345678-1234-1234-1234-123456789012",
          "UnlockStatus": { "State": "UNLOCKED" },
          "ActiveNetworkInterface": { "IpAddress": "10.0.0.0" },
          "ClusterAssociation": { "ClusterId": "CID12345678-1234-1234-1234-123456789012", "State": "ASSOCIATED" },
          "NetworkReachability": { "State": "REACHABLE" },
          "Tags": []
        },
        {
          "DeviceId": "JID-CID12345678-1234-1234-1234-123456789013",
          "UnlockStatus": { "State": "UNLOCKED" },
          "ActiveNetworkInterface": { "IpAddress": "10.0.0.1" },
          "ClusterAssociation": { "ClusterId": "CID12345678-1234-1234-1234-123456789012", "State": "ASSOCIATED" },
          "NetworkReachability": { "State": "REACHABLE" },
          "Tags": []
        },
        {
          "DeviceId": "JID-CID12345678-1234-1234-1234-123456789015",
          "UnlockStatus": { "State": "UNLOCKED" },
          "ActiveNetworkInterface": { "IpAddress": "10.0.0.2" },
          "ClusterAssociation": { "ClusterId": "CID12345678-1234-1234-1234-123456789012", "State": "ASSOCIATED" },
          "NetworkReachability": { "State": "REACHABLE" },
          "Tags": []
        }
      ]
    }
  6. On the new node, create two virtual network interfaces (VNIs). For more information, see Starting the Amazon S3 compatible storage on Snow Family devices service.

  7. Use the stop-service command to stop the s3-snow service.

    snowballEdge stop-service --service-id s3-snow --device-ip-addresses cluster-device-1-ip-address cluster-device-2-ip-address cluster-device-3-ip-address --manifest-file path/to/manifest/file.bin --unlock-code unlock-code --endpoint https://snow-device-ip-address
    Example of stop-service command output
    Stopping the AWS service on your Snowball Edge. You can determine the status of the AWS service using the describe-service command.
  8. Use the start-service command to start the s3-snow service after adding the new node to the cluster.

    snowballEdge start-service --service-id s3-snow --device-ip-addresses cluster-device-1-ip-address cluster-device-2-ip-address cluster-device-3-ip-address --virtual-network-interface-arns "device-1-vni-ip-address-a" "device-1-vni-ip-address-b" "device-2-vni-ip-address-a" "device-2-vni-ip-address-b" "device-3-vni-ip-address-a" "device-3-vni-ip-address-b" --manifest-file path/to/manifest/file.bin --unlock-code unlock-code --endpoint https://snow-device-ip-address
    Example of start-service command output
    Starting the AWS service on your Snowball Edge. You can determine the status of the AWS service using the describe-service command.
  9. Use the describe-service command to ensure the s3-snow service started.

    snowballEdge describe-service --service-id s3-snow --device-ip-addresses snow-device-1-address snow-device-2-address snow-device-3-address --manifest-file path/to/manifest/file.bin --unlock-code unlock-code --endpoint https://snow-device-ip-address
    Example of describe-service command output
    {
      "ServiceId": "s3-snow",
      "Autostart": true,
      "Status": { "State": "ACTIVE" },
      "ServiceCapacities": [
        { "Name": "S3 Storage", "Unit": "Byte", "Used": 38768180432, "Available": 82961231819568 }
      ],
      "Endpoints": [
        {
          "Protocol": "https",
          "Port": 443,
          "Host": "10.0.0.10",
          "CertificateAssociation": { "CertificateArn": "arn:aws:snowball-device:::certificate/7Rg2lP9tQaHnW4sC6xUzF1vGyD3jB5kN8MwEiYpT" },
          "Description": "s3-snow bucket API endpoint",
          "DeviceId": "JID12345678-1234-1234-1234-123456789012",
          "Status": { "State": "ACTIVE" }
        },
        {
          "Protocol": "https",
          "Port": 443,
          "Host": "10.0.0.11",
          "CertificateAssociation": { "CertificateArn": "arn:aws:snowball-device:::certificate/7Rg2lP9tQaHnW4sC6xUzF1vGyD3jB5kN8MwEiYpT" },
          "Description": "s3-snow object API endpoint",
          "DeviceId": "JID12345678-1234-1234-1234-123456789013",
          "Status": { "State": "ACTIVE" }
        },
        {
          "Protocol": "https",
          "Port": 443,
          "Host": "10.0.0.12",
          "CertificateAssociation": { "CertificateArn": "arn:aws:snowball-device:::certificate/7Rg2lP9tQaHnW4sC6xUzF1vGyD3jB5kN8MwEiYpT" },
          "Description": "s3-snow bucket API endpoint",
          "DeviceId": "JID12345678-1234-1234-1234-123456789015",
          "Status": { "State": "ACTIVE" }
        },
        {
          "Protocol": "https",
          "Port": 443,
          "Host": "10.0.0.13",
          "CertificateAssociation": { "CertificateArn": "arn:aws:snowball-device:::certificate/7Rg2lP9tQaHnW4sC6xUzF1vGyD3jB5kN8MwEiYpT" },
          "Description": "s3-snow object API endpoint",
          "DeviceId": "JID-beta-207012320001-24-02-05-17-17-27",
          "Status": { "State": "ACTIVE" }
        },
        {
          "Protocol": "https",
          "Port": 443,
          "Host": "10.0.0.14",
          "CertificateAssociation": { "CertificateArn": "arn:aws:snowball-device:::certificate/7Rg2lP9tQaHnW4sC6xUzF1vGyD3jB5kN8MwEiYpT" },
          "Description": "s3-snow bucket API endpoint",
          "DeviceId": "JID-beta-207012240003-24-02-05-17-17-28",
          "Status": { "State": "ACTIVE" }
        },
        {
          "Protocol": "https",
          "Port": 443,
          "Host": "10.0.0.15",
          "CertificateAssociation": { "CertificateArn": "arn:aws:snowball-device:::certificate/7Rg2lP9tQaHnW4sC6xUzF1vGyD3jB5kN8MwEiYpT" },
          "Description": "s3-snow object API endpoint",
          "DeviceId": "JID-beta-207012320001-24-02-05-17-17-28",
          "Status": { "State": "ACTIVE" }
        }
      ]
    }
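The restart in steps 7 through 9 involves a long start-service command and a final status check, both of which are easy to get wrong by hand. The sketch below assembles the argument list from the flags shown in step 8 and checks describe-service output in the shape shown in step 9. The function names are hypothetical, and the requirement of exactly two VNI values per device follows the example command in this guide rather than any documented validation rule.

```python
import json
from typing import Dict, List, Sequence


def start_service_command(
    device_ips: Sequence[str],
    vni_arns_by_device: Dict[str, Sequence[str]],
    manifest_file: str,
    unlock_code: str,
    endpoint: str,
) -> List[str]:
    """Build the start-service argument list with two VNI values per device."""
    arns: List[str] = []
    for ip in device_ips:
        pair = vni_arns_by_device[ip]
        if len(pair) != 2:
            raise ValueError(f"expected 2 VNI values for {ip}, got {len(pair)}")
        arns.extend(pair)
    return [
        "snowballEdge", "start-service",
        "--service-id", "s3-snow",
        "--device-ip-addresses", *device_ips,
        "--virtual-network-interface-arns", *arns,
        "--manifest-file", manifest_file,
        "--unlock-code", unlock_code,
        "--endpoint", endpoint,
    ]


def service_is_active(describe_service_json: str) -> bool:
    """Return True when the service and every listed endpoint report ACTIVE."""
    service = json.loads(describe_service_json)
    if service.get("Status", {}).get("State") != "ACTIVE":
        return False
    return all(
        endpoint.get("Status", {}).get("State") == "ACTIVE"
        for endpoint in service.get("Endpoints", [])
    )
```

Keeping the VNI values in a per-device mapping makes it harder to pass them in an order that doesn't match the device IP addresses.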