The following diagram illustrates the components of a Weka deployment in AWS:
AWS Architecture Diagram
Best Practices
Backup and Recovery
Resiliency
The Weka system is a distributed cluster that is protected against the failure of 2 or 4 failure domains and provides fast rebuild times, as described in the Weka system overview section.
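As a rough illustration of what protection against 2 or 4 failure-domain failures means, the sketch below models a stripe as a number of data chunks plus 2 or 4 protection chunks, each assumed to sit on a distinct failure domain. The `Protection` class, its field names, and the example stripe width are hypothetical. They illustrate the general N+P erasure-coding idea only, not Weka's internal data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protection:
    """Hypothetical N+P stripe model: `data_width` data chunks plus
    `protection_level` protection chunks, each assumed to be placed on a
    different failure domain."""

    data_width: int
    protection_level: int  # 2 or 4, matching the levels described above

    def __post_init__(self) -> None:
        if self.protection_level not in (2, 4):
            raise ValueError("protection_level must be 2 or 4")
        if self.data_width < 1:
            raise ValueError("data_width must be positive")

    @property
    def stripe_width(self) -> int:
        # Total number of failure domains a single stripe spans.
        return self.data_width + self.protection_level

    def tolerates(self, failed_domains: int) -> bool:
        # Data stays readable as long as no more than `protection_level`
        # failure domains are lost at the same time.
        return 0 <= failed_domains <= self.protection_level

    def usable_fraction(self) -> float:
        # Share of raw capacity that holds data rather than protection chunks.
        return self.data_width / self.stripe_width


if __name__ == "__main__":
    p = Protection(data_width=3, protection_level=2)
    print(p.stripe_width, p.tolerates(2), p.tolerates(3), p.usable_fraction())
```

The trade-off the model exposes is the usual one: a higher protection level survives more simultaneous failures at the cost of a smaller usable fraction for the same data width.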
Instance Failure
If an instance fails, the Weka system rebuilds its data. To restore the compute and storage capacity the cluster lost with the failed instance, add a new instance to the cluster.
Upload Snapshots to S3
It is advisable to use periodic (incremental) snapshots to back up the data and protect it against the failure of multiple EC2 instances. The recovery point objective (RPO) is determined by how often snapshots are taken and uploaded to S3. The appropriate RPO varies with the type of data, regulations, and company policies, but it is advisable to upload snapshots of critical filesystems (Snap-To-Object) at least daily.
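The link between snapshot cadence and RPO can be made concrete with a little arithmetic. Assuming snapshots are taken every interval T and each upload to S3 takes U (with U no longer than T), a failure just before a snapshot finishes uploading can only be recovered from the previous snapshot, so the worst-case data loss is about T + U. The function names below are hypothetical, and because uploads are incremental, U depends on how much data changed, so treat the inputs as estimates.

```python
from datetime import timedelta


def worst_case_rpo(interval: timedelta, upload_time: timedelta) -> timedelta:
    """Worst-case data loss for snapshots taken every `interval` whose
    upload to S3 takes `upload_time`.

    A failure just before snapshot k+1 finishes uploading can only be
    recovered from snapshot k, so everything written since snapshot k
    (up to interval + upload_time) is lost.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if upload_time < timedelta(0):
        raise ValueError("upload_time must not be negative")
    if upload_time > interval:
        # Uploads would queue up and the lag would keep growing.
        raise ValueError("upload_time exceeds the snapshot interval")
    return interval + upload_time


def max_interval_for_rpo(target_rpo: timedelta, upload_time: timedelta) -> timedelta:
    """Longest snapshot interval that still meets `target_rpo`."""
    interval = target_rpo - upload_time
    if interval <= timedelta(0) or upload_time > interval:
        raise ValueError("target_rpo is too tight for this upload_time")
    return interval


if __name__ == "__main__":
    # Illustrative numbers only: daily snapshots, 2-hour uploads.
    print(worst_case_rpo(timedelta(days=1), timedelta(hours=2)))
    print(max_interval_for_rpo(timedelta(hours=24), timedelta(hours=2)))
```

In other words, uploading daily snapshots bounds the data loss at slightly more than a day; meeting a strict 24-hour RPO means snapshotting somewhat more often than once a day.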
If a failure requires recovering from a backup, you only need to deploy a new cluster using the Self-Service Portal or CloudFormation and create filesystems from the uploaded snapshots. There is no need to wait for the data to be copied to the EC2 instance volumes; it is immediately accessible from S3. The recovery time objective (RTO) for this operation depends mainly on the time it takes to deploy the CloudFormation stack and is typically under 30 minutes.
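Before creating filesystems in the new cluster, you need to choose which uploaded snapshot to restore from: the newest one whose upload to S3 had completed before the failure. A snapshot that was taken but not fully uploaded cannot be used. The sketch below expresses that rule; `SnapshotRecord` and both functions are hypothetical, and in practice the snapshot names and upload status would come from your cluster's snapshot listing rather than a hand-built list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class SnapshotRecord:
    """Hypothetical record of one snapshot; field names are illustrative."""

    filesystem: str
    name: str
    taken_at: datetime
    uploaded_at: Optional[datetime]  # None while the upload to S3 is unfinished


def pick_restore_point(
    records: Sequence[SnapshotRecord], filesystem: str, failure_time: datetime
) -> Optional[SnapshotRecord]:
    """Return the newest snapshot of `filesystem` whose upload finished
    at or before `failure_time`, or None if there is no such snapshot."""
    usable = [
        r
        for r in records
        if r.filesystem == filesystem
        and r.uploaded_at is not None
        and r.uploaded_at <= failure_time
    ]
    return max(usable, key=lambda r: r.taken_at, default=None)


def data_loss_window(record: SnapshotRecord, failure_time: datetime) -> timedelta:
    """Writes made after the snapshot was taken are not in the backup."""
    return failure_time - record.taken_at


if __name__ == "__main__":
    day = datetime(2024, 1, 1)
    history = [
        SnapshotRecord("projects", "daily-1", day, day + timedelta(hours=1, minutes=30)),
        SnapshotRecord("projects", "daily-2", day + timedelta(days=1),
                       day + timedelta(days=1, hours=1, minutes=30)),
        SnapshotRecord("projects", "daily-3", day + timedelta(days=2), None),
    ]
    failure = day + timedelta(days=2, minutes=45)
    chosen = pick_restore_point(history, "projects", failure)
    if chosen is not None:
        print(chosen.name, data_loss_window(chosen, failure))
```

Here `daily-3` is skipped because its upload never completed, so recovery falls back to `daily-2`, which is exactly the case that makes the worst-case RPO longer than the snapshot interval alone.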