Deploy a new Amazon SageMaker HyperPod cluster with WEKA
Deployment workflow for a new Amazon SageMaker HyperPod cluster
Prepare the environment for deployment
Deploy the AWS CloudFormation template (or an equivalent) to create the prerequisites for the Amazon SageMaker HyperPod cluster.
The AWS CloudFormation template can be found at: Amazon SageMaker HyperPod > 0. Prerequisites > 2. Own Account.
Ensure the optional parameter "Availability zone ID to deploy the backup private subnet" is configured with a valid entry. If the AWS CloudFormation template has already been deployed, update the existing stack using the existing template.
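To confirm that the backup-subnet Availability Zone parameter holds a value on a stack that is already deployed, the stack's parameters can be listed with the AWS CLI. The following is a minimal sketch: the function name show_stack_parameters is illustrative, and because the exact parameter key is not stated here, it prints every key and value for inspection rather than filtering for one.

```shell
# Sketch: list all parameters of an existing CloudFormation stack.
# Assumption: the stack name is whatever you chose when deploying the template.
show_stack_parameters() {
  local stack_name="$1"
  if [ -z "$stack_name" ]; then
    echo "usage: show_stack_parameters <stack-name>" >&2
    return 2
  fi
  # Print ParameterKey / ParameterValue pairs as a table.
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query 'Stacks[0].Parameters[].[ParameterKey,ParameterValue]' \
    --output table
}
```

Call it as `show_stack_parameters <Cloud_Formation_Stack>` and check that the Availability Zone ID entry for the backup private subnet is not empty.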
Retrieve the token required for the WEKA package installation by accessing the WEKA download command at: https://get.weka.io/.
Edit the sagemaker-hyperpod-SecurityGroup rule created by the AWS CloudFormation template. Add the following inbound rules to allow access from your management workstation's CIDR range:
TCP port 22 (SSH)
TCP port 14000 (WEKA UI)
This ensures that your management workstation can connect securely to the cluster.
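The two inbound rules can also be added from the AWS CLI instead of the console. This is a sketch under stated assumptions: the function name allow_mgmt_access is illustrative, and the security group ID and CIDR range are values you supply yourself (the ID belongs to the sagemaker-hyperpod-SecurityGroup created by the stack).

```shell
# Sketch: open TCP 22 (SSH) and TCP 14000 (WEKA UI) to one management CIDR.
# Assumptions: sg_id and cidr are provided by the caller; nothing is looked up.
allow_mgmt_access() {
  local sg_id="$1" cidr="$2" port
  if [ -z "$sg_id" ] || [ -z "$cidr" ]; then
    echo "usage: allow_mgmt_access <security-group-id> <cidr>" >&2
    return 2
  fi
  for port in 22 14000; do
    # One ingress rule per port, stopping at the first failure.
    aws ec2 authorize-security-group-ingress \
      --group-id "$sg_id" \
      --protocol tcp \
      --port "$port" \
      --cidr "$cidr" || return 1
  done
}
```

For example, `allow_mgmt_access <security-group-id> 203.0.113.0/24` would allow a documentation-range network; substitute your workstation's actual CIDR.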
Deploy WEKA cluster using Terraform
Create Amazon SageMaker HyperPod cluster
Clone the WEKA cloud solutions repository. Download the repository from GitHub:
git clone https://github.com/weka/cloud-solutions/
Navigate to the SageMaker HyperPod directory. Change to the relevant directory:
cd cloud-solutions/aws/sagemaker-hyperpod
Verify the AWS CLI region configuration:
aws configure list
Verify that the region listed is the desired region for the SageMaker HyperPod cluster. If it is not correct, set the AWS_REGION environment variable to the correct region.
export AWS_REGION=<desired region>
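The region check can be scripted so that later steps fail early when no region is set. The sketch below assumes that AWS_REGION, when set, takes priority over the profile's region; the function name effective_region is illustrative and not part of the WEKA repository.

```shell
# Sketch: print the region the following steps are expected to use.
# Assumption: AWS_REGION, if set, wins over the region stored in the profile.
effective_region() {
  if [ -n "${AWS_REGION:-}" ]; then
    printf '%s\n' "$AWS_REGION"
  else
    # Falls back to the region saved in the active AWS CLI profile.
    aws configure get region
  fi
}
```

Running `effective_region` before the next step shows which region will be targeted; an empty result means neither the variable nor the profile defines one.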
Set the cluster configuration. Run the script to set the environment variables that define the SageMaker HyperPod cluster.
./set_env_vars.sh <Cloud_Formation_Stack>
Cloud_Formation_Stack: Name of the existing CloudFormation stack.
Source environment variables
source env_vars
Create the cluster. Run the deploy script:
./deploy.sh <ALB_NAME> <WEKA_FS_NAME>
ALB_NAME: Obtain this from the AWS Console or Terraform output. It is a DNS name.
WEKA_FS_NAME: Obtain this from the WEKA UI. The default filesystem name is default.
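Because a mistyped argument is easy to miss, the two values can be sanity-checked before running the deploy script. The helper below is a hypothetical sketch, not part of the WEKA repository: it only checks that ALB_NAME contains a dot (as a DNS name would), falls back to the filesystem name default, and prints the resulting command instead of executing it.

```shell
# Sketch: assemble the deploy command after a light check of its arguments.
# Assumption: "contains a dot" is used as a rough test for a DNS name.
build_deploy_cmd() {
  local alb_name="$1" fs_name="${2:-default}"
  case "$alb_name" in
    *.*) ;;  # contains a dot, so plausibly a DNS name
    *)
      echo "ALB_NAME should be a DNS name, got: '${alb_name}'" >&2
      return 2
      ;;
  esac
  # Print only; run the printed command yourself once it looks right.
  printf './deploy.sh %s %s\n' "$alb_name" "$fs_name"
}
```

Calling `build_deploy_cmd <ALB_NAME>` with no second argument prints a command that uses the default filesystem name.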
Monitor cluster creation. Track the cluster creation process:
aws sagemaker list-clusters --output table
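Rather than rerunning the list command by hand, the status of a single cluster can be polled until it settles. This is a sketch that assumes the cluster name is known and treats InService as success and Failed or RollingBack as failure; the function name wait_for_cluster and the 60-second default interval are illustrative choices.

```shell
# Sketch: poll one cluster's status until it is in service or has failed.
# Assumptions: name is the HyperPod cluster name; interval is in seconds.
wait_for_cluster() {
  local name="$1" interval="${2:-60}" status
  while true; do
    status="$(aws sagemaker describe-cluster \
      --cluster-name "$name" \
      --query 'ClusterStatus' \
      --output text)" || return 1
    echo "$(date '+%H:%M:%S') ${name}: ${status}"
    case "$status" in
      InService) return 0 ;;
      Failed|RollingBack) return 1 ;;
    esac
    sleep "$interval"
  done
}
```

For example, `wait_for_cluster <cluster-name> 30` checks every 30 seconds and returns a non-zero exit code if creation fails.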
Continue setup. Proceed with the setup by following Section 1, Step E of the Amazon SageMaker HyperPod workshop. The WEKA filesystem is mounted at /mnt/weka on all SageMaker HyperPod nodes.
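Once logged in to a node, the mount can be confirmed with a quick check of the kernel's mount table. The following sketch assumes a Linux node where /proc/mounts is readable; the function name is_mounted and its optional second argument (an alternative mount table file, useful for testing) are illustrative.

```shell
# Sketch: succeed if the target path appears as a mount point.
# Assumptions: Linux node; the mount table is read from /proc/mounts by default.
is_mounted() {
  local target="${1:-/mnt/weka}" mounts="${2:-/proc/mounts}"
  # Field 2 of each mount table line is the mount point.
  awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$mounts"
}
```

On a node, `is_mounted && echo "WEKA is mounted"` prints the message only when /mnt/weka is present in the mount table.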