Clone the GitHub repository:
Enter the sagemaker-hyperpod directory:
Verify AWS CLI region configuration:
Verify that the region listed is the desired region for the SageMaker HyperPod cluster. If it is not, set the AWS_REGION environment variable to the correct region.
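As a minimal sketch of the region check, the function below prints AWS_REGION when it is set and otherwise falls back to the region stored for the active AWS CLI profile. The function name `resolve_region` and the region `us-west-2` are illustrative assumptions, not part of the repository.

```shell
# Print the region this shell session would use: AWS_REGION if set,
# otherwise the region stored in the active AWS CLI profile.
resolve_region() {
  if [ -n "${AWS_REGION:-}" ]; then
    printf '%s\n' "$AWS_REGION"
  elif command -v aws >/dev/null 2>&1; then
    aws configure get region
  else
    echo "aws CLI not found and AWS_REGION is unset" >&2
    return 1
  fi
}

# Example override; us-west-2 is a placeholder, not a recommendation.
# export AWS_REGION=us-west-2
```

Calling `resolve_region` before and after exporting AWS_REGION shows whether the override took effect in the current shell.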
Ensure the optional parameter "Availability zone ID to deploy the backup private subnet" is configured with a valid entry. If the CloudFormation template has already been deployed, update the existing stack, reusing the existing template, so that this parameter is set.
Edit the sagemaker-hyperpod-SecurityGroup rule created by the CloudFormation template. Add the following inbound rules to allow access from your management workstation's CIDR range:
TCP port 22 (SSH)
TCP port 14000 (WEKA UI)
This ensures that your management workstation can connect securely to the cluster.
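The two inbound rules can also be added from the command line instead of the console. The sketch below only prints the `aws ec2 authorize-security-group-ingress` commands so they can be reviewed before anything changes. The function name `print_ingress_commands`, the security group ID `sg-0123456789abcdef0`, and the CIDR range `203.0.113.0/24` are placeholders; substitute the ID of the sagemaker-hyperpod-SecurityGroup and your workstation's CIDR range.

```shell
# Print (not run) the commands that open TCP 22 and 14000 to one CIDR range.
print_ingress_commands() {
  sg_id="$1"   # ID of the sagemaker-hyperpod-SecurityGroup (placeholder below)
  cidr="$2"    # CIDR range of the management workstation (placeholder below)
  for port in 22 14000; do
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s\n' \
      "$sg_id" "$port" "$cidr"
  done
}

print_ingress_commands sg-0123456789abcdef0 203.0.113.0/24
```

After reviewing the output, the printed lines can be pasted into a shell that has credentials for the target account.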
Run set_env_vars.sh:
Cloud_Formation_Stack: Name of the existing CloudFormation stack.
Run deploy_weka_into_existing_cluster.sh, replacing <weka_backend_ip> with either a WEKA backend IP or an Application Load Balancer DNS name, and <FS Name> with the name of the WEKA filesystem you wish to mount.
ALB_NAME: The DNS name of the Application Load Balancer. Obtain this from the AWS Console or the Terraform output.
WEKA_FS_NAME: Obtain this from the WEKA UI. The default filesystem name is default.
Log in to one of the cluster nodes using SSH or SSM.
Verify the mount using df:
The WEKA filesystem is mounted at /mnt/weka on all SageMaker HyperPod nodes.
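For a scripted version of the df check, the sketch below tests whether a path is itself a mount point by comparing it with the "Mounted on" column of `df -P`. The function name `is_mount_point` is an assumption for illustration. It confirms only that something is mounted at the path, not that the filesystem type is WEKA.

```shell
# Succeed only if the given path is itself a mount point according to df.
is_mount_point() {
  path="$1"
  # df -P prints one portable-format line per filesystem; the last field
  # is the mount point of the filesystem that holds the path.
  mounted_on="$(df -P "$path" 2>/dev/null | awk 'NR==2 {print $NF}')"
  [ "$mounted_on" = "$path" ]
}

if is_mount_point /mnt/weka; then
  echo "A filesystem is mounted at /mnt/weka"
else
  echo "/mnt/weka is not a mount point on this node"
fi
```

If /mnt/weka exists only as a plain directory, df reports the parent filesystem's mount point, so the comparison fails and the second message is printed.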