
Google Kubernetes Engine and WEKA over POSIX deployment

A step-by-step guide to setting up Google Kubernetes Engine (GKE) with WEKA on Google Cloud Platform (GCP), enhancing storage and scalability for demanding Kubernetes workloads.


Last updated 11 months ago

Introduction

Google Kubernetes Engine (GKE) is a managed Kubernetes service for deploying, managing, and scaling containerized applications. WEKA is a high-performance, scalable storage platform that integrates seamlessly with Kubernetes clusters to provide persistent storage for demanding containerized applications and workflows.

Combining GKE and WEKA results in an easily automated and managed Kubernetes environment, delivering best-in-class performance at scale.

Requirements for WEKA over POSIX with GKE

  • GKE must be deployed in Standard mode.

  • GKE Worker nodes must be configured with Ubuntu OS.

If GPUDirect-TCPX (supported on GKE only with Google Container Optimized OS) is required, configure WEKA over NFS. For details, see Manage the NFS protocol.
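The Ubuntu requirement above can be checked on an existing cluster. The following is a minimal sketch, assuming kubectl is already configured for the target GKE cluster; the function name check_ubuntu_nodes is hypothetical and not part of any WEKA or GKE tooling.

```shell
#!/usr/bin/env bash
# Sketch: report worker nodes whose OS image is not Ubuntu.
# Assumption: kubectl is configured for the target GKE cluster.

# Prints "<node-name> <os-image>" for every node that does not run Ubuntu.
# Returns 0 when all nodes run Ubuntu, 1 otherwise.
check_ubuntu_nodes() {
    local non_ubuntu
    non_ubuntu="$(kubectl get nodes \
        -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.osImage}{"\n"}{end}' \
        | awk 'NF { name = $1; $1 = ""; if (tolower($0) !~ /ubuntu/) print name $0 }')"
    if [ -n "$non_ubuntu" ]; then
        echo "$non_ubuntu"
        return 1
    fi
    return 0
}
```

Run check_ubuntu_nodes before installing the WEKA client. Any node it prints needs a node pool with an Ubuntu image type.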

Workflow

  1. Deploy GKE in Standard mode with Ubuntu OS.

  2. Set up WEKA client on existing GKE worker nodes.

  3. Configure automated WEKA setup client on worker nodes.

  4. Install and configure the WEKA CSI plugin.

  5. Set up WEKA storage for GKE pods.

1. Deploy GKE in Standard mode with Ubuntu OS

Procedure:

  1. Go to the GCP menu, select Kubernetes Engine, and then Clusters.

  2. Click CREATE to create a new cluster.

  3. If prompted, click SWITCH TO STANDARD CLUSTER. Standard mode also enables SSH access to worker nodes, which is necessary for installing the WEKA POSIX clients.

  4. Change from a regional to a zonal setup. Select the zone where the WEKA cluster management IPs are located to ensure optimal communication and performance between GKE and the WEKA cluster.

  5. Adjust the node pool settings: Go to Nodes within the default-pool under NODE POOLS in the GKE console, and change the Image type to Ubuntu with containerd (ubuntu_containerd).

  6. Ensure worker nodes meet a minimum configuration of 8 vCPUs and 32 GB RAM. The WEKA client itself requires a minimum of 2 vCPUs and 5 GB of RAM.

  7. If the GKE cluster was set up in advance, deploy the WEKA cluster to the same VPC and subnet. Otherwise, ensure that the GKE cluster networking is configured within the same VPC and subnet as the WEKA management IPs. Keeping both in the same VPC and subnet ensures optimal performance.

  8. Click CREATE to create the cluster.

  9. Wait for the cluster status to indicate Ready or Green before proceeding with further configuration or deployment tasks.
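For scripted deployments, the console steps above roughly correspond to a single gcloud command. The sketch below only prints that command for review and does not run it. The cluster, zone, network, and subnet names are placeholders, and e2-standard-8 is an assumed machine type chosen because it provides 8 vCPUs and 32 GB of RAM.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) a gcloud command that mirrors the console steps.
# All argument values are placeholders for your environment.

print_gke_create_cmd() {
    local cluster_name="$1" zone="$2" network="$3" subnet="$4"
    printf 'gcloud container clusters create %s' "$cluster_name"
    printf ' --zone=%s' "$zone"                # zonal cluster in the WEKA zone
    printf ' --image-type=UBUNTU_CONTAINERD'   # Ubuntu with containerd
    printf ' --machine-type=e2-standard-8'     # assumed type: 8 vCPUs, 32 GB RAM
    printf ' --network=%s --subnetwork=%s\n' "$network" "$subnet"
}

print_gke_create_cmd my-gke-cluster us-central1-a weka-vpc weka-subnet
```

Review the printed command, adjust the node count and machine type to your workload, and run it only after confirming that the network and subnet match those of the WEKA management IPs.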

2. Set up WEKA client on existing GKE worker nodes

Perform this procedure for each GKE worker node individually.

Any new GKE worker nodes added later will require these steps for WEKA client installation unless the following automation steps are implemented.

Before You Begin

Ensure SSH access to the GKE worker nodes is available to install the WEKA client.

Procedure:

  1. Identify the names of the GKE worker nodes where the WEKA client will be installed.

  2. Go to Google Cloud Platform > Compute Engine > VM instances. Locate the identified GKE worker node, select SSH connect from the dropdown menu, and choose Open in browser window to initiate the SSH connection.

  3. Authorize the SSH connection, then add the WEKA client from the existing WEKA cluster. To avoid CPU pinning conflicts with GKE, start the WEKA client using a stateless client mount. For details, see the Add clients to an on-premises WEKA cluster procedure.
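As a rough illustration of the last step, the sketch below reuses the agent install and stateless mount commands that appear in the startup script later in this guide. The backend IPs, filesystem name, and mount point are placeholder defaults, and the helper function names are hypothetical. Run install_and_mount as root on the worker node.

```shell
#!/usr/bin/env bash
# Sketch of a manual stateless client setup on one worker node.
# Assumption: the defaults below are placeholders for your environment.

WEKA_HOSTS="${WEKA_HOSTS:-10.0.0.8,10.0.0.9,10.0.0.10}"
WEKA_FS="${WEKA_FS:-default}"
WEKA_MOUNT="${WEKA_MOUNT:-/mnt/gkeclient}"

# First backend in the comma-separated list; the agent is downloaded from it.
first_weka_host() {
    echo "${WEKA_HOSTS%%,*}"
}

# Mount options: frontend core count and UDP networking,
# the same options the startup script in this guide uses.
weka_mount_opts() {
    local cores="${1:-2}"
    echo "num_cores=${cores},net=udp"
}

# Installs the agent from the first backend, then mounts the filesystem.
install_and_mount() {
    curl --fail --max-time 10 "http://$(first_weka_host):14000/dist/v1/install" | sh
    mkdir -p "$WEKA_MOUNT"
    mount -t wekafs "${WEKA_HOSTS}/${WEKA_FS}" "$WEKA_MOUNT" -o "$(weka_mount_opts 2)"
}
```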

3. Configure automated WEKA setup client on worker nodes

Google Cloud Platform (GCP) allows the addition of startup scripts at the project level, ensuring each new instance runs the script. By using metadata lookups, you can restrict the WEKA client installation to GKE cluster systems.

With auto-scaling enabled, the startup script automatically installs and sets up the WEKA client on each new worker node that GKE adds.
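The metadata-based restriction can be sketched as a small check: read the instance's cluster-name attribute and install only when it matches the expected GKE cluster. The function names below are hypothetical. Reading the attribute directly from its metadata path is an assumed alternative to parsing the full metadata JSON with jq.

```shell
#!/usr/bin/env bash
# Sketch: decide whether this instance should install the WEKA client.

# Returns 0 only when the instance's cluster-name attribute is non-empty
# and equals the expected GKE cluster name.
belongs_to_cluster() {
    local expected="$1" actual="$2"
    [ -n "$actual" ] && [ "$expected" = "$actual" ]
}

# Reads the cluster-name attribute from the metadata server.
# Prints nothing when the attribute is missing (for example, on non-GKE instances).
metadata_cluster_name() {
    curl -sS --fail -H 'Metadata-Flavor: Google' \
        'http://metadata.google.internal/computeMetadata/v1/instance/attributes/cluster-name' \
        2>/dev/null || true
}

# Example usage on an instance (the cluster name is a placeholder):
#   if belongs_to_cluster "my-gke-cloud-name" "$(metadata_cluster_name)"; then
#       echo "Instance belongs to the GKE cluster"
#   fi
```

Treating an empty attribute as a mismatch keeps instances outside GKE from installing the client by accident.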

Procedure:

  1. In the GCP Compute Engine console, scroll to the bottom of the left-side menu.

  2. Select Metadata under the Settings section.

  3. Click EDIT, then select + ADD ITEM.

  4. Set the key name to startup-script (no spaces), and paste the following GKE WEKA client install script into the Value field. Replace the following input values according to your environment:

    • WEKA_FS (line 11)

    • WEKA_HOSTS (line 17)

    • GKE_CLUSTER_NAME (line 99)

GKE WEKA client install script
curl -sS -H 'Metadata-Flavor: Google' 'http://metadata.google.internal/computeMetadata/v1/instance/?recursive=true&alt=json' | jq '.attributes."cluster-name"' -r

(
    #!/usr/bin/env bash

    set -euo pipefail

    export DEBIAN_FRONTEND=noninteractive
    ROOT_MOUNT_DIR="${ROOT_MOUNT_DIR:-/root}"
    
    export WEKA_FS="default"

    # Mount point for weka filesystem
    export WEKA_MOUNT="/mnt/gkeclient"
    
    # It's good to add 2-3 servers in case one is not available
    export WEKA_HOSTS="10.0.0.8,10.0.0.9,10.0.0.10"
    
    # Timeout (in seconds) after which an inaccessible client is removed from the cluster.
    #
    # The default is 86400 (24 hours); in a more dynamic environment, it can be lower.
    export WEKA_CLIENTTIMEOUNT="3600"
    
    # Number of cores to add to WEKA FrontEnd.
    export WEKA_FRONTENDCORES=2
    
    # First IP taken from WEKA_HOSTS list to download the client from.
    export WEKA_DOWNLOADIP=$(echo "$WEKA_HOSTS" | cut -d',' -f1)
  
  
    echo "Installing dependencies"
    apt-get update
    apt-get install -y apt-transport-https curl gnupg lsb-release jq

    echo "Installing gcloud SDK"
    snap install google-cloud-sdk --classic

    echo "Getting node metadata"
    ALL_METADATA="$(curl -sS -H 'Metadata-Flavor: Google' 'http://metadata.google.internal/computeMetadata/v1/instance/?recursive=true&alt=json')"
    NODE_NAME="$(curl -sS http://metadata.google.internal/computeMetadata/v1/instance/name -H 'Metadata-Flavor: Google')"
    ZONE="$(curl -sS http://metadata.google.internal/computeMetadata/v1/instance/zone -H 'Metadata-Flavor: Google' | awk -F  "/" '{print $4}')"

    echo "Setting up disks"
    DISK_NAME="$NODE_NAME-wekadir"

    if ! gcloud compute disks list --filter="name:$DISK_NAME" | grep "$DISK_NAME" > /dev/null; then
        echo "Creating $DISK_NAME"
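        gcloud compute disks create "$DISK_NAME" --size=$(( 1024*20 )) --zone="$ZONE"  # A unitless --size is read as GB, so this requests 20480 GB; lower it if a smaller disk is intended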
    else
        echo "$DISK_NAME already exists"
    fi

    if ! gcloud compute instances describe "$NODE_NAME" --zone "$ZONE" --format '(disks[].source)' | grep "$DISK_NAME" > /dev/null; then
        echo "Attaching $DISK_NAME to $NODE_NAME"
        gcloud compute instances attach-disk "$NODE_NAME" --device-name=sdb --disk "$DISK_NAME" --zone "$ZONE"
    else
        echo "$DISK_NAME is already attached to $NODE_NAME"
    fi
    function create_wekaio_partition() {
        echo "--------------------------------------------"
        echo " Creating local filesystem on WekaIO volume "
        echo "--------------------------------------------"

        wekaiosw_device="/dev/sdb"
        if mount | grep -w $wekaiosw_device | grep -w /opt/weka; then
          echo "Weka volume is already mounted"
        else
          echo "Formatting and mounting Weka trace volume"
          mkfs.ext4 -L wekaiosw ${wekaiosw_device} || return 1
          mkdir -p /opt/weka || return 1
          mount $wekaiosw_device /opt/weka || return 1
          echo "LABEL=wekaiosw /opt/weka ext4 defaults 0 2" >>/etc/fstab
        fi
    }
    function prepare_weka_env() {
        echo "--------------- ENV ---------------"
        env
        echo "--------------- ENV ---------------"
        create_wekaio_partition || logger -s -t weka.install "Failed creating wekaio partition"
    }

    function start_weka_client() {
        prepare_weka_env
        if ! which weka; then
          echo "Installing agent from ${WEKA_DOWNLOADIP}"
          curl --fail --max-time 10 "http://${WEKA_DOWNLOADIP}:14000/dist/v1/install" | sh || logger -s -t weka.install "Failed installing agent from the first backend"
        else
          echo "Weka seems already installed, skipping agent install"
        fi
        mkdir -p ${WEKA_MOUNT}
        if mount | grep -w ${WEKA_MOUNT}; then
          echo "Weka filesystem seems already mounted on endpoint, skipping mount"
        else          
          mount -t wekafs ${WEKA_HOSTS}/${WEKA_FS} ${WEKA_MOUNT} -o remove_after_secs=${WEKA_CLIENTTIMEOUNT},num_cores=${WEKA_FRONTENDCORES},net=udp || logger -s -t weka.install "Error mounting filesystem"
        fi
    }

## Update to the name of the GKE cluster
GKE_CLUSTER_NAME=my-gke-cloud-name
GKE_METADATA_CLUSTER_NAME=$(curl -sS -H 'Metadata-Flavor: Google' 'http://metadata.google.internal/computeMetadata/v1/instance/?recursive=true&alt=json' | jq '.attributes."cluster-name"' -r)

if [ "$GKE_CLUSTER_NAME" != "$GKE_METADATA_CLUSTER_NAME" ]; then
    echo "Instance does not belong to GKE cluster $GKE_CLUSTER_NAME. Skipping installation"
else
    echo "Instance belongs to GKE cluster, initializing Weka client installation"
    start_weka_client
fi

) >/root/startup.out 2>/root/startup.err

  5. After adding the startup script, click SAVE at the bottom of the page.

  6. Test the script:

    • Increase the Node Pools node count.

    • Check the client list in WEKA UI to verify that the new clients have been added.
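The console steps in this procedure can also be approximated from the command line. The sketch below prints the gcloud commands for review instead of running them. The script filename, cluster name, node pool, zone, and node count are placeholders, and the helper names are hypothetical. The is_mounted helper reads mount output from standard input and checks for the mount point used by the startup script.

```shell
#!/usr/bin/env bash
# Sketch: print gcloud equivalents of the console steps; review before running.

# Command to store a local script file as the project-level startup-script.
print_metadata_cmd() {
    local script_file="$1"
    echo "gcloud compute project-info add-metadata --metadata-from-file=startup-script=${script_file}"
}

# Command to resize a node pool so that new worker nodes run the startup script.
print_resize_cmd() {
    local cluster="$1" pool="$2" zone="$3" nodes="$4"
    echo "gcloud container clusters resize ${cluster} --node-pool=${pool} --num-nodes=${nodes} --zone=${zone}"
}

# Returns 0 if the given mount point appears in mount output read from stdin.
is_mounted() {
    grep -q -w -F -- "$1"
}

print_metadata_cmd ./gke-weka-client.sh
print_resize_cmd my-gke-cluster default-pool us-central1-a 4
```

On a new worker node, `mount | is_mounted /mnt/gkeclient` indicates whether the filesystem was mounted. If it was not, /root/startup.out and /root/startup.err contain the output of the startup script.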

4. Install and configure the WEKA CSI plugin

To install and configure the WEKA CSI plugin, follow the procedures in the WEKA CSI Plugin section.

You may need to adjust the steps according to your specific setup and requirements.

5. Set up WEKA storage for GKE pods

To set up WEKA storage for use by GKE pods, follow the procedures in the Dynamic and static provisioning topic of the WEKA CSI Plugin section.
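After provisioning, it can help to confirm that a claim is bound before launching pods that use it. The following is a minimal sketch, assuming kubectl is configured for the GKE cluster. The function name pvc_is_bound and the claim name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that a PersistentVolumeClaim has reached the Bound phase.
# Assumption: kubectl is configured for the GKE cluster.

# Usage: pvc_is_bound <pvc-name> [namespace]
# Returns 0 when the claim's status.phase is "Bound", 1 otherwise.
pvc_is_bound() {
    local pvc="$1" namespace="${2:-default}" phase
    phase="$(kubectl get pvc "$pvc" -n "$namespace" -o jsonpath='{.status.phase}')"
    [ "$phase" = "Bound" ]
}

# Example usage (the claim name is a placeholder):
#   pvc_is_bound my-weka-pvc default && echo "PVC is bound"
```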

For complete GKE documentation, including how to deploy GKE in Standard mode with Ubuntu OS for the worker nodes, see the GCP documentation.