Expand specific resources of a container

Guidelines for expansion processes that involve adding only a specific resource.

Expanding resources within a container involves dynamically adjusting the allocation of CPU, memory, storage, and other system resources to meet applications' changing demands. By effectively managing these resources, organizations can optimize performance, enhance scalability, and ensure the smooth operation of their containerized applications.

Expansion guidelines

The following commands are available to expand the containers' resources:

  • weka cluster container: Run actions on a remote container (or containers for specific sub-commands).

  • weka local resources: Run actions locally.

Adhere to the following guidelines when expanding specific resources (a command sketch of the complete workflow follows this list):

  • Specify the container: Run the relevant weka cluster container command with the specific container-id you want to expand. Once you run the command, the container is staged to update in the cluster.

  • View existing resources: To view the non-applied (staged) configuration, run the weka cluster container resources <container-id> command.

  • Apply changes on a specific container: To apply changes on a specific container in the cluster, run the weka cluster container apply <container-id> command. You can accumulate several changes on a container and apply them all at once on completion.

  • Apply changes on a local server: To apply changes in the local container, run the weka local resources apply command.

  • The apply command saves the last configuration: Once the apply command is complete, the last local configuration of the container that successfully joined the cluster is saved. If a failure occurs with the new configuration, the container automatically remains with the existing stable configuration. Run the weka cluster container resources <container-id> --stable command to view the existing configuration.

  • Expansion on active or deactivated containers: You can dynamically expand some of the resources on active containers and others only after deactivating the container. For example, you can add CPU cores only on a deactivated container.
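
The following sketch ties the guidelines together, assuming container ID 0 and an illustrative memory value; adjust both to your cluster:

//Stage a change on container 0
weka cluster container memory 0 2.0GiB
//View the staged (non-applied) configuration
weka cluster container resources 0
//Apply the accumulated changes
weka cluster container apply 0
//View the last stable configuration the container successfully joined with
weka cluster container resources 0 --stable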

weka cluster container command description

Command: weka cluster container <sub-command> <container-id> [options]

Some sub-commands accept <container-ids>. See details in the following table.

Subcommands

| Sub-command | Description | Comment |
|---|---|---|
| info-hw | Show hardware information about the containers. | |
| failure-domain | Set the failure domain on the container. | Can only be done on a deactivated container. |
| dedicate | Set the containers as dedicated to the WEKA cluster. | |
| bandwidth | Limit the bandwidth of the containers. | |
| cores | Change the number of cores in the containers. | Can only be done on a deactivated container. |
| memory | Set the RAM size dedicated to the container. | |
| auto-remove-timeout | Set the time to wait before removing a container if it disconnects from the cluster. The minimum value is 60 seconds. Use 0 to disable automatic removal. | This subcommand applies only to clients. |
| management-ips | Set the management IPs of the container. To achieve high availability, set two IPs. | |
| resources | Get the resources of the containers. | |
| restore | Restore the staged resources of the specified containers, or of all containers, to their stable state. | Specify the list of containers with a space delimiter. |
| apply | Apply changes to the resources on the containers. | Specify the list of containers with a space delimiter. |
| activate | Activate the containers. | Specify the list of containers with a space delimiter. |
| deactivate | Deactivate the containers. | Specify the list of containers with a space delimiter. |
| clear-failure | Clear the last failure fields of the containers. | Specify the list of containers with a space delimiter. |
| add | Add a container to the cluster. | |
| remove | Remove a container from the cluster. | |
| net | List the WEKA-dedicated networking devices in the containers. | Specify the list of containers with a space delimiter. |
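
Several subcommands accept a space-delimited list of container IDs. A minimal sketch, assuming container IDs 1 and 2:

weka cluster container deactivate 1 2
//Revert any staged resources on both containers to their stable state
weka cluster container restore 1 2
weka cluster container activate 1 2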

Options

| Option | Description |
|---|---|
| -b | Only return backend containers. |
| -c | Only return client containers. |
| -l | Only return containers that are part of the cluster leadership. |
| -L | Only return the cluster leader. |
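
For example, assuming these filters are combined with the plain listing form of the command:

//List only backend containers
weka cluster container -b
//Show only the cluster leader
weka cluster container -L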

Expansion procedures on a remote container

Modify the memory

Run the following command lines on the active container:

weka cluster container memory <container-id> <capacity-memory>
weka cluster container apply <container-id>
Example

To change the memory of container-id 0 to 1.5 GiB, run the following commands:

weka cluster container memory 0 1.5GiB
weka cluster container apply 0

After reducing the memory allocation for a container, follow these steps to release hugepages on each affected container (a consolidated sketch follows the list):

  1. Stop the container locally. Run weka local stop

  2. Release hugepages. Run weka local run release_hugepages

  3. Restart the container locally. Run weka local start
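
The same steps as a single sequence, run on the server that hosts the container:

weka local stop
weka local run release_hugepages
weka local start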

Modify the network configuration

Run the following command lines on the active container:

weka cluster container net add <container-id> <device>
weka cluster container apply <container-id>
Example

To add another network device to container-id 0, run the following commands:

weka cluster container net add 0 eth2
weka cluster container apply 0

Modify the container IP addresses

Run the following command lines on the active container:

weka cluster container management-ips <container-id> <management-ips>
weka cluster container apply <container-id>
Example

To change the management IPs on container-id 0, run the following commands:

weka cluster container management-ips 0 192.168.1.10 192.168.1.20
weka cluster container apply 0

The number of management IP addresses determines whether the container uses high-availability (HA) networking, in which each IO process uses both NICs.

A container with two IP addresses uses HA networking. A container with only one IP address does not.

If the cluster uses InfiniBand and Ethernet network technologies, you can define up to four IP addresses.
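
A sketch of both cases, using hypothetical addresses (the second command assumes a mixed InfiniBand and Ethernet cluster):

//HA networking with two management IPs
weka cluster container management-ips 0 192.168.1.10 192.168.1.20
//Mixed InfiniBand and Ethernet: up to four management IPs
weka cluster container management-ips 0 192.168.1.10 192.168.1.20 10.10.10.10 10.10.10.20
weka cluster container apply 0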

Add CPU cores to a container

You can add dedicated CPU cores to a container either remotely or locally, but only on a deactivated container. The added cores must be dedicated to a specific process type: compute, drives, or frontend.

For clarity, the following procedure exemplifies expansion on the container running the compute processes.

Procedure

  1. Deactivate the container. Run the following command: weka cluster container deactivate <container-ids>

  2. Run the following command line to set the number of dedicated cores for the compute container: weka cluster container cores <container-id> <number of total cores> --compute-dedicated-cores <number of total cores> --no-frontends

  3. Apply the changes. Run the following command: weka cluster container apply <container-id>

  4. Check the number of cores dedicated to the compute processes. Run the following command: weka cluster container <container-ids>

Example

The following example assigns 10 cores to the compute0 container (container ID 1). It is important to add --no-frontends so that all allocated cores are dedicated to the compute processes.

weka cluster container deactivate 1
weka cluster container cores 1 10 --compute-dedicated-cores 10 --no-frontends
weka cluster container apply 1
weka cluster container 1
//response
ROLES       NODE ID  CORE ID
MANAGEMENT  0        <auto>
COMPUTE     1        <auto>
COMPUTE     2        <auto>
COMPUTE     3        <auto>
COMPUTE     4        <auto>
COMPUTE     5        <auto>
COMPUTE     6        <auto>
COMPUTE     7        <auto>
COMPUTE     8        <auto>
COMPUTE     9        <auto>
COMPUTE     10       <auto>
  5. Activate the container. Run the following command: weka cluster container activate <container-ids>

Expand SSDs only

You can add new SSD drives to a container. However, adding SSD drives may alter the ratio between SSDs and drive cores, potentially impacting performance. For optimal system efficiency, take this into account when planning the expansion. Ensure the cluster has a drive core to allocate for the new SSD; if an additional drive core is required, deactivate the container and add the drive core before adding the SSD. See Add CPU cores to a container.

Procedure

  1. Identify the relevant container ID to which you want to add the SSD drive. Run the command: weka cluster container

  2. Scan for new drives. Run the command: weka cluster drive scan

  3. Add the SSDs. Run the following command (see the sketch after this procedure): weka cluster drive add <container-id> <device-paths> Where: container-id is the identifier of the drive container to which the local SSD drives are added; device-paths is a space-separated list of block devices that identify the local SSDs, and each must be a valid Unix device name. Example: /dev/nvme0n1 /dev/nvme1n1
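
A minimal sketch of the procedure, assuming the drive container ID is 2 and the new devices are /dev/nvme0n1 and /dev/nvme1n1:

//Identify the drive container ID
weka cluster container
//Scan for the newly attached drives
weka cluster drive scan
//Add the local SSDs to the drive container
weka cluster drive add 2 /dev/nvme0n1 /dev/nvme1n1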

weka local resources command description

You can also modify the resources of a local container by connecting to the server and running the weka local resources command, the local equivalent of the remote weka cluster container command.

These local commands have the same semantics as their remote counterparts, except that you do not specify a container-id as the first parameter; all actions apply to the local container.

Command: weka local resources

Subcommands

| Sub-command | Description | Comment |
|---|---|---|
| import | Import resources from a file. | |
| export | Export stable resources to a file. | |
| restore | Restore resources from the stable resources. | |
| apply | Apply changes to the resources locally. | |
| cores | Change the number of cores in the container. | Can only be done on a deactivated container. |
| base-port | Change the port range used by the container. WEKA containers require 100 ports to operate. | |
| memory | Set the RAM size dedicated to the container. | |
| dedicate | Set the container as dedicated to the WEKA cluster. | |
| bandwidth | Limit the bandwidth of the container. | |
| management-ips | Set the container's management IPs. To achieve high availability, set two IPs. | |
| join-ips | Set the IPs and ports of all containers in the cluster. This enables the container to join the cluster using these IPs. | |
| failure-domain | Set the container's failure domain. | Can only be done on a deactivated container. |
| net | List the WEKA-dedicated networking devices in the container. | |
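
A sketch of saving and reusing a local configuration, assuming import and export take the file path as an argument (the file name is illustrative):

//Export the stable resources to a file
weka local resources export container-resources.json
//Import the saved resources and apply them locally
weka local resources import container-resources.json
weka local resources apply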

Options

| Option | Description |
|---|---|
| --stable | List the resources from the last successful container boot. |
| -C | The container name. |
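
For example, to view the last stable resources of a specific container (the container name compute0 matches the example below):

weka local resources -C compute0 --stable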

Example: Set dedicated cores for the compute processes locally

The following example assigns 10 cores to the compute0 container (container ID 1). It is important to add --no-frontends so that all allocated cores are dedicated to the compute processes.

weka cluster container deactivate 1

//Connect to the relevant server to run the following commands locally.

weka local resources cores 10 --compute-dedicated-cores 10 -C compute0 --no-frontends
weka local resources -C compute0

//response
ROLES       NODE ID  CORE ID
MANAGEMENT  0        <auto>
COMPUTE     1        <auto>
COMPUTE     2        <auto>
COMPUTE     3        <auto>
COMPUTE     4        <auto>
COMPUTE     5        <auto>
COMPUTE     6        <auto>
COMPUTE     7        <auto>
COMPUTE     8        <auto>
COMPUTE     9        <auto>
COMPUTE     10       <auto>

Graceful container management: ensuring safe actions

When running the commands weka local stop, weka local restart, and weka local resources apply, it is crucial to perform these actions safely to minimize the risk of unexpected issues or disruptions. To achieve this, use the --graceful option. This practice is particularly important during cluster maintenance.

When you use the --graceful option with these commands, the system verifies that the action is safe before executing it.

The --graceful option applies exclusively to cluster containers and not to protocol containers.
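
For example, to stop a cluster container or apply staged resources only if it is safe to do so:

weka local stop --graceful
weka local resources apply --graceful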

How --graceful works:

  1. Action initiation: Sends a request to the container specifying the desired action (STOP, RESTART, or APPLY_RESOURCES).

  2. Safety check: Evaluates feasibility based on current state and safety constraints (for example, sufficient resources post-action).

  3. Draining and execution: If safe, the container transitions to the DRAINING state to complete ongoing operations. Once DRAINED, the action is executed.

Example: prioritizing stability

If stopping a container would violate minimum failure domain requirements, --graceful prevents the stop to maintain system health.

The following status output illustrates this behavior:
CONTAINER ID  HOSTNAME  CONTAINER  IPS             STATUS          REQUESTED ACTION  REQUESTED ACTION FAILURE
0             Host-0    drives0    10.108.206.201  UP              STOP              Upon completion of this operation, there are 4 reliable containers available for cluster leadership, while the requirement is for 5.                 
6             Host-0    compute0   10.108.206.201  DRAINED (DOWN)  STOP                                 
12            Host-0    frontend0  10.108.206.201  DRAINING        RESTART     
