WEKA v4.3 documentation
Plan the WEKA system hardware requirements



Planning a WEKA system is essential before the actual installation process. It involves planning the following:

  1. Total SSD net capacity and performance requirements

  2. SSD resources

  3. Memory resources

  4. CPU resources

  5. Network

When implementing an AWS configuration, you can go to the Self-Service Portal in start.weka.io to map capacity and performance requirements into various configurations automatically.

Total SSD net capacity and performance planning

A WEKA system cluster runs on a group of servers with local SSDs. To plan these servers, the following information must be clarified and defined:

  1. Capacity: Plan your net SSD capacity. Data management to object stores (tiering) can be added after the installation; at the planning stage, only the SSD capacity is required.

  2. Redundancy scheme: Define the optimal redundancy scheme required for the WEKA system, as explained in Selecting a Redundancy Scheme.

  3. Failure domains: Determine whether to use failure domains (optional), and if yes, determine the number of failure domains and the potential number of servers in each failure domain, as described in Failure Domains, and plan accordingly.

  4. Hot spare: Define the required hot spare count, as described in Hot Spare.

Once all this data is clarified, you can plan the SSD net storage capacity accordingly, as defined in the SSD Capacity Management formula. Prepare the following information, which is required during the installation process:

  1. Cluster size (number of servers).

  2. SSD capacity for each server, for example, 12 servers with a capacity of 6 TB each.

  3. Planned protection scheme, for example, 6+2.

  4. Planned failure domains (optional).

  5. Planned hot spare.

This is an iterative process. Depending on the scenario, some options can be fixed constraints while others are flexible.
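As a rough illustration of how these inputs interact, the following sketch estimates net capacity from cluster size, per-server SSD capacity, protection scheme, and hot spares. This is a first-order approximation under simplified assumptions, not WEKA's SSD Capacity Management formula (which accounts for additional reserves); the function name and parameters are illustrative only.

```python
def estimate_net_capacity_tb(servers, ssd_tb_per_server,
                             data=6, parity=2, hot_spares=1):
    """Rough first-order estimate of net SSD capacity (NOT WEKA's exact formula).

    Takes the raw capacity of the non-spare servers and scales it by the
    data fraction of the protection stripe (e.g. 6+2 keeps 6/8 for data).
    """
    usable_servers = servers - hot_spares
    raw_tb = usable_servers * ssd_tb_per_server
    return raw_tb * data / (data + parity)

# Example from the text: 12 servers with 6 TB each, 6+2 protection, 1 hot spare
print(round(estimate_net_capacity_tb(12, 6), 1))  # 49.5
```

Iterating over `servers`, `ssd_tb_per_server`, and the protection scheme with a sketch like this makes it easy to see which combinations meet a target net capacity before committing to hardware.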

SSD resource planning

SSD resource planning involves how the defined capacity is implemented for the SSDs. For each server, the following has to be determined:

  • The number of SSDs and the capacity of each SSD (where the product of the two should satisfy the required capacity per server).

  • The selected technology (NVMe, SAS, or SATA) and the specific SSD models, which have implications for SSD endurance and performance.

For on-premises planning, it is possible to consult with the Customer Success Team to map between performance requirements and the recommended WEKA system configuration.

Memory resource planning

Backend servers memory requirements

The total per-server memory requirement is the sum of the following:

Purpose | Per-server memory
Fixed | 2.8 GB
Frontend processes | 2.2 GB x # of Frontend processes
Compute processes | 3.9 GB x # of Compute processes
Drive processes | 2 GB x # of Drive processes
SSD capacity management | (Total SSD Raw Capacity / Number of Servers / 2,000) + (Number of Cores x 3 GB)
Operating system | The maximum of 8 GB and 2% of the total RAM
Additional protocols (NFS/SMB/S3) | 16 GB
RDMA | 2 GB
Metadata (pointers) | 20 Bytes x # of metadata units per server (see Metadata units calculation)
Dedicated Data Services container | 5.5 GB (optional)

If you intend to add a dedicated Data Services container, it requires additional memory of 5.5 GB.

Contact the Customer Success Team to explore options for configurations requiring more than 384 GB of memory per server.

Example 1: A system with large files

A system with 16 servers with the following details:

  • Number of Frontend processes: 1

  • Number of Compute processes: 13

  • Number of Drive processes: 6

  • Total raw capacity: 983,000 GB

  • Total net capacity: 725,000 GB

  • NFS/SMB services

  • RDMA

  • Average file size: 1 MB (potentially up to 755 million files for all servers; ~47 million files per server)

Calculations:

  • Fixed: 2.8 GB

  • Frontend processes: 1 x 2.2 = 2.2 GB

  • Compute processes: 13 x 3.9 = 50.7 GB

  • Drive processes: 6 x 2 = 12 GB

  • SSD capacity management: 983,000 GB / 16 / 2000 + 20 x 3 GB = ~90.7 GB

  • Additional protocols = 16 GB

  • RDMA = 2 GB

  • Metadata: 20 Bytes x 47 million files x 2 units = ~1.9 GB

Total memory requirement per server = 2.8 + 2.2 + 50.7 + 12 + 90.7 + 16 + 2 + 1.9 = ~178.3 GB

Example 2: A system with small files

For the same system as in example 1, but with smaller files, the required memory for metadata would be larger.

For an average file size of 64 KB, the number of files is potentially up to:

  • ~12 billion files for all servers.

  • ~980 million files per server.

Required memory for metadata: 20 Bytes x 980 million files x 1 unit = ~19.6 GB

Total memory requirement per server = 2.8 + 2.2 + 50.7 + 12 + 90.7 + 16 + 2 + 19.6 = ~196 GB
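The two worked examples can be reproduced with a short script. This is a sketch of the table's per-server formula (function and parameter names are illustrative, not WEKA terms); the operating-system allowance is omitted, matching the worked examples above.

```python
def backend_memory_gb(frontend, compute, drive, total_raw_gb, servers,
                      files_per_server, md_units_per_file,
                      protocols_gb=16, rdma_gb=2):
    """Sum the per-server memory requirements from the table above.

    Assumes 20 bytes per metadata unit. The operating-system allowance
    (max of 8 GB and 2% of total RAM) is excluded, as in the examples.
    """
    cores = frontend + compute + drive
    fixed = 2.8
    frontend_gb = 2.2 * frontend
    compute_gb = 3.9 * compute
    drive_gb = 2.0 * drive
    ssd_mgmt = total_raw_gb / servers / 2000 + cores * 3
    metadata = 20 * files_per_server * md_units_per_file / 1e9
    return (fixed + frontend_gb + compute_gb + drive_gb +
            ssd_mgmt + protocols_gb + rdma_gb + metadata)

# Example 1: large files (~47 million files per server, 2 metadata units each)
print(round(backend_memory_gb(1, 13, 6, 983_000, 16, 47e6, 2), 1))   # ~178.3
# Example 2: small files (~980 million files per server, 1 metadata unit each)
print(round(backend_memory_gb(1, 13, 6, 983_000, 16, 980e6, 1), 1))  # ~196.0
```

Parameterizing the calculation this way makes it easy to compare configurations, for example to see how the metadata term dominates as average file size shrinks.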

Client's memory requirements

The WEKA software on a client requires a minimum of 5 GB of additional memory.

CPU resource planning

CPU allocation strategy

The WEKA system implements a Non-Uniform Memory Access (NUMA) aware CPU allocation strategy to maximize overall system performance. Core allocation uses all NUMA nodes equally to balance memory usage across them.

Consider the following regarding the CPU allocation strategy:

  • The system allocates CPU resources by assigning individual cores to tasks in a cgroup.

  • Cores in a cgroup are not available to run any other user processes.

  • On systems with Intel hyper-threading enabled, the corresponding sibling cores are placed into a cgroup along with the physical ones.

Backend servers

Plan the number of physical cores dedicated to the WEKA software according to the following guidelines and limitations:

  • Dedicate at least one physical core to the operating system; the rest can be allocated to the WEKA software.

    • Generally, it is recommended to allocate as many cores as possible to the WEKA system.

    • A backend server can have any number of physical cores; however, a container within a backend server can use a maximum of 19 physical cores.

    • Leave enough cores for the container serving the protocol if it runs on the same server.

  • Allocate enough cores to support performance targets.

    • Generally, use 1 drive process per SSD for up to 6 SSDs and 1 drive process per 2 SSDs for more, with a ratio of 2 compute processes per drive process.

  • Allocate enough memory to match core allocation, as discussed above.
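The drive/compute sizing guideline above can be sketched as follows. Treating "1 drive process per 2 SSDs for more" as applying to the SSDs beyond the first six is an assumption about the guideline's intent, and the function name is illustrative only.

```python
import math

def suggested_processes(ssds):
    """Sketch of the sizing guideline: 1 drive process per SSD for up to
    6 SSDs, then (assumed) 1 per 2 additional SSDs, with 2 compute
    processes per drive process."""
    if ssds <= 6:
        drives = ssds
    else:
        drives = 6 + math.ceil((ssds - 6) / 2)
    return {"drive": drives, "compute": 2 * drives}

print(suggested_processes(6))   # {'drive': 6, 'compute': 12}
print(suggested_processes(10))  # {'drive': 8, 'compute': 16}
```

For comparison, Example 1 in the memory section runs 6 drive and 13 compute processes, close to this 2:1 compute-to-drive ratio.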

Clients

On the client side, the WEKA software consumes a single physical core by default. The WEKA software consumes two logical cores if the client is configured with hyper-threading.

If the client networking is defined as UDP, dedicated CPU core resources are not allocated to WEKA. Instead, the operating system allocates CPU resources to the WEKA processes as it does to any other process.

Network planning

Backend servers

InfiniBand connections are prioritized over Ethernet links for data traffic. Both network types must be operational to ensure system availability, so consider adding redundant ports for each network type.

Clients can connect to the WEKA system over either InfiniBand or Ethernet.

A network port can be dedicated exclusively to the WEKA system or shared between the WEKA system and other applications.

Clients

Clients can be configured with networking as described above to achieve the highest performance and lowest latency; however, this setup requires compatible hardware and dedicated CPU core resources. If compatible hardware is not available or a dedicated CPU core cannot be allocated to the WEKA system, client networking can instead be configured to use the kernel’s UDP service. This configuration results in reduced performance and increased latency.

What to do next?

Obtain the WEKA installation packages (all paths)

The memory requirements are conservative and can be reduced in some situations, such as in systems with mostly large files or systems with files of about 4 KB in size. Contact the Customer Success Team to receive an estimate for your specific configuration, or for finer tuning.

Running other applications on the same server (converged WEKA system deployment) is supported. For details, contact the Customer Success Team.

WEKA backend servers support connections to both InfiniBand and Ethernet networks, using compatible network interface cards (NICs). When deploying backend servers, ensure that all servers in the WEKA system are connected using the same network technology for each type of network.
