Plan the WEKA system hardware requirements

Plan the WEKA system before starting the installation. Planning covers the following:

  1. Total SSD net capacity and performance requirements

  2. SSD resources

  3. Memory resources

  4. CPU resources

  5. Network

When implementing an AWS configuration, you can use the Self-Service Portal at start.weka.io to map capacity and performance requirements to suitable configurations automatically.

Total SSD net capacity and performance planning

A WEKA system cluster runs on a group of servers with local SSDs. To plan these servers, the following information must be clarified and defined:

  1. Capacity: Plan your net SSD capacity. Data management to object stores can be added after the installation. At the planning stage, only the SSD capacity is required.

  2. Redundancy scheme: Define the optimal redundancy scheme required for the WEKA system, as explained in Selecting a Redundancy Scheme.

  3. Failure domains: Determine whether to use failure domains (optional), and if yes, determine the number of failure domains and the potential number of servers in each failure domain, as described in Failure Domains, and plan accordingly.

  4. Hot spare: Define the required hot spare count (see Hot spare capacity in Cluster capacity and redundancy management).

Once all this data is clarified, you can plan the SSD net storage capacity accordingly (see SSD net storage capacity calculation in Cluster capacity and redundancy management). Record the following information, which is required during the installation process:

  1. Cluster size (number of servers).

  2. SSD capacity for each server, for example, 12 servers with a capacity of 6 TB each.

  3. Planned protection scheme, for example, 6+2.

  4. Planned failure domains (optional).

  5. Planned hot spare.

This is an iterative process. Depending on the scenario, some options can be fixed constraints while others are flexible.
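The planning inputs listed above can be captured in a small record so that each iteration is easy to compare. The following is a minimal sketch, not a WEKA tool: the `ClusterPlan` class, its field names, and the hot spare value of 1 are hypothetical. It derives only the total raw capacity and the data fraction of the protection scheme. The full net capacity calculation is described in Cluster capacity and redundancy management and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterPlan:
    """Hypothetical container for the planning inputs needed at installation."""
    servers: int                 # cluster size (number of servers)
    ssd_tb_per_server: float     # SSD capacity for each server, in TB
    data_units: int              # N in an N+P protection scheme
    protection_units: int        # P in an N+P protection scheme
    hot_spares: int              # planned hot spare count
    failure_domains: Optional[int] = None  # optional

    @property
    def raw_capacity_tb(self) -> float:
        # Total raw SSD capacity across all servers.
        return self.servers * self.ssd_tb_per_server

    @property
    def data_fraction(self) -> float:
        # Share of each stripe that holds data rather than protection.
        return self.data_units / (self.data_units + self.protection_units)


# The example from the list above: 12 servers, 6 TB each, 6+2 protection.
# The hot spare count of 1 is an assumed value for illustration.
plan = ClusterPlan(servers=12, ssd_tb_per_server=6.0,
                   data_units=6, protection_units=2, hot_spares=1)
print(plan.raw_capacity_tb, plan.data_fraction)
```

Keeping the inputs in one place makes it easier to change a flexible option, such as the number of servers, while holding the fixed constraints constant.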

SSD resource planning

SSD resource planning involves how the defined capacity is implemented for the SSDs. For each server, the following has to be determined:

  • The number of SSDs and the capacity of each SSD (their product must satisfy the required capacity per server).

  • The selected technology (NVMe, SAS, or SATA) and the specific SSD models, which have implications for SSD endurance and performance.
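The capacity condition in the first bullet is a simple product check. The following sketch illustrates it, assuming a hypothetical drive size of 1.92 TB and the 6 TB per server figure used as an example earlier on this page. The function names are illustrative only.

```python
import math


def layout_satisfies(num_ssds: int, ssd_capacity_tb: float,
                     required_tb_per_server: float) -> bool:
    # The number of SSDs multiplied by the capacity of each SSD
    # must cover the required capacity per server.
    return num_ssds * ssd_capacity_tb >= required_tb_per_server


def min_ssds(ssd_capacity_tb: float, required_tb_per_server: float) -> int:
    # Smallest number of SSDs of a given size that covers the requirement.
    return math.ceil(required_tb_per_server / ssd_capacity_tb)


print(layout_satisfies(3, 1.92, 6.0))  # three drives fall short of 6 TB
print(layout_satisfies(4, 1.92, 6.0))  # four drives cover 6 TB
print(min_ssds(1.92, 6.0))
```

This check covers capacity only. Endurance and performance depend on the SSD technology and model and are not modeled here.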

For on-premises planning, it is possible to consult with the Customer Success Team to map between performance requirements and the recommended WEKA system configuration.

Memory resource planning

Backend servers memory requirements

The total per-server memory requirement is the sum of the following requirements:

| Purpose | Per-server memory |
|---|---|
| Fixed | 2.8 GB |
| Frontend processes | 2.2 GB x # of Frontend processes |
| Compute processes | 3.9 GB x # of Compute processes |
| Drive processes | 2 GB x # of Drive processes |
| SSD capacity management | (Total SSD Raw Capacity / Number of Servers / 2,000) + (Number of Cores x 3 GB) |
| Operating System | The greater of 8 GB and 2% of the total RAM |
| Additional protocols (NFS/SMB/S3) | 16 GB |
| RDMA | 2 GB |
| Metadata (pointers) | 20 Bytes x # Metadata units per server. See Metadata units calculation. |
| Dedicated Data Services container | If you intend to add a dedicated Data Services container, it requires additional memory: 3.5 GB (without dedicated core) or 5.5 GB (with dedicated core) |
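The per-server sum can be sketched as a short function. This is an illustration, not a WEKA sizing tool, and the function and parameter names are hypothetical. Two assumptions are made: the number of cores equals the total number of Frontend, Compute, and Drive processes (as in the worked example on this page), and the operating system reservation is computed separately because it depends on the total RAM installed.

```python
def backend_memory_gb(frontend: int, compute: int, drive: int,
                      total_raw_gb: float, servers: int,
                      metadata_units_per_server: float,
                      protocols: bool = False, rdma: bool = False,
                      data_services_gb: float = 0.0) -> dict:
    """Return the per-server memory breakdown in GB (OS reservation excluded)."""
    cores = frontend + compute + drive  # assumption: one core per process
    parts = {
        "fixed": 2.8,
        "frontend": 2.2 * frontend,
        "compute": 3.9 * compute,
        "drive": 2.0 * drive,
        "ssd_capacity_management": total_raw_gb / servers / 2000 + cores * 3,
        "protocols": 16.0 if protocols else 0.0,
        "rdma": 2.0 if rdma else 0.0,
        # 20 bytes per metadata unit, converted to GB (10^9 bytes).
        "metadata": 20 * metadata_units_per_server / 1e9,
        # 3.5 GB without a dedicated core, 5.5 GB with one, 0 if not used.
        "data_services": data_services_gb,
    }
    parts["total"] = sum(parts.values())
    return parts


def os_reserved_gb(total_ram_gb: float) -> float:
    # The greater of 8 GB and 2% of the total RAM.
    return max(8.0, 0.02 * total_ram_gb)


# Inputs of Example 1: 16 servers, 983 TB raw, 1/13/6 processes,
# ~47 million files per server at 2 metadata units each.
example = backend_memory_gb(frontend=1, compute=13, drive=6,
                            total_raw_gb=983_000, servers=16,
                            metadata_units_per_server=47e6 * 2,
                            protocols=True, rdma=True)
print(round(example["total"], 1))
```

Returning the breakdown rather than only the total makes it easy to see which component dominates. With these inputs, it is SSD capacity management.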

Example 1: A system with large files

A system with 16 servers with the following details:

  • Fixed: 2.8 GB

  • Number of Frontend processes: 1

  • Number of Compute processes: 13

  • Number of Drive processes: 6

  • Total raw capacity: 983 TB (983,000 GB)

  • Total net capacity: 725 TB

  • NFS/SMB services

  • RDMA

  • Average file size: 1 MB (potentially up to 755 million files for all servers; ~47 million files per server)

Calculations:

  • Frontend processes: 1 x 2.2 = 2.2 GB

  • Compute processes: 13 x 3.9 = 50.7 GB

  • Drive processes: 6 x 2 = 12 GB

  • SSD capacity management: (983,000 GB / 16 / 2,000) + (20 cores x 3 GB) = ~90.7 GB, where 20 is the total number of processes (1 + 13 + 6)

  • Additional protocols = 16 GB

  • RDMA = 2 GB

  • Metadata: 20 Bytes x 47 million files x 2 units = ~1.9 GB

Total memory requirement per server = 2.8 + 2.2 + 50.7 + 12 + 90.7 + 16 + 2 + 1.9 = ~178.3 GB

Example 2: A system with small files

For the same system as in example 1, but with smaller files, the required memory for metadata would be larger.

For an average file size of 64 KB, the number of files is potentially up to:

  • ~12 billion files for all servers.

  • ~980 million files per server.

Required memory for metadata: 20 Bytes x 980 million files x 1 unit = ~19.6 GB

Total memory requirement per server = 2.8 + 2.2 + 50.7 + 12 + 90.7 + 16 + 2 + 19.6 = ~196 GB
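The difference between the two examples comes entirely from the metadata term. The following sketch compares them using the file counts and unit counts stated in the examples. The function name is hypothetical, and the number of metadata units per file is taken as an input rather than derived (see Metadata units calculation).

```python
BYTES_PER_METADATA_UNIT = 20


def metadata_memory_gb(files_per_server: float, units_per_file: int) -> float:
    # 20 bytes per metadata unit, converted to GB (10^9 bytes).
    return BYTES_PER_METADATA_UNIT * files_per_server * units_per_file / 1e9


large_files = metadata_memory_gb(47e6, 2)   # Example 1: 1 MB average file size
small_files = metadata_memory_gb(980e6, 1)  # Example 2: 64 KB average file size
print(round(large_files, 2), round(small_files, 2))
```

Although each small file uses fewer metadata units, the much larger file count per server dominates the result.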

The memory requirements are conservative and can be reduced in some situations, such as in systems with mostly large files or a system with files 4 KB in size. Contact the Customer Success Team to receive an estimate for your specific configuration.

Client memory requirements

The WEKA software on a client requires a minimum of 5 GB of additional memory.

CPU resource planning

CPU allocation strategy

The WEKA system implements a Non-Uniform Memory Access (NUMA) aware CPU allocation strategy to maximize the overall performance of the system. Core allocation uses all NUMA nodes equally to balance memory usage across them.

Consider the following regarding the CPU allocation strategy:

  • The code allocates CPU resources by assigning individual cores to tasks in a cgroup.

  • Cores in a cgroup are not available to run any other user processes.

  • On systems with Intel hyper-threading enabled, the corresponding sibling cores are placed into a cgroup along with the physical ones.

Backend CPU usage

Plan the number of physical cores dedicated to the WEKA software according to the following guidelines and limitations:

  • Dedicate at least one physical core to the operating system; the rest can be allocated to the WEKA software.

    • Generally, it is recommended to allocate as many cores as possible to the WEKA system.

    • A backend server can have as many cores as possible. However, a container within a backend server can have a maximum of 19 physical cores.

    • Leave enough cores for the container serving the protocol if it runs on the same server.

  • Allocate enough cores to support performance targets.

    • Generally, use 1 drive process per SSD for up to 6 SSDs and 1 drive process per 2 SSDs for more, with a ratio of 2 compute processes per drive process.

    • For finer tuning, please contact the Customer Success Team.

  • Allocate enough memory to match core allocation, as discussed above.

  • Running other applications on the same server (converged WEKA system deployment) is supported. For details, contact the Customer Success Team.
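The general sizing guideline above can be sketched as follows. This is one possible reading, not an official sizing rule: for more than 6 SSDs, the sketch assumes one drive process per 2 SSDs across all SSDs, rounded up. The function names are hypothetical, and the container count simply divides the cores by the 19-core limit per container.

```python
import math

MAX_CORES_PER_CONTAINER = 19  # limit stated in the guidelines above


def suggest_processes(num_ssds: int) -> tuple:
    """Return (drive, compute) process counts for one backend server."""
    if num_ssds <= 6:
        drive = num_ssds                 # 1 drive process per SSD
    else:
        drive = math.ceil(num_ssds / 2)  # assumed reading: 1 per 2 SSDs
    compute = 2 * drive                  # 2 compute processes per drive process
    return drive, compute


def fits_on_server(weka_cores: int, physical_cores: int) -> bool:
    # At least one physical core must remain for the operating system.
    return weka_cores <= physical_cores - 1


def containers_needed(weka_cores: int) -> int:
    # A container can hold at most 19 physical cores.
    return math.ceil(weka_cores / MAX_CORES_PER_CONTAINER)


drive, compute = suggest_processes(12)
total = drive + compute + 1  # plus one frontend process (assumed)
print(drive, compute, total, containers_needed(total))
```

Treat the result as a starting point. The guidelines recommend contacting the Customer Success Team for finer tuning.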

Client CPU usage

The WEKA client software requires one physical CPU core by default. When running on systems with hyper-threading enabled, WEKA consumes two logical cores.

In UDP networking, the operating system pins WEKA processes to specific CPU cores. These processes maintain guaranteed access to their assigned cores, but the operating system can still schedule other processes to run on the same cores. This contrasts with exclusive CPU allocation, where WEKA reserves cores solely for its processes.

Network planning

Backend servers

WEKA backend servers support connections to both InfiniBand and Ethernet networks, using compatible network interface cards (NICs). When deploying backend servers, ensure that all servers in the WEKA system are connected using the same network technology for each type of network.

InfiniBand connections are prioritized over Ethernet links for data traffic. Both network types must be operational to ensure system availability, so consider adding redundant ports for each network type.

Clients can connect to the WEKA system over either InfiniBand or Ethernet.

A network port can be dedicated exclusively to the WEKA system or shared between the WEKA system and other applications.

Clients

Clients can be configured with networking as described above to achieve the highest performance and lowest latency; however, this setup requires compatible hardware and dedicated CPU core resources. If compatible hardware is not available or a dedicated CPU core cannot be allocated to the WEKA system, client networking can instead be configured to use the kernel’s UDP service. This configuration results in reduced performance and increased latency.

What to do next?

Obtain the WEKA installation packages (all paths)
