Manually prepare the system for WEKA configuration

If the system was not prepared using the WMS, perform this procedure to configure the networking and complete the other preparation tasks before configuring the WEKA cluster.

Once the hardware and software prerequisites are met, prepare the backend servers and clients for the WEKA system configuration.

This preparation consists of the following steps:

  1. Install NIC drivers

  2. Enable SR-IOV (when required)

  3. Set up ConnectX cards

  4. Set custom kernel parameters

  5. Configure the networking

  6. Configure dual-network links with policy-based routing (HA networking)

  7. Verify the network configuration

  8. Configure the clock synchronization

  9. Enable kdump

  10. Disable swap (if any)

  11. Validate the system preparation

Some of the examples contain version-specific information. The software is updated frequently, so the package versions available to you may differ from those presented here.

Related topics

Prerequisites and compatibility

1. Install NIC drivers

For Mellanox OFED setup, see NVIDIA Documentation - Installing Mellanox OFED.

2. Enable SR-IOV

Enabling Single Root I/O Virtualization (SR-IOV) is mandatory when deploying client VMs. The physical NIC must expose its virtual functions (VFs) to the corresponding virtual NICs to ensure proper functionality and performance.

Related topic

Enable the SR-IOV

3. Set up ConnectX cards

  1. Configure firmware parameters: All ConnectX ports used directly with WEKA servers and clients require specific firmware settings for optimal performance. Set the following non-default parameters:

    • ADVANCED_PCI_SETTINGS=1

    • PCI_WR_ORDERING=1

    Use the following command to apply these settings to all MLX devices:

  2. Set link type: Certain ConnectX VPI cards require modification of the link type, to specifically set the port to use InfiniBand or Ethernet networking. If applicable, set the port mode with the following command, where 1=InfiniBand and 2=Ethernet:

    mlxconfig -y -d /dev/mst/<dev> set LINK_TYPE_P<1,2>=<1,2>

    For example, the following command sets port 2 to InfiniBand:

    mlxconfig -y -d /dev/mst/<dev> set LINK_TYPE_P2=1

  3. Reboot the system: A reboot is required after applying the firmware settings to ensure the changes take effect.
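The firmware step above can be sketched as a loop over all MST devices. This is a minimal sketch, not the exact command from the original page: it assumes the NVIDIA firmware tools (mst and mlxconfig) are installed and that the script runs as root. The helper name apply_fw_params is hypothetical. Some firmware versions may expose PCI_WR_ORDERING only after ADVANCED_PCI_SETTINGS is set, in which case apply the two parameters in separate calls.

```shell
#!/usr/bin/env bash
# Sketch only: set the two non-default firmware parameters on every MST device.
# Assumes mst and mlxconfig (NVIDIA firmware tools) are installed; run as root.

apply_fw_params() {
  # $1: MST device path (an entry under /dev/mst/)
  mlxconfig -y -d "$1" set ADVANCED_PCI_SETTINGS=1 PCI_WR_ORDERING=1
}

if command -v mst >/dev/null 2>&1 && command -v mlxconfig >/dev/null 2>&1; then
  mst start                      # load the MST modules and create /dev/mst/*
  for dev in /dev/mst/*; do
    [ -e "$dev" ] || continue    # skip when the glob matches nothing
    echo "Processing $dev"
    apply_fw_params "$dev"
  done
else
  echo "mst or mlxconfig not found; install the NVIDIA firmware tools first" >&2
fi
```

Reboot afterwards, as described in step 3, so that the new firmware settings take effect.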

Related information

For additional details, refer to the NVIDIA ConnectX documentation.

4. Set custom kernel parameters

To ensure optimal performance and stability, configure the Linux kernel with custom parameters that:

  • Disable NUMA balancing to reduce latency (mandatory).

  • Enable automatic reboots after kernel panic to minimize downtime.

  • Optimize ARP behavior for improved network performance.

The recommended approach is to consolidate all custom kernel parameters into a single configuration file: /etc/sysctl.d/99-weka.conf. This ensures the settings persist across reboots, simplifies administration, and avoids conflicts with package updates.

Procedure

  1. Create the configuration file: Open a new file under /etc/sysctl.d/ to store all custom kernel parameters:

  2. Add kernel parameter settings: Insert the following lines into the file. Comments are included for clarity:

  3. Save the file and exit the editor.

  4. Apply the new settings: Reload all kernel parameters from configuration files without rebooting:

  5. Verify configuration changes:

    1. Verify NUMA balancing:

      Expected output:

    2. Verify kernel panic timer:

      Expected output:
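The procedure above can be sketched as follows. This is an illustration under stated assumptions, not the file content from the original page: kernel.numa_balancing = 0 follows from the mandatory requirement, while the panic timeout (60 seconds) and the ARP values are assumed examples that you should confirm for your environment. The file is staged in a scratch directory (the STAGE_DIR variable is hypothetical) so that nothing under /etc is changed until you install it.

```shell
# Sketch only: stage 99-weka.conf in a scratch directory before installing it.
STAGE_DIR="${STAGE_DIR:-$(mktemp -d)}"
CONF="$STAGE_DIR/99-weka.conf"

cat > "$CONF" <<'EOF'
# Disable automatic NUMA balancing (mandatory)
kernel.numa_balancing = 0
# Reboot automatically after a kernel panic (60 seconds is an assumed value)
kernel.panic = 60
# ARP tuning (assumed values; confirm them for your network design)
net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.all.arp_announce = 2
EOF

echo "Staged $CONF"
# To install and load the settings (as root):
#   install -m 0644 "$CONF" /etc/sysctl.d/99-weka.conf
#   sysctl --system
# To verify:
#   sysctl kernel.numa_balancing   # prints: kernel.numa_balancing = 0
#   sysctl kernel.panic
```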

5. Configure the networking

Ethernet configuration

The following example of the ifcfg script is a reference for configuring the Ethernet interface.

MTU 9000 (jumbo frame) is recommended for the best performance. Refer to your switch vendor documentation for jumbo frame configuration.

Bring the interface up using the following command:
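As an illustration of the Ethernet configuration described above, the following sketch stages an ifcfg file and shows how it could be activated. The interface name (eth1), IP address, and prefix are placeholders, and the STAGE_DIR variable is hypothetical; only MTU=9000 comes from the recommendation above.

```shell
# Sketch only: stage an ifcfg file for a hypothetical interface named eth1.
IFACE="${IFACE:-eth1}"
STAGE_DIR="${STAGE_DIR:-$(mktemp -d)}"
IFCFG="$STAGE_DIR/ifcfg-$IFACE"

cat > "$IFCFG" <<EOF
TYPE=Ethernet
DEVICE=$IFACE
NAME=$IFACE
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.1.10
PREFIX=24
MTU=9000
EOF

echo "Staged $IFCFG"
# To install and activate (as root, on systems that still use network scripts):
#   install -m 0644 "$IFCFG" "/etc/sysconfig/network-scripts/ifcfg-$IFACE"
#   ifup "$IFACE"
```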

InfiniBand configuration

InfiniBand network configuration normally includes a Subnet Manager (SM), but the SM setup procedure is beyond the scope of this document. However, be aware of the specifics of your SM configuration, such as partitioning and MTU, because they can affect the configuration of the endpoint ports in Linux. For best performance, an MTU of 4092 is recommended.

Refer to the following ifcfg script when the IB network only has the default partition, i.e., "no pkey":

Bring the interface up using the following command:

Verify that the “default partition” connection is up, with all the attributes set:

Define the NICs with ignore-carrier

ignore-carrier is a NetworkManager configuration option. When set, it keeps the network interface up even if the physical link is down. It’s useful when services need to bind to the interface address at boot.

The following is an example of configuring ignore-carrier on systems that use NetworkManager on Rocky Linux 8. The exact steps may vary depending on your operating system and its specific network configuration tools. Always refer to your system’s official documentation for accurate information.

  1. Open the /etc/NetworkManager/NetworkManager.conf file to edit it.

  2. Under the [main] section, add one of the following lines depending on the operating system:

    • For some versions of Rocky Linux, RHEL, and CentOS: ignore-carrier=*

    • For some other versions: ignore-carrier=<device-name1>,<device-name2>. Replace <device-name1>,<device-name2> with the actual device names you want to apply this setting to.

Example for Rocky Linux and RHEL 8.7:

Example for some other versions:

  3. Restart the NetworkManager service for the changes to take effect.
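The steps above can be sketched as a small script that adds the option under [main] only when it is not already present. This is a minimal sketch that works on a temporary file with sample content; the helper name add_ignore_carrier is hypothetical, and the plugins and [logging] lines exist only for the demonstration.

```shell
# Sketch only: add ignore-carrier under [main] in a temporary sample file.
NM_CONF="$(mktemp)"
printf '[main]\nplugins=keyfile\n\n[logging]\n' > "$NM_CONF"   # sample content for the demo

add_ignore_carrier() {
  # $1: config file, $2: value, either * or a comma-separated device list
  if grep -q '^ignore-carrier=' "$1"; then
    return 0                       # already set; leave the file untouched
  fi
  awk -v val="$2" '{ print } /^\[main\]$/ { print "ignore-carrier=" val }' "$1" > "$1.new" &&
    mv "$1.new" "$1"
}

add_ignore_carrier "$NM_CONF" '*'
cat "$NM_CONF"
# On the real system, edit /etc/NetworkManager/NetworkManager.conf the same way,
# then restart the service (as root):
#   systemctl restart NetworkManager
```

To limit the setting to specific devices, pass a comma-separated device list instead of the asterisk.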

6. Configure dual-network links with policy-based routing

The following steps provide guidance for configuring dual-network links with policy-based routing on Linux systems. Adjust IP addresses and interface names according to your environment.

RHEL/Rocky/CentOS routing configuration using the network scripts

Network scripts are deprecated in RHEL/Rocky 8. For RHEL/Rocky 8 and later, use NetworkManager.

  1. Navigate to /etc/sysconfig/network-scripts/.

  2. Create the file /etc/sysconfig/network-scripts/route-mlnx0 with the following content:

  3. Create the file /etc/sysconfig/network-scripts/route-mlnx1 with the following content:

  4. Create the files /etc/sysconfig/network-scripts/rule-mlnx0 and /etc/sysconfig/network-scripts/rule-mlnx1 with the following content:

  5. Open /etc/iproute2/rt_tables and add the following lines:

  6. Save the changes.

RHEL/Rocky 8+ routing configuration using NetworkManager

You can configure routing for your Ethernet or InfiniBand connections using the NetworkManager command-line interface (nmcli).

Configure Ethernet routing

To set up routing for Ethernet connections, use the following nmcli commands. In these commands, the first IP address of the route (10.10.10.0/24) represents the subnet of the network to which the NIC connects. The last address in the routing rule (10.10.10.1 for eth1) is the IP address of the NIC you are configuring.
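The routing described above can be sketched with nmcli as follows. This is an illustration under stated assumptions: eth1 and 10.10.10.1 come from the text, while the second connection (eth2, 10.10.10.101), the routing table IDs, and the rule priorities are example values. The helper name set_policy_route and the APPLY switch are hypothetical; without APPLY=1 the script changes nothing.

```shell
# Sketch only: policy-based routing for two Ethernet connections with nmcli.
# Connection names, addresses, table IDs, and priorities are example values.

set_policy_route() {
  # $1: connection name, $2: subnet, $3: NIC IP, $4: routing table ID, $5: rule priority
  nmcli connection modify "$1" \
    ipv4.routes "$2 src=$3 table=$4" \
    ipv4.routing-rules "priority $5 from $3 table $4"
}

if command -v nmcli >/dev/null 2>&1 && [ "${APPLY:-0}" = "1" ]; then
  set_policy_route eth1 10.10.10.0/24 10.10.10.1   100 101
  set_policy_route eth2 10.10.10.0/24 10.10.10.101 200 102
  nmcli connection up eth1       # re-activate so the new routes and rules load
  nmcli connection up eth2
else
  echo "Dry run: set APPLY=1 on a host with nmcli to apply the routes"
fi
```

Each connection gets its own routing table, and the rule sends traffic sourced from that NIC's address to that table, so replies leave through the interface on which the request arrived.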

Configure InfiniBand routing

To set up routing for InfiniBand connections, use the following nmcli commands. The route's first IP address (10.10.10.0/24) signifies the network's subnet for the NIC. The last address in the routing rules (10.10.10.1 for ib0) is the IP address of the NIC you are configuring.

View network configuration

Run the following command to view the current network configuration, including interfaces, IP addresses, routes, and DNS settings.

The command returns a detailed list of all network interfaces and their status.

Example

Ubuntu Netplan configuration

  1. Open the Netplan configuration file /etc/netplan/01-netcfg.yaml and adjust it:

  2. After adjusting the Netplan configuration file, run the following commands:

SLES/SUSE configuration

  1. Create /etc/sysconfig/network/ifrule-eth2 with:

  2. Create /etc/sysconfig/network/ifrule-eth4 with:

  3. Create /etc/sysconfig/network/scripts/ifup-route.eth2 with:

  4. Create /etc/sysconfig/network/scripts/ifup-route.eth4 with:

  5. Add the weka lines to /etc/iproute2/rt_tables:

  6. Restart the interfaces or reboot the machine:

Related topic

WEKA networking

7. Verify the network configuration

Use a large-size ICMP ping to check the basic TCP/IP connectivity between the interfaces of the servers:

The -M do flag prohibits packet fragmentation, which allows verification of correct MTU configuration between the two endpoints.

-s 8972 is the maximum ICMP payload size that can be transferred with MTU 9000: the 20-byte IP header and the 8-byte ICMP header account for the remaining 28 bytes.

All WEKA server interfaces within the same subnet must have connectivity and be able to ping each other.
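The check above can be sketched as a short script that derives the payload size from the MTU and pings each peer without fragmentation. This is a minimal sketch, assuming IPv4 without IP options and the iputils version of ping; the helper names and the PEERS variable (a space-separated list of server addresses) are hypothetical.

```shell
# Sketch only: derive the largest unfragmented ICMP payload for an MTU and ping peers with it.
icmp_payload() {
  # $1: MTU in bytes; subtract 20 bytes (IPv4 header) and 8 bytes (ICMP header)
  echo $(( $1 - 20 - 8 ))
}

check_peer() {
  # $1: peer IP, $2: MTU. -M do forbids fragmentation, -c 3 sends three probes.
  ping -M do -c 3 -s "$(icmp_payload "$2")" "$1"
}

icmp_payload 9000    # prints 8972

# PEERS is a hypothetical space-separated list of server dataplane addresses.
for peer in ${PEERS:-}; do
  check_peer "$peer" 9000 || echo "MTU check failed for $peer" >&2
done
```

For example, running the script with PEERS="10.1.1.11 10.1.1.12" tests both peers and reports any address that cannot pass a full-size frame.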

8. Configure the clock synchronization

Time synchronization across servers and networks is vital for the stability of the WEKA system. Aligned timestamps in packets and logs also help resolve issues quickly.

Configure the clock synchronization software on the backends and clients according to the specific vendor instructions (see your OS documentation), before installing the WEKA software.

9. Enable kdump

Enabling kdump ensures crash diagnostic data is captured (/var/crash).

  1. Install the kdump tools (if not already installed): sudo yum install kexec-tools crash

  2. Enable the kdump service: sudo systemctl enable kdump.service

  3. Open the file located at: /etc/kdump.conf.

  4. Set the crash dump path and size. Example:

10. Disable swap (if any)

WEKA highly recommends that any servers used as backends have no swap configured. This is distribution-dependent but is often a case of commenting out any swap entries in /etc/fstab and rebooting.
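Commenting out the swap entries can be sketched as follows. This is an illustration only: it runs on a temporary file with made-up sample entries, and the helper name comment_swap_entries is hypothetical. Review the result before applying the same edit to the real /etc/fstab.

```shell
# Sketch only: comment out active swap entries in a sample copy of fstab.
comment_swap_entries() {
  # Reads fstab content on stdin; prefixes "#" to uncommented lines whose type field is swap.
  awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }'
}

FSTAB_COPY="$(mktemp)"
printf '%s\n' \
  'UUID=1111 /    xfs  defaults 0 0' \
  'UUID=2222 none swap defaults 0 0' |
  comment_swap_entries > "$FSTAB_COPY"
cat "$FSTAB_COPY"
# On a real server (as root): run swapoff -a, apply the same edit to /etc/fstab,
# reboot, and confirm that swapon --show prints nothing.
```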

11. Validate the system preparation

The wekachecker is a tool that validates the readiness of the servers in the cluster before installing the WEKA software.

The wekachecker performs the following validations:

  • Dataplane IP, jumbo frames, and routing

  • ssh connection to all servers

  • Timesync

  • OS release

  • Sufficient capacity in /opt/weka

  • Available RAM

  • Internet connection availability

  • NTP

  • DNS configuration

  • Firewall rules

  • WEKA required packages

  • OFED required packages

  • Recommended packages

  • HT/AMT is disabled

  • The kernel is supported

  • CPU has a supported AES, and it is enabled

  • NUMA balancing is disabled

  • RAM state

  • XFS FS type installed

  • Mellanox OFED is installed

  • IOMMU setting in all servers is consistent, either all enabled or all disabled.

  • rpcbind utility is enabled

  • SquashFS is enabled

  • noexec mount option on /tmp

The wekachecker tool applies to all WEKA versions. From V4.0, the following validations are not relevant, although the tool displays them:

  • OS has SELinux disabled or in permissive mode.

  • NetworkManager is disabled.

Procedure

  1. Clone the tools repository: git clone --depth 1 https://github.com/weka/tools.git

  2. Change directory to tools/install.

  3. From the install directory, run:

    ./wekachecker <hostnames/IPs>

    Where <hostnames/IPs> is a space-separated list of all the cluster hostnames or IP addresses connected to the high-speed networking.

    Example:

    ./wekachecker 10.1.1.11 10.1.1.12 10.1.1.4 10.1.1.5 10.1.1.6 10.1.1.7 10.1.1.8

  4. Review the output. If failures or warnings are reported, investigate them and correct them as necessary. Repeat the validation until no important issues are reported. The wekachecker writes any failures or warnings to the file: test_results.txt.

Once the report has no failures or warnings that must be fixed, you can install the WEKA software.

wekachecker report example

What to do next?

If you can use the WEKA Configurator, go to:

Configure the WEKA cluster using the WEKA Configurator

Otherwise, go to:

Manually configure the WEKA cluster using the resources generator
