Manually prepare the system for WEKA configuration

If the system was not prepared using the WMS, perform this procedure to configure the networking and complete the other preparation tasks before configuring the WEKA cluster.

Once the hardware and software prerequisites are met, prepare the backend servers and clients for the WEKA system configuration.

This preparation consists of the following steps:

Install NIC drivers
Enable SR-IOV (when required)
Set up ConnectX cards
Configure the networking
Configure the HA networking
Verify the network configuration
Configure the clock synchronization
Disable the NUMA balancing
Enable kdump and set kernel panic reboot timer
Disable swap (if any)
Validate the system preparation

Some of the examples contain version-specific information. The software is updated frequently, so the package versions available to you may differ from those presented here.

Related topics

Prerequisites and compatibility

1. Install NIC drivers

To install Mellanox OFED, see NVIDIA Documentation - Installing Mellanox OFED.
To install the Broadcom driver, see Broadcom adapter setup for WEKA system.
To install the Intel driver, see Latest Drivers & Software downloads.

2. Enable SR-IOV

Single Root I/O Virtualization (SR-IOV) enablement is mandatory in the following cases:

The servers are equipped with Intel NICs.
When working with client VMs, a physical NIC's virtual functions (VFs) must be exposed to the virtual NICs.

Related topic

Enable the SR-IOV

3. Set up ConnectX cards

Configure firmware parameters: All ConnectX ports used directly with WEKA servers and clients require specific firmware settings for optimal performance. Set the following non-default parameters:
- ADVANCED_PCI_SETTINGS=1
- PCI_WR_ORDERING=1
Use the following command to apply these settings to all MLX devices:
```
mst start && for MLXDEV in /dev/mst/* ; do mlxconfig -d ${MLXDEV} -y set ADVANCED_PCI_SETTINGS=1 PCI_WR_ORDERING=1; done
```
Set link type: Certain ConnectX VPI cards require changing the link type so that the port uses InfiniBand or Ethernet networking. If applicable, set the port mode with the following command, where the number after LINK_TYPE_P is the port (1 or 2) and the value is 1 for InfiniBand or 2 for Ethernet:
```
mlxconfig -y -d /dev/mst/<dev> set LINK_TYPE_P<1,2>=<1,2>
```
For example, the following command sets port 2 to InfiniBand:
```
mlxconfig -y -d /dev/mst/<dev> set LINK_TYPE_P2=1
```
Reboot the system: A reboot is required after applying the firmware settings to ensure the changes take effect.
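To confirm the firmware values on each device, you can query them with mlxconfig. The sketch below assumes that `mlxconfig -d <dev> query ADVANCED_PCI_SETTINGS PCI_WR_ORDERING` prints one line per parameter whose last field ends in the numeric value in parentheses, for example `True(1)`. The helper `mlx_params_ok` is hypothetical; adapt the parsing if your mlxconfig version formats its output differently.

```shell
# Hypothetical helper: read `mlxconfig ... query` output on stdin and succeed
# only if ADVANCED_PCI_SETTINGS and PCI_WR_ORDERING are both present and set to 1.
# Assumption: each parameter line ends with a value such as "True(1)".
mlx_params_ok() {
  awk '
    $1 == "ADVANCED_PCI_SETTINGS" || $1 == "PCI_WR_ORDERING" {
      seen[$1] = 1
      if ($NF !~ /\(1\)$/) bad = 1   # last field must end in "(1)"
    }
    END {
      missing = !("ADVANCED_PCI_SETTINGS" in seen) || !("PCI_WR_ORDERING" in seen)
      exit (bad || missing)
    }'
}
```

For example, after `mst start`, run `for MLXDEV in /dev/mst/*; do mlxconfig -d "$MLXDEV" query ADVANCED_PCI_SETTINGS PCI_WR_ORDERING | mlx_params_ok || echo "$MLXDEV: check settings"; done`.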

Related information

For additional details, refer to the NVIDIA ConnectX documentation.

4. Configure the networking

Ethernet configuration

The following example of the ifcfg script is a reference for configuring the Ethernet interface.

/etc/sysconfig/network-scripts/ifcfg-enp24s0

TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="none"
DEFROUTE="no"
IPV4_FAILURE_FATAL="no"
IPV6INIT="no"
IPV6_AUTOCONF="no"
IPV6_DEFROUTE="no"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="enp24s0"
DEVICE="enp24s0"
ONBOOT="yes"
NM_CONTROLLED=no
IPADDR=192.168.1.1
NETMASK=255.255.0.0
MTU=9000

MTU 9000 (jumbo frame) is recommended for the best performance. Refer to your switch vendor documentation for jumbo frame configuration.

Bring the interface up using the following command:

# ifup enp24s0

InfiniBand configuration

InfiniBand network configuration normally includes a Subnet Manager (SM); configuring the SM is beyond the scope of this document. However, be aware of the specifics of your SM configuration, such as partitioning and MTU, because they affect how the endpoint ports are configured in Linux. For best performance, an MTU of 4092 is recommended.

Refer to the following ifcfg script when the IB network only has the default partition, i.e., "no pkey":

/etc/sysconfig/network-scripts/ifcfg-ib1

TYPE=Infiniband
ONBOOT=yes
BOOTPROTO=static
STARTMODE=auto
USERCTL=no
NM_CONTROLLED=no
DEVICE=ib1
IPADDR=192.168.1.1
NETMASK=255.255.0.0
MTU=4092

Bring the interface up using the following command:

# ifup ib1

Verify that the “default partition” connection is up, with all the attributes set:

# ip a s ib1
4: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:03:72:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a8:09:48 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 10.0.20.84/24 brd 10.0.20.255 scope global noprefixroute ib1
       valid_lft forever preferred_lft forever

If the InfiniBand ports on your network are members of a partition other than the default (0x7FFF), you must configure the p-key on the interface. The p-key must make the port a full member of the partition; a full-member p-key has the most-significant bit (MSB) of its 16 bits set to 1.

Example: If the partition number is 0x2, the limited-member p-key equals the partition number itself, 0x2. The full-member p-key is the bitwise OR of 0x8000 and 0x2, which is 0x8002.
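The full-member calculation is a single bitwise OR. A minimal POSIX-shell sketch, where `full_member_pkey` is a hypothetical helper name:

```shell
# Hypothetical helper: return the full-member form of a partition key
# (hex or decimal input) by setting the most-significant bit of the 16-bit value.
full_member_pkey() {
  printf '0x%04x\n' $(( ($1 | 0x8000) & 0xFFFF ))
}
```

For example, `full_member_pkey 0x2` prints 0x8002, and `full_member_pkey 0x7fff` prints 0xffff, the full-member key of the default partition.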

Note: All InfiniBand ports communicating with the WEKA cluster must be full members.

For each pkey-ed IPoIB interface, it's necessary to create two ifcfg scripts. To configure your own pkey-ed IPoIB interface, refer to the following examples, where a pkey of 0x8002 is used. You may need to manually create the child device.

/etc/sysconfig/network-scripts/ifcfg-ib1

TYPE=Infiniband
ONBOOT=yes
MTU=4092
BOOTPROTO=static
STARTMODE=auto
USERCTL=no
NM_CONTROLLED=no
DEVICE=ib1

/etc/sysconfig/network-scripts/ifcfg-ib1.8002

TYPE=Infiniband
BOOTPROTO=none
CONNECTED_MODE=yes
DEVICE=ib1.8002
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
MTU=4092
NAME=ib1.8002
NM_CONTROLLED=no
ONBOOT=yes
PHYSDEV=ib1
PKEY_ID=2
PKEY=yes
BROADCAST=192.168.255.255
NETMASK=255.255.0.0
IPADDR=192.168.1.1
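If the ib1.8002 child device does not exist yet, one way to create it manually is the iproute2 IPoIB link type (`ip link add link <parent> name <child> type ipoib pkey <pkey>`). The sketch below wraps that command in a hypothetical `create_pkey_child` helper; setting `DRY_RUN=1` only prints the command so you can review it before running it as root. The `<parent>.<pkey>` naming is an assumption chosen to match the `DEVICE=ib1.8002` convention above.

```shell
# Hypothetical helper: create an IPoIB p-key child interface (for example ib1.8002)
# unless it already exists. Set DRY_RUN=1 to print the command instead of running it.
create_pkey_child() {
  local parent=$1 pkey=$2              # e.g. ib1 and 0x8002
  local child="${parent}.${pkey#0x}"   # ib1.8002, matching DEVICE= in the ifcfg file
  if [ -e "/sys/class/net/${child}" ]; then
    echo "${child} already exists"
    return 0
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "ip link add link ${parent} name ${child} type ipoib pkey ${pkey}"
  else
    ip link add link "$parent" name "$child" type ipoib pkey "$pkey"
  fi
}
```

For example, `create_pkey_child ib1 0x8002` creates ib1.8002 on a server where it is missing.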

Bring the interface up using the following command:

# ifup ib1.8002

Verify the connection is up with all the non-default partition attributes set:

# ip a s ib1.8002
5: ib1.8002@ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP qlen 256
    link/infiniband 00:00:11:03:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a8:09:48 brd 00:ff:ff:ff:ff:12:40:1b:80:02:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.1.1/16 brd 192.168.255.255 scope global noprefixroute ib1.8002
       valid_lft forever preferred_lft forever

Define the NICs with `ignore-carrier`

ignore-carrier is a NetworkManager configuration option. When set, it keeps the network interface up even if the physical link is down. It’s useful when services need to bind to the interface address at boot.

The following is an example of configuring ignore-carrier on systems that use NetworkManager on Rocky Linux 8. The exact steps may vary depending on your operating system and its specific network configuration tools. Always refer to your system’s official documentation for accurate information.

Open the /etc/NetworkManager/NetworkManager.conf file to edit it.
Under the [main] section, add one of the following lines depending on the operating system:
- For some versions of Rocky Linux, RHEL, and CentOS: ignore-carrier=*
- For some other versions: ignore-carrier=<device-name1>,<device-name2>. Replace <device-name1>,<device-name2> with the actual device names you want to apply this setting to.

Example for Rocky Linux and RHEL 8.7:

/etc/NetworkManager/NetworkManager.conf

[main]
ignore-carrier=*

Example for some other versions:

[main]
ignore-carrier=ib0,ib1

Restart the NetworkManager service for the changes to take effect.
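The edit can be scripted when preparing many servers. The sketch below is a hypothetical `set_nm_main_option` helper that sets a key in the [main] section (adding the section if it is missing) without duplicating an existing entry. Back up the file first, since the helper rewrites it in place; then restart NetworkManager, for example with `systemctl restart NetworkManager`.

```shell
# Hypothetical helper: set KEY=VALUE in the [main] section of a
# NetworkManager.conf-style file. An existing KEY line in [main] is replaced;
# otherwise the line is added at the end of [main] (creating [main] if needed).
set_nm_main_option() {
  local file=$1 key=$2 value=$3 tmp rc
  tmp=$(mktemp) || return 1
  awk -v key="$key" -v val="$value" '
    /^\[/ {                                  # section header
      if (in_main && !done) { print key "=" val; done = 1 }   # leaving [main]
      in_main = ($0 == "[main]")
      if (in_main) seen_main = 1
      print
      next
    }
    in_main && $0 ~ ("^" key "[ \t]*=") {    # existing KEY line in [main]
      print key "=" val
      done = 1
      next
    }
    { print }
    END {
      if (!done) {
        if (!seen_main) print "[main]"
        print key "=" val
      }
    }' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner and mode
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, as root: `set_nm_main_option /etc/NetworkManager/NetworkManager.conf ignore-carrier '*'`.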

5. Configure dual-network links with policy-based routing

The following steps provide guidance for configuring dual-network links with policy-based routing on Linux systems. Adjust IP addresses and interface names according to your environment.

General Settings in `/etc/sysctl.conf`

Open the /etc/sysctl.conf file using a text editor.

Add the following lines at the end of the file to set minimal configurations per InfiniBand (IB) or Ethernet (Eth) interface:

# Minimal configuration, set per IB/Eth interface
net.ipv4.conf.ib0.arp_announce = 2
net.ipv4.conf.ib1.arp_announce = 2
net.ipv4.conf.ib0.arp_filter = 1
net.ipv4.conf.ib1.arp_filter = 1
net.ipv4.conf.ib0.arp_ignore = 0
net.ipv4.conf.ib1.arp_ignore = 0

# As an alternative set for all interfaces by default
net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.default.arp_filter = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 0
net.ipv4.conf.default.arp_ignore = 0

Save the file.
Apply the new settings by running:
```
sysctl -p /etc/sysctl.conf
```
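With more than two data-plane interfaces, the per-interface lines are easy to mistype. The sketch below is a hypothetical `arp_sysctl_lines` helper that prints the minimal per-interface settings shown above for any list of interface names; review the output, append it to /etc/sysctl.conf, and reload with `sysctl -p /etc/sysctl.conf`.

```shell
# Hypothetical helper: print the minimal per-interface ARP settings for each
# interface name given as an argument. Names containing a dot (such as ib1.8002)
# clash with the dot separator in sysctl keys and are not handled here.
arp_sysctl_lines() {
  local ifc
  for ifc in "$@"; do
    printf 'net.ipv4.conf.%s.arp_announce = 2\n' "$ifc"
    printf 'net.ipv4.conf.%s.arp_filter = 1\n' "$ifc"
    printf 'net.ipv4.conf.%s.arp_ignore = 0\n' "$ifc"
  done
}
```

For example, `arp_sysctl_lines ib0 ib1` prints the six lines of the minimal configuration above.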

RHEL/Rocky/CentOS routing configuration using the network scripts

Network scripts are deprecated in RHEL/Rocky 8. For RHEL/Rocky 8 and onwards, use the Network Manager.

Navigate to /etc/sysconfig/network-scripts/.

Create the file /etc/sysconfig/network-scripts/route-mlnx0 with the following content:

10.90.0.0/16 dev mlnx0 src 10.90.0.1 table weka1
default via 10.90.2.1 dev mlnx0 table weka1

Create the file /etc/sysconfig/network-scripts/route-mlnx1 with the following content:

10.90.0.0/16 dev mlnx1 src 10.90.1.1 table weka2
default via 10.90.2.1 dev mlnx1 table weka2

Create the file /etc/sysconfig/network-scripts/rule-mlnx0 with the following content:
```
table weka1 from 10.90.0.1
```
Create the file /etc/sysconfig/network-scripts/rule-mlnx1 with the following content:
```
table weka2 from 10.90.1.1
```
Open /etc/iproute2/rt_tables and add the following lines:
```
100 weka1
101 weka2
```
Save the changes.
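For servers with more interfaces, the per-interface files can be generated. A minimal sketch, assuming the same layout as above (one route-<iface> and one rule-<iface> file per interface, with the table names already registered in /etc/iproute2/rt_tables). `write_policy_routing_files` is a hypothetical helper; the target directory is a parameter so you can review the files before copying them into /etc/sysconfig/network-scripts/.

```shell
# Hypothetical helper: write route-<iface> and rule-<iface> files that send
# traffic sourced from <ip> through routing table <table>.
# Usage: write_policy_routing_files <dir> <iface> <subnet> <ip> <gateway> <table>
write_policy_routing_files() {
  local dir=$1 ifc=$2 subnet=$3 ip=$4 gw=$5 table=$6
  {
    echo "${subnet} dev ${ifc} src ${ip} table ${table}"
    echo "default via ${gw} dev ${ifc} table ${table}"
  } > "${dir}/route-${ifc}"
  echo "table ${table} from ${ip}" > "${dir}/rule-${ifc}"
}
```

For example, `write_policy_routing_files /tmp/review mlnx0 10.90.0.0/16 10.90.0.1 10.90.2.1 weka1` reproduces the mlnx0 files above.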

RHEL/Rocky 8+ routing configuration using the Network Manager

You can configure routing for your Ethernet or InfiniBand connections using Network Manager command-line interface (nmcli) commands.

Configure Ethernet routing

To set up routing for Ethernet connections, use the following nmcli commands. In these commands, the first IP address of the route (10.10.10.0/24) represents the subnet of the network to which the NIC connects. The last address in the routing rule (10.10.10.1 for eth1) is the IP address of the NIC you are configuring.

nmcli connection modify eth1 ipv4.routes "10.10.10.0/24 src=10.10.10.1 table=100" ipv4.routing-rules "priority 101 from 10.10.10.1 table 100"
nmcli connection modify eth2 ipv4.routes "10.10.10.0/24 src=10.10.10.101 table=200" ipv4.routing-rules "priority 102 from 10.10.10.101 table 200"

Configure InfiniBand routing

To set up routing for InfiniBand connections, use the following nmcli commands. The ipv4.route-metric settings give each InfiniBand connection a distinct route metric. In the routes, the first IP address (10.10.10.0/24) is the subnet of the network to which the NIC connects. The last address in the routing rule (10.10.10.1 for ib0) is the IP address of the NIC you are configuring.

nmcli connection modify ib0 ipv4.route-metric 100
nmcli connection modify ib1 ipv4.route-metric 101

nmcli connection modify ib0 ipv4.routes "10.10.10.0/24 src=10.10.10.1 table=100" 
nmcli connection modify ib0 ipv4.routing-rules "priority 101 from 10.10.10.1 table 100"
nmcli connection modify ib1 ipv4.routes "10.10.10.0/24 src=10.10.10.101 table=200" 
nmcli connection modify ib1 ipv4.routing-rules "priority 102 from 10.10.10.101 table 200"
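The per-NIC pattern above (one routing table and one rule priority per NIC) can be generated for any number of connections. The sketch below is a hypothetical `nmcli_policy_route_cmds` helper that only prints the commands for review. Modified connection settings take effect when the connection is re-activated, so the helper also prints `nmcli connection up <name>`.

```shell
# Hypothetical helper: print the nmcli commands that give one NIC its own
# routing table and source-based rule, then re-activate the connection.
# Usage: nmcli_policy_route_cmds <connection> <subnet> <nic-ip> <table> <priority>
nmcli_policy_route_cmds() {
  local conn=$1 subnet=$2 ip=$3 table=$4 prio=$5
  printf 'nmcli connection modify %s ipv4.routes "%s src=%s table=%s" ipv4.routing-rules "priority %s from %s table %s"\n' \
    "$conn" "$subnet" "$ip" "$table" "$prio" "$ip" "$table"
  printf 'nmcli connection up %s\n' "$conn"
}
```

For example, `nmcli_policy_route_cmds ib1 10.10.10.0/24 10.10.10.101 200 102` prints the ib1 commands; after reviewing them, you can pipe the output to `sh` as root.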

View network configuration

Run the following command to view the current network configuration, including interfaces, IP addresses, routes, and DNS settings.

nmcli -

The command returns a detailed list of all network interfaces and their status.

Example

eno12409: connected to eno12409
        "Mellanox MT2894"
        ethernet (mlx5_core), 50:00:E6:42:FC:27, hw, mtu 9000
        ip4 default
        inet4 10.10.35.140/25
        route4 10.10.35.128/25 metric 101
        route4 default via 10.10.35.129 metric 101
        inet6 fe80::207c:c202:d22f:d26b/64
        route6 fe80::/64 metric 1024

ens1: connected to ens1
        "Mellanox MT2910"
        ethernet (mlx5_core), 9C:63:C0:EB:7C:02, hw, mtu 9000
        inet4 10.10.50.1/24
        route4 10.10.50.0/24 metric 0
        route4 10.10.30.0/24 metric 0
        route4 10.10.37.0/25 via 10.10.50.1 metric 0
        inet6 fe80::f0d6:8ea:5be4:f4ed/64
        route6 fe80::/64 metric 1024

DNS configuration:
        servers: 10.219.59.120 10.211.188.61 10.211.188.73
        domains: coupang.net
        interface: eno12409

Ubuntu Netplan configuration

Open the Netplan configuration file /etc/netplan/01-netcfg.yaml and adjust it:

network:
    version: 2
    renderer: networkd
    ethernets:
        enp2s0:
            dhcp4: true
            nameservers:
                    addresses: [8.8.8.8]
        ib1:
            addresses:
                    [10.222.0.10/24]
            routes:
                    - to: 10.222.0.0/24
                      via: 10.222.0.10
                      table: 100
            routing-policy:
                    - from: 10.222.0.10
                      table: 100
                      priority: 32764
            ignore-carrier: true
            
        ib2:
            addresses:
                    [10.222.0.20/24]
            routes:
                    - to: 10.222.0.0/24
                      via: 10.222.0.20
                      table: 101
            routing-policy:
                    - from: 10.222.0.20
                      table: 101
                      priority: 32765
            ignore-carrier: true

After adjusting the Netplan configuration file, run the following commands:

ip route add 10.222.0.0/24 via 10.222.0.10 dev ib1 table 100
ip route add 10.222.0.0/24 via 10.222.0.20 dev ib2 table 101

SLES/SUSE configuration

Create /etc/sysconfig/network/ifrule-eth2 with:
```
ipv4 from 192.168.11.21 table 100
```
Create /etc/sysconfig/network/ifrule-eth4 with:
```
ipv4 from 192.168.11.31 table 101
```

Create /etc/sysconfig/network/scripts/ifup-route.eth2 with:

ip route add 192.168.11.0/24 dev eth2 src 192.168.11.21 table weka1

Create /etc/sysconfig/network/scripts/ifup-route.eth4 with:

ip route add 192.168.11.0/24 dev eth4 src 192.168.11.31 table weka2

Add the weka lines to /etc/iproute2/rt_tables:
```
100 weka1
101 weka2
```
Restart the interfaces or reboot the machine:
```
ifdown eth2; ifdown eth4; ifup eth2; ifup eth4
```
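Whichever distribution you use, you can confirm that each data-plane IP has its own rule with `ip rule show`, and inspect the table with `ip route show table <table>`. The sketch below is a hypothetical `has_policy_rule` helper that scans `ip rule show` output for a given source IP and table. It assumes the usual iproute2 form `<priority>:<tab>from <ip> lookup <table>`, where the table appears by name if it is listed in /etc/iproute2/rt_tables.

```shell
# Hypothetical helper: succeed if `ip rule show` output (read on stdin) has a
# rule that sends traffic from <src-ip> to routing table <table>.
has_policy_rule() {
  local src=$1 table=$2
  awk -v src="$src" -v table="$table" '
    {
      s = 0; t = 0
      for (i = 1; i < NF; i++) {
        if ($i == "from" && $(i + 1) == src) s = 1
        if ($i == "lookup" && $(i + 1) == table) t = 1
      }
      if (s && t) found = 1
    }
    END { exit !found }'
}
```

For example: `ip rule show | has_policy_rule 192.168.11.21 weka1 && ip route show table weka1`.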

Related topic

WEKA networking

6. Verify the network configuration

Use a large ICMP ping to check basic IP connectivity, and the MTU, between the interfaces of the servers:

# ping -M do -s 8972 -c 3 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 8972(9000) bytes of data.
8980 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.063 ms
8980 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.087 ms
8980 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.075 ms

--- 192.168.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.063/0.075/0.087/0.009 ms

The -M do flag prohibits packet fragmentation, which verifies that the MTU is configured correctly along the whole path between the two endpoints.

-s 8972 sets the ICMP payload size. It is the largest payload that fits in an MTU of 9000: 9000 bytes minus the 20-byte IPv4 header and the 8-byte ICMP header.

All WEKA server interfaces within the same subnet must have connectivity and be able to ping each other.
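The payload calculation generalizes to other MTUs by subtracting 28 bytes (20 for the IPv4 header, 8 for the ICMP header), so the IPoIB MTU of 4092 recommended above corresponds to -s 4064. The sketch below uses two hypothetical helpers, `icmp_payload_for_mtu` and `check_mtu_path`, to run the don't-fragment ping against every peer interface in a subnet and report the peers that fail.

```shell
# Hypothetical helpers for the don't-fragment ping test.
# Largest ICMP echo payload for an IPv4 MTU: MTU - 20 (IPv4 header) - 8 (ICMP header).
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Ping each peer 3 times without fragmentation; print the peers that fail.
# Usage: check_mtu_path <mtu> <peer-ip>...
check_mtu_path() {
  local mtu=$1 rc=0 peer size
  shift
  size=$(icmp_payload_for_mtu "$mtu")
  for peer in "$@"; do
    if ! ping -M do -s "$size" -c 3 -q "$peer" >/dev/null 2>&1; then
      echo "MTU ${mtu} check failed for ${peer}"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_mtu_path 9000 192.168.1.2 192.168.1.3` prints nothing and returns 0 when both peers pass.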

7. Configure the clock synchronization

Time synchronization across servers and networks is good practice and is vital to the stability of the WEKA system. Aligned timestamps in packets and logs also make issues faster and easier to resolve.

Configure the clock synchronization software on the backends and clients according to the specific vendor instructions (see your OS documentation), before installing the WEKA software.

8. Disable persistent NUMA balancing

The WEKA system manages NUMA placement itself to make optimal decisions. Disabling the Linux kernel's automatic NUMA balancing is mandatory, because the kernel's balancing activity adds latency to operations. NUMA balancing must remain disabled and must not be re-enabled by a server reboot.

This procedure modifies the sysctl.conf file to ensure the setting persists across server reboots.

Procedure

Open the /etc/sysctl.conf file using a text editor, such as vi or nano, with root privileges.
```
sudo vi /etc/sysctl.conf
```
Add the following line to the file:
```
kernel.numa_balancing = 0
```
Save your changes and exit the editor.
Apply the setting immediately without rebooting the server.
```
sudo sysctl -p 
```
The command's output confirms the change:
```
kernel.numa_balancing = 0
```
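When preparing many servers, a repeatable edit avoids duplicate or conflicting lines in /etc/sysctl.conf. The sketch below is a hypothetical `set_sysctl_conf` helper that replaces an existing line for the key, or appends one if none exists; run it as root, then apply the file with `sudo sysctl -p`.

```shell
# Hypothetical helper: ensure "<key> = <value>" appears exactly once in <file>.
# Commented-out lines are left untouched.
set_sysctl_conf() {
  local file=$1 key=$2 value=$3 tmp rc
  tmp=$(mktemp) || return 1
  awk -v key="$key" -v val="$value" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)           # ignore leading whitespace
      split(line, kv, "=")
      k = kv[1]
      sub(/[ \t]+$/, "", k)              # key part before "="
      if (k == key) {
        if (!done) print key " = " val   # replace the first match, drop repeats
        done = 1
        next
      }
      print
    }
    END { if (!done) print key " = " val }' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, `set_sysctl_conf /etc/sysctl.conf kernel.numa_balancing 0`. The same helper works for other keys, such as kernel.panic in the next step.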

9. Enable kdump and set kernel panic reboot timer

Enabling kdump and configuring the kernel panic reboot timer ensures that system crashes leave crash dumps for analysis and that the system reboots automatically after a kernel panic, minimizing downtime.

Enable kdump

Enabling kdump ensures crash diagnostic data is captured (/var/crash).

Install the kdump tools (if not already installed): sudo yum install kexec-tools crash.
Enable the kdump service: sudo systemctl enable kdump.service.
Open the file located at: /etc/kdump.conf.
Set the crash dump path and the core collector options, which compress (-c) and filter (-d 31) the dump to reduce its size. Example:

path /var/crash
core_collector makedumpfile -c --message-level 1 -d 31
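kdump can capture a dump only if the running kernel reserves memory for the capture kernel through the crashkernel= boot parameter. After editing /etc/kdump.conf, restart the service (for example, `sudo systemctl restart kdump`) so the new settings are used. The sketch below is a hypothetical `crashkernel_reserved` helper that checks the kernel command line for that parameter; it reads /proc/cmdline unless you pass another file.

```shell
# Hypothetical helper: succeed if the kernel command line reserves memory
# for the kdump capture kernel (a crashkernel=... parameter is present).
crashkernel_reserved() {
  local cmdline_file=${1:-/proc/cmdline}
  tr ' ' '\n' < "$cmdline_file" | grep -q '^crashkernel='
}
```

For example: `crashkernel_reserved && systemctl is-active kdump`.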

Set kernel panic reboot timer

Setting kernel.panic = 300 makes the server reboot automatically 300 seconds after a kernel panic, which automates recovery and reduces server downtime.

Open the file located at: /etc/sysctl.conf
Append the following line: kernel.panic = 300
Apply changes: sudo sysctl -p
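To confirm that the running kernel uses the expected values, read them back after `sysctl -p`. The sketch below is a hypothetical `check_sysctl` helper that maps a dotted key to its /proc/sys path and compares the live value with the expected one; the optional third argument is an alternate root directory, used only to make the helper testable.

```shell
# Hypothetical helper: compare a live sysctl value with the expected one.
# The dotted key is mapped to its /proc/sys path (kernel.panic -> kernel/panic).
check_sysctl() {
  local key=$1 expected=$2 root=${3:-} path actual
  path="${root}/proc/sys/$(printf '%s' "$key" | tr . /)"
  actual=$(cat "$path") || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK   ${key} = ${actual}"
  else
    echo "FAIL ${key} = ${actual} (expected ${expected})"
    return 1
  fi
}
```

For example: `check_sysctl kernel.panic 300 && check_sysctl kernel.numa_balancing 0`.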

10. Disable swap (if any)

WEKA strongly recommends that servers used as backends have no swap configured. The procedure is distribution-dependent, but it often amounts to commenting out any swap entries in /etc/fstab and rebooting.
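A minimal sketch of both parts: `swapoff -a` turns off active swap immediately, and the hypothetical `comment_out_swap` helper below comments out fstab entries whose filesystem type (third field) is swap, so swap stays off after the reboot. The helper keeps a backup as <file>.bak; review the result before rebooting.

```shell
# Hypothetical helper: comment out active swap entries (third field "swap")
# in an fstab-style file, keeping a backup copy as <file>.bak.
comment_out_swap() {
  local fstab=${1:-/etc/fstab} tmp rc
  cp -p "$fstab" "${fstab}.bak" || return 1
  tmp=$(mktemp) || return 1
  awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }' "$fstab" > "$tmp" \
    && cat "$tmp" > "$fstab"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, as root: `swapoff -a && comment_out_swap /etc/fstab`. Afterwards, `swapon --show` prints nothing when no swap is active.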

11. Validate the system preparation

The wekachecker is a tool that validates the readiness of the servers in the cluster before installing the WEKA software.

The wekachecker performs the following validations:

Dataplane IP, jumbo frames, and routing
ssh connection to all servers
Timesync
OS release
Sufficient capacity in /opt/weka
Available RAM
Internet connection availability
NTP
DNS configuration
Firewall rules
WEKA required packages
OFED required packages
Recommended packages
HT/AMT is disabled
The kernel is supported
CPU has a supported AES, and it is enabled
Numa balancing is enabled
RAM state
XFS FS type installed
Mellanox OFED is installed
IOMMU setting in all servers is consistent, either all enabled or all disabled.
rpcbind utility is enabled
SquashFS is enabled
noexec mount option on /tmp

The wekachecker tool applies to all WEKA versions. From V4.0, the following validations are not relevant, although the tool still displays them:

OS has SELinux disabled or in permissive mode.
Network Manager is disabled.

Procedure

Download the wekachecker tarball from https://github.com/weka/tools/blob/master/install/wekachecker and extract it.
From the install directory, run ./wekachecker <hostnames/IPs>, where <hostnames/IPs> is a space-separated list of all the cluster hostnames or IP addresses connected to the high-speed network. Example:
```
./wekachecker 10.1.1.11 10.1.1.12 10.1.1.4 10.1.1.5 10.1.1.6 10.1.1.7 10.1.1.8
```
Review the output. If failures or warnings are reported, investigate them and correct them as necessary. Repeat the validation until no important issues are reported. The wekachecker writes any failures or warnings to the file: test_results.txt.

Once the report has no failures or warnings that must be fixed, you can install the WEKA software.

wekachecker report example

Dataplane IP Jumbo Frames/Routing test                       [PASS]
Check ssh to all hosts                                       [PASS]
Verify timesync                                              [PASS]
Check if OS has SELinux disabled or in permissive mode       [PASS]
Check OS Release...                                          [PASS]
Check /opt/weka for sufficient capacity...                   [WARN]
Check available RAM...                                       [PASS]
Check if internet connection available...                    [PASS]
Check for NTP...                                             [PASS]
Check DNS configuration...                                   [PASS]
Check Firewall rules...                                      [PASS]
Check for WEKA Required Packages...                          [PASS]
Check for OFED Required Packages...                          [PASS]
Check for Recommended Packages...                            [WARN]
Check if HT/AMT is disabled                                  [WARN]
Check if kernel is supported...                              [PASS]
Check if CPU has AES enabled and supported                   [PASS]
Check if Network Manager is disabled                         [WARN]
Checking if Numa balancing is enabled                        [WARN]
Checking RAM state for errors                                [PASS]
Check for XFS FS type installed                              [PASS]
Check if Mellanox OFED is installed                          [PASS]
Check for consistent IOMMU                                   [PASS]
Check for rpcbind enabled                                    [PASS]
Check for squashfs enabled                                   [PASS]
Check for /tmp noexec mount                                  [PASS]

RESULTS: 21 Tests Passed, 0 Failed, 5 Warnings
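To triage a long run, you can tally the results. A minimal sketch, assuming each check line in test_results.txt ends with a bracketed status such as [PASS], [WARN], or [FAIL], as in the example above (the file format may differ between wekachecker versions). `summarize_wekachecker` is a hypothetical helper that prints every non-passing check followed by a count per status.

```shell
# Hypothetical helper: read wekachecker-style lines ("<check> ... [STATUS]")
# on stdin, print each non-passing check, and finish with a count per status.
summarize_wekachecker() {
  awk '
    match($0, /\[(PASS|WARN|FAIL)][ \t]*$/) {
      status = substr($0, RSTART + 1, 4)     # all three statuses are 4 letters
      name = substr($0, 1, RSTART - 1)
      sub(/[ \t]+$/, "", name)               # drop the padding before "["
      count[status]++
      if (status != "PASS") print status ": " name
    }
    END { printf "PASS=%d WARN=%d FAIL=%d\n", count["PASS"], count["WARN"], count["FAIL"] }'
}
```

For example, `summarize_wekachecker < test_results.txt`. On the example report above, it would list the five WARN checks and end with PASS=21 WARN=5 FAIL=0.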

What to do next?

If you can use the WEKA Configurator, go to:

Configure the WEKA cluster using the WEKA Configurator

Otherwise, go to:

Manually configure the WEKA cluster using the resources generator
