Avoid conflicting CPU allocations
In a WEKA and Slurm integration, efficient CPU allocation is crucial to prevent conflicts between the WEKA filesystem and Slurm job scheduling. Improper CPU allocation can lead to performance degradation, CPU starvation, or resource contention. This section outlines best practices to ensure WEKA and Slurm coexist harmoniously by carefully managing CPUsets and NUMA node allocations.
1. Disable WEKA CPUset isolation
Ensure that WEKA's default CPUset isolation is disabled to avoid conflicts with Slurm.
[root@example01 ~]# grep 'isolate_cpusets=' /etc/wekaio/service.conf
isolate_cpusets = false
2. Verify hyperthreading and NUMA configuration
Verify your system's hyperthreading and NUMA configuration. Typically, hyperthreading is disabled in most Slurm-managed environments. In this example, hyperthreading is disabled, and there are four NUMA nodes.
[root@example01 ~]# lscpu | egrep 'Thread|NUMA'
Thread(s) per core: 1
NUMA node(s): 4
NUMA node0 CPU(s): 0-13
NUMA node1 CPU(s): 14-27
NUMA node2 CPU(s): 28-41
NUMA node3 CPU(s): 42-55
3. Identify the NUMA node of the dataplane network interface
Determine the NUMA node associated with the dataplane network interface. For instance, ib0
is located in NUMA node 1.
[root@example01 ~]# cat /sys/class/net/ib0/device/numa_node
1
[root@example01 ~]# cat /sys/class/net/ib0/device/local_cpulist
14-27
4. Assign CPU cores to WEKA
When mounting the WEKA filesystem, specify the CPU cores for the WEKA client to use. These cores should be in the same NUMA node as the network interface.
Avoid using core 0. Typically, the last cores in the NUMA node are chosen. For example:
[root@example01 ~]# mount -t wekafs -o core=24,core=25,core=26,core=27,net=ib0 /mnt/wekafs
After mounting, confirm the cores and network interfaces used by WEKA:
[root@example01 ~]# weka local resources | head
ROLES NODE ID CORE ID
MANAGEMENT 0 <auto>
FRONTEND 1 24
FRONTEND 2 25
FRONTEND 3 26
FRONTEND 4 27
NET DEVICE IDENTIFIER DEFAULT GATEWAY IPS NETMASK NETWORK LABEL
ib0 0000:4b:00.0 19
5. Configure Slurm to exclude WEKA's cores
Configure Slurm to exclude WEKA's cores from those available for user jobs by setting the CPUSpecList
parameter.
Verify the configuration with:
[root@example01 ~]# scontrol show node $(hostname -s) | grep CPUSpecList
CoreSpecCount=4 CPUSpecList=24-27 MemSpecLimit=20480
6. Verify CPUset configuration
Ensure that the Slurm CPUset excludes the cores assigned to the WEKA client.
[root@example01 ~]# grep "" /sys/fs/cgroup/cpuset/weka-client/*cpus
/sys/fs/cgroup/cpuset/weka-client/cpuset.cpus:24-27
/sys/fs/cgroup/cpuset/weka-client/cpuset.effective_cpus:24-27
[root@example01 ~]# grep "" /sys/fs/cgroup/cpuset/slurm/system/*cpus
/sys/fs/cgroup/cpuset/slurm/system/cpuset.cpus:0-23,28-55
/sys/fs/cgroup/cpuset/slurm/system/cpuset.effective_cpus:0-23,28-55
7. Manage hyperthreading
If hyperthreading is enabled, identify the sibling CPUs and include them in both the WEKA mount options and Slurm’s CPUSpecList
. For clarity, even though WEKA automatically reserves these CPUs, explicitly specifying them can help avoid potential issues.
In this example, hyperthreading is disabled, so no additional CPUs are required:
[root@example01 ~]# grep "" /sys/devices/system/cpu/*/topology/thread_siblings_list | egrep 'cpu24|cpu25|cpu26|cpu27'
/sys/devices/system/cpu/cpu24/topology/thread_siblings_list:24
/sys/devices/system/cpu/cpu25/topology/thread_siblings_list:25
/sys/devices/system/cpu/cpu26/topology/thread_siblings_list:26
/sys/devices/system/cpu/cpu27/topology/thread_siblings_list:27
8. Address logical and physical CPU index mismatch
In certain situations, environmental factors like BIOS or hypervisor settings may cause discrepancies between logical CPU numbers and the physical or OS-assigned numbers. This can result in the Slurm CPUset mistakenly including CPUs that should be reserved for the WEKA client, potentially leading to resource conflicts such as CPU starvation.
For example, if the CPUset configuration shows that Slurm is not correctly excluding the WEKA-assigned CPUs, you might see something like this, where CPUs 56, 58, 60, and 62 are listed in both CPUsets, which will cause conflicts:
[root@example01 ~]# grep "" /sys/fs/cgroup/cpuset/weka*/cpuset.effective_cpus
56-63
[root@example01 ~]# grep "" /sys/fs/cgroup/cpuset/slurm/system/cpuset.effective_cpus
0-48,50,52,54,56,58,60,62
The issue may arise from non-sequential CPU numbering, where CPUs are interleaved between NUMA nodes:
[root@example01 ~]# lscpu | egrep 'Thread|NUMA'
Thread(s) per core: 1
NUMA node(s): 2
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63
To address this, do the following:
Ensure that the WEKA agent's
isolate_cpuset=false
setting is applied (see Step 1), and that the agent has been restarted.Use
hwloc-ls
to map the logical index (L#) to the physical/OS index (P#) for the CPUs assigned to WEKA. If the logical and physical indexes don’t match, use the logical index numbers in Slurm’sCPUSpecList
parameter.
In this example, the output indicates a mismatch between the L# and P#:
[root@example01 ~]# weka local resources | egrep 'FRONTEND' | awk '{print "hwloc-ls | grep P\\#"$3}' | bash
L2 L#28 (2048KB) + L1d L#28 (48KB) + L1i L#28 (32KB) + Core L#28 + PU L#28 (P#56)
L2 L#29 (2048KB) + L1d L#29 (48KB) + L1i L#29 (32KB) + Core L#29 + PU L#29 (P#58)
L2 L#30 (2048KB) + L1d L#30 (48KB) + L1i L#30 (32KB) + Core L#30 + PU L#30 (P#60)
L2 L#31 (2048KB) + L1d L#31 (48KB) + L1i L#31 (32KB) + Core L#31 + PU L#31 (P#62)
L2 L#60 (2048KB) + L1d L#60 (48KB) + L1i L#60 (32KB) + Core L#60 + PU L#60 (P#57)
L2 L#61 (2048KB) + L1d L#61 (48KB) + L1i L#61 (32KB) + Core L#61 + PU L#61 (P#59)
L2 L#62 (2048KB) + L1d L#62 (48KB) + L1i L#62 (32KB) + Core L#62 + PU L#62 (P#61)
L2 L#63 (2048KB) + L1d L#63 (48KB) + L1i L#63 (32KB) + Core L#63 + PU L#63 (P#63)
Although WEKA uses physical cores 56-63
, set Slurm’s CPUSpecList
to 28-31,60-63
to correctly allocate the CPUs based on their logical index.
Related information
Slurm GRES documentation (for more details on logical and physical core index mapping)
Last updated