# LWH stats: sizing and performance optimization

## Stats workload principles

In large clusters, the volume of statistical data can exceed what the default configuration handles. When stats workers cannot process incoming data quickly enough, the stats stream fills to its 3 GiB capacity and NATS rejects new messages.

Effective sizing requires a clear understanding of the primary components:

* **`api.stats`:** Manages the ingestion and exposure of statistical data.
* **`workers.stats`:** Performs heavy processing of statistics. This component is typically the primary bottleneck in large environments.
* **`workers.forwarding`:** Handles the transmission of processed data. This component requires fewer CPU resources than `workers.stats`, but its load still scales with cluster size.

Load scales linearly with the number of unique (`host_id`, `node_id`) metric pairs.

### `workers.stats` capacity references

Use these values to determine the necessary CPU resources for a cluster.

<table><thead><tr><th width="191.72723388671875">Metric</th><th width="279.272705078125">Theoretical maximum</th><th>Recommended safe value</th></tr></thead><tbody><tr><td>Pairs per 1 CPU core</td><td>750</td><td>550</td></tr><tr><td>Target utilization</td><td>100%</td><td>70%</td></tr></tbody></table>

### Sizing by cluster scale

<table><thead><tr><th width="134.27276611328125">Cluster size</th><th width="167.6363525390625">Unique pairs</th><th width="191.181884765625">Estimated CPU</th><th>Recommended number of pods</th></tr></thead><tbody><tr><td>Small</td><td>Up to 1,500</td><td>2 cores</td><td>1</td></tr><tr><td>Medium</td><td>1,500 to 5,000</td><td>2 to 8 cores</td><td>1 to 2</td></tr><tr><td>Large</td><td>5,000 to 10,000</td><td>8 to 14 cores</td><td>2+</td></tr></tbody></table>

### Calculate required replicas

Determine the required number of pod replicas in a specific environment using the following formulas.

**Prerequisites**

* Identify the total number of unique (`host_id`, `node_id`) pairs in the cluster.
* Define the CPU limit per pod.
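As a minimal sketch of the first prerequisite, the following Python snippet counts unique (`host_id`, `node_id`) pairs. The record layout and the `count_unique_pairs` helper are hypothetical and used only for illustration; how you export these identifiers from your environment is not covered here.

```python
from typing import Iterable, Mapping


def count_unique_pairs(samples: Iterable[Mapping[str, object]]) -> int:
    """Count distinct (host_id, node_id) combinations in the given records."""
    return len({(s["host_id"], s["node_id"]) for s in samples})


# Hypothetical records; real data would come from your own export.
samples = [
    {"host_id": 0, "node_id": 1},
    {"host_id": 0, "node_id": 1},   # duplicate pair, counted once
    {"host_id": 0, "node_id": 21},
    {"host_id": 1, "node_id": 1},
]

print(count_unique_pairs(samples))  # 3
```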

**Procedure**

1. Calculate the required CPU cores.

$$
Required\_CPU = \frac{Number\_of\_pairs}{Pairs\_per\_1\_CPU\_core}
$$

2. Calculate the required replicas based on the pod CPU limit.

$$
Required\_replicas = \frac{Required\_CPU}{CPU\_limit\_per\_pod}
$$

**Example**

For a cluster with 10,000 unique pairs and a limit of 16 CPU cores per pod:

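1. Required CPU cores: 10,000 / 750 ≈ 13.3 cores.
2. Required replicas: 13.3 / 16 ≈ 0.83, rounded up to 1 replica (maximum utilization).

To stay at the recommended safe value (about 70% utilization), repeat the calculation with 550 pairs per core: 10,000 / 550 ≈ 18.2 cores, and 18.2 / 16 ≈ 1.14, which rounds up to 2 replicas.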
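The two formulas can be sketched in Python as follows. The function and constant names are illustrative, and rounding up to a whole number of replicas (with a floor of one) is an assumption made here because a fractional pod cannot be deployed.

```python
import math

# Capacity references from the table above.
THEORETICAL_PAIRS_PER_CORE = 750
SAFE_PAIRS_PER_CORE = 550


def required_cpu(pairs: int, pairs_per_core: int) -> float:
    """Required_CPU = Number_of_pairs / Pairs_per_1_CPU_core."""
    return pairs / pairs_per_core


def required_replicas(pairs: int, cpu_limit_per_pod: float, pairs_per_core: int) -> int:
    """Required_replicas = Required_CPU / CPU_limit_per_pod, rounded up (assumption)."""
    return max(1, math.ceil(required_cpu(pairs, pairs_per_core) / cpu_limit_per_pod))


# 10,000 unique pairs with a limit of 16 CPU cores per pod.
print(round(required_cpu(10_000, THEORETICAL_PAIRS_PER_CORE), 1))  # 13.3
print(required_replicas(10_000, 16, THEORETICAL_PAIRS_PER_CORE))   # 1
print(round(required_cpu(10_000, SAFE_PAIRS_PER_CORE), 1))         # 18.2
print(required_replicas(10_000, 16, SAFE_PAIRS_PER_CORE))          # 2
```

Running the calculation with both the theoretical and the safe value shows how much headroom a given replica count leaves.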

## Configure resource overrides for high stats throughput

Modify the Helm configuration for either Kubernetes (K8s) or K3s to support high stats throughput by defining resource requests, limits, and autoscaling parameters.

* **K8s Helm values:** Manage performance tuning in K8s environments through a `values.yaml` file. Define overrides within the `api` and `workers` sections to govern resources for the entire cluster. Use the Helm CLI to apply these settings and update the deployment state.
* **K3s configuration JSON:** Manage performance tuning in K3s environments, typically running on a WEKA Management Station (WMS) or a dedicated server. Define overrides within the `helmOverrides` block of the `/opt/wekahome/config/config.json` file. The `homecli local upgrade` command ingests this JSON to apply the specified CPU and memory limits to the local containers.

<details>

<summary>Example for K8s: <code>api</code> and <code>workers</code> sections with default values</summary>

```yaml
api:
  stats:
    replicas: 1
    resources:
      requests:
        memory: 200Mi
        cpu: 200m
      limits:
        memory: 1000Mi
        cpu: 1000m
    autoscaling:
      enabled: true
      minReplicas: 1
      maxReplicas: 10

workers:
  stats:
    enabled: true
    replicas: 1
    resources:
      requests:
        memory: 200Mi
        cpu: 1000m
      limits:
        memory: 1000Mi
        cpu: 2000m
    autoscaling:
      enabled: true
      minReplicas: 1
      maxReplicas: 300
  forwarding:
    replicas: 1
    resources:
      requests:
        memory: "200Mi"
        cpu: 100m
      limits:
        memory: "400Mi"
        cpu: 500m
    autoscaling:
      enabled: true
      minReplicas: 1
      maxReplicas: 10
```

</details>

<details>

<summary>Example for K3s: <code>helmOverrides</code> section with default values</summary>

```json
{
  "helmOverrides": {
    "api": {
      "stats": {
        "replicas": 1,
        "resources": {
          "requests": {
            "memory": "200Mi",
            "cpu": "200m"
          },
          "limits": {
            "memory": "400Mi",
            "cpu": "400m"
          }
        },
        "autoscaling": {
          "enabled": true,
          "minReplicas": 1,
          "maxReplicas": 10
        }
      }
    },
    "workers": {
      "stats": {
        "replicas": 1,
        "resources": {
          "requests": {
            "memory": "200Mi",
            "cpu": "1000m"
          },
          "limits": {
            "memory": "1000Mi",
            "cpu": "2000m"
          }
        },
        "autoscaling": {
          "enabled": true,
          "minReplicas": 2,
          "maxReplicas": 30
        }
      },
      "forwarding": {
        "replicas": 1,
        "resources": {
          "requests": {
            "memory": "200Mi",
            "cpu": "100m"
          },
          "limits": {
            "memory": "400Mi",
            "cpu": "500m"
          }
        },
        "autoscaling": {
          "enabled": true,
          "minReplicas": 1,
          "maxReplicas": 10
        }
      }
    }
  }
}
```

</details>

**Before you begin**

* Calculate the required resources based on the sizing formulas provided in the [Calculate required replicas](#calculate-required-replicas "mention") section.
* Ensure the Helm CLI is configured with the correct cluster context and namespace permissions.

**Procedure**

1. Open the configuration file for your LWH environment:
   * **K8s:** Update the `api` and `workers` sections in the `values.yaml` file.
   * **K3s:** Update the `helmOverrides` section in the `config.json` file.
2. Define the resources and autoscaling blocks for `api.stats`, `workers.stats`, and `workers.forwarding`.
3. Set `minReplicas` to a baseline value that covers steady-state load and `maxReplicas` to a level that absorbs traffic bursts.
4. Apply the configuration:
   * **K8s:** Run the `helm upgrade` command specifying your values file.
   * **K3s:** Run `homecli local upgrade`.
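As a rough illustration of steps 2 and 3 for K3s, the following Python sketch builds a `helmOverrides` fragment for `workers.stats` and prints it as JSON. The key names follow the default-values example above; the `stats_worker_overrides` helper and every value passed to it are hypothetical and must be replaced with the results of your own sizing. The sketch only prints the fragment and does not modify `config.json`. Whether a partial block is sufficient or the remaining default keys must be repeated is not covered here.

```python
import json


def stats_worker_overrides(cpu_request: str, cpu_limit: str,
                           mem_request: str, mem_limit: str,
                           min_replicas: int, max_replicas: int) -> dict:
    """Build a helmOverrides fragment for workers.stats (illustrative helper)."""
    if min_replicas < 1 or min_replicas > max_replicas:
        raise ValueError("expected 1 <= minReplicas <= maxReplicas")
    return {
        "helmOverrides": {
            "workers": {
                "stats": {
                    "resources": {
                        "requests": {"memory": mem_request, "cpu": cpu_request},
                        "limits": {"memory": mem_limit, "cpu": cpu_limit},
                    },
                    "autoscaling": {
                        "enabled": True,
                        "minReplicas": min_replicas,
                        "maxReplicas": max_replicas,
                    },
                }
            }
        }
    }


# Placeholder values only; derive real ones from the sizing formulas.
overrides = stats_worker_overrides("8000m", "16000m", "1000Mi", "2000Mi", 2, 4)
print(json.dumps(overrides, indent=2))
```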

**Related topics**

[Deploy Local WEKA Home v4.x on K8s](https://github.com/weka/docs-weka-io/blob/5.1/monitor-the-weka-cluster/the-wekaio-support-cloud/deploy-local-weka-home-v4.x-on-k8s#upgrade-local-weka-home "mention")

[Deploy Local WEKA Home on K3s](/monitor-the-weka-cluster/the-wekaio-support-cloud/local-weka-home-deployment.md#upgrade-the-local-weka-home)

## Operational maintenance

Monitor the environment to ensure performance remains within expected limits:

* Track stats worker CPU usage and queue depth.
* Monitor the stats stream size and message backlog.
* Increase replicas before increasing memory if CPU saturation occurs.
* Verify Horizontal Pod Autoscaler (HPA) behavior during peak load periods.
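As a small sketch of the stream-size check, the following Python snippet compares an observed stream size with the 3 GiB capacity mentioned at the top of this page. The `warn_at` and `critical_at` thresholds are assumptions chosen for illustration, not documented values, and obtaining the current stream size from your monitoring tooling is outside the scope of this example.

```python
STREAM_CAPACITY_BYTES = 3 * 1024**3  # 3 GiB stats stream capacity


def stream_fill_ratio(stream_bytes: int) -> float:
    """Return the fraction of the stats stream capacity currently in use."""
    return stream_bytes / STREAM_CAPACITY_BYTES


def classify_stream(stream_bytes: int, warn_at: float = 0.7,
                    critical_at: float = 0.9) -> str:
    """Map a stream size to a status; thresholds are illustrative assumptions."""
    ratio = stream_fill_ratio(stream_bytes)
    if ratio >= critical_at:
        return "critical"
    if ratio >= warn_at:
        return "warning"
    return "ok"


GIB = 1024**3
print(classify_stream(1 * GIB))             # ok
print(classify_stream(int(2.5 * GIB)))      # warning
print(classify_stream(int(2.9 * GIB)))      # critical
```

A sustained "warning" status suggests that stats workers are falling behind and that adding replicas is worth evaluating before the stream fills.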

