# List of alerts and corrective actions

<table><thead><tr><th>Alert name</th><th>Description</th><th width="183.44921875">Corrective actions</th><th>Severity</th></tr></thead><tbody><tr><td>AdminDefaultPassword</td><td>Default admin password in use</td><td>Change the admin user password to ensure only authorized users can access the cluster.</td><td>INFO</td></tr><tr><td>AgentNotRunning</td><td>The local agent is not running</td><td>Restart the local agent on the specified server by running <code>service weka-agent start</code>.</td><td>DEBUG</td></tr><tr><td>ApproachingClientsUnavailability</td><td>Approaching the connected clients limit</td><td>Ensure all backend containers are up, or expand the cluster with more backend containers or servers.</td><td>DEBUG</td></tr><tr><td>ApproachingSystemLimit</td><td>Approaching a system limit</td><td>Follow the instructions specified in the {action_item}.</td><td>MAJOR</td></tr><tr><td>AutoRemoveTimeoutTooLow</td><td>Stateless client auto-remove timeout is too low</td><td>Remount the host with a higher auto-remove timeout value.</td><td>WARNING</td></tr><tr><td>AvailableMemory</td><td>Not enough available memory</td><td>Check the memory usage on the affected server.</td><td>MAJOR</td></tr><tr><td>BackendNumaBalancingEnabled</td><td>NUMA balancing is enabled on a backend server</td><td>Disable automatic NUMA balancing by running <code>echo 0 &gt; /proc/sys/kernel/numa_balancing</code> on the backend server.</td><td>WARNING</td></tr><tr><td>BackendVersionsMismatch</td><td>Backend versions do not match the cluster version</td><td>Upgrade all backends to match the cluster version.</td><td>WARNING</td></tr><tr><td>BadDisksCapacityRatio</td><td>Large discrepancy between the smallest and largest drives</td><td>There is a large discrepancy between the sizes of the smallest and largest drives in the system. 
Replace drives with comparable capacities to minimize the differences between the drives' sizes.</td><td>MAJOR</td></tr><tr><td>BlockedJrpcMethod</td><td>JRPC method is blocked</td><td>Unblock the JRPC method by running the <code>blocked_jrpc_methods_remove</code> or <code>blocked_jrpc_methods_clear</code> manhole command.</td><td>DEBUG</td></tr><tr><td>BondInterfaceCompromised</td><td>Network high availability interface compromised</td><td>Ensure the network configuration, cables, and NICs are operating properly.</td><td>MINOR</td></tr><tr><td>BucketCapacityExhausting</td><td>Buckets are nearing exhaustion of their maximum capacity</td><td>Consider migrating to a cluster with a higher number of buckets.</td><td>DEBUG</td></tr><tr><td>BucketHasNoQuorum</td><td>Too many compute processes are down</td><td>The number of inactive compute processes exceeds the threshold required for the bucket to function properly, so the bucket is unavailable. Ensure the compute processes on the containers {hosts} are up, running, and connected. If the issue is not resolved, contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>BucketUnresponsive</td><td>Compute resource failure</td><td>Check the connectivity and the status of the leader container's drives, and ensure the compute processes are running and connected. If the issue is not resolved, contact the Customer Success Team.</td><td>CRITICAL</td></tr><tr><td>CPUFrequentStarvation</td><td>Frequent CPU starvation detected in the last minute</td><td>Check the logs of the relevant containers for potential hardware or core allocation problems.</td><td>DEBUG</td></tr><tr><td>CPUStarvation</td><td>CPU starvation detected in the last minute</td><td>Check the logs of the relevant containers for potential hardware problems. 
To convert a specific hang address into a symbol name, run <code>/weka/weka_addr2line</code> on that address within the reported WEKA container.</td><td>DEBUG</td></tr><tr><td>CWTaskAbortionStuck</td><td>CWTask is stuck in the aborting state</td><td>Start IO to allow the task to finish aborting.</td><td>DEBUG</td></tr><tr><td>ChokingDetected</td><td>High congestion level</td><td>In some situations, the system slows down IOs when it reaches certain limits (or even blocks new IOs at higher limits) until the congested resource is relieved. Such situations may be transient and resolve on their own after a short time. However, some cases indicate an issue that needs to be addressed, such as a workload maxing out the cluster's resources. In such cases, expand the cluster resources as described in Expanding &#x26; Shrinking Cluster Resources, and contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>ClientVersionsMismatch</td><td>Client versions do not match the cluster version</td><td>Upgrade the clients to the cluster version by running <code>weka local upgrade</code> locally on each client.</td><td>INFO</td></tr><tr><td>ClockSkew</td><td>Clock skew on a server</td><td>Ensure NTP is configured correctly on the containers and that their clocks are synchronized.</td><td>MINOR</td></tr><tr><td>CloudHealth</td><td>WEKA Home disconnected</td><td>Check that the server has internet connectivity and is connected to WEKA Home. See the WEKA Home - The WEKA support cloud topic in the documentation.</td><td>MINOR</td></tr><tr><td>CloudStatsError</td><td>Statistics upload failed</td><td>See the event details in the System Events.</td><td>DEBUG</td></tr><tr><td>ClusterInitializationError</td><td>Cluster initialization error</td><td>Identify the underlying problem causing the error and resolve it so that IO operations can start. 
To clear this alert, run <code>weka cluster stop-io</code>.</td><td>INFO</td></tr><tr><td>ClusterIsUpgrading</td><td>Cluster is upgrading</td><td>If the upgrade does not complete successfully, contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>ConfigOverridesActive</td><td>Config overrides are active</td><td>Contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>CoreOverlapping</td><td>Core overlapping</td><td>Contact the Customer Success Team.</td><td>MAJOR</td></tr><tr><td>DataIntegrity</td><td>Data integrity problem found</td><td>A scan identified data integrity problems, such as data corruption or inconsistencies, that require immediate investigation and resolution. Contact the Customer Success Team.</td><td>CRITICAL</td></tr><tr><td>DataProtection</td><td>Partial data protection</td><td>The cluster's data protection status has changed, often due to failing containers or drives. The system's redundancy is compromised and requires immediate attention to restore full data protection. If the cluster is still resilient to one failure, this is acceptable, but still check which process, container, or drive is down and act accordingly.</td><td>MINOR</td></tr><tr><td>DedicatedWatchdog</td><td>A dedicated server requires a hardware watchdog driver</td><td>Ensure a hardware watchdog driver is available at <code>/dev/watchdog</code>. 
For details, search the Knowledge Base in the WEKA support portal.</td><td>DEBUG</td></tr><tr><td>DrainingStuck</td><td>Draining is stuck</td><td>Check the host status and logs for more information.</td><td>MINOR</td></tr><tr><td>DriveCriticalWarnings</td><td>Drive critical warnings</td><td>Deactivate the drive by running <code>weka cluster drive deactivate</code>, and replace it.</td><td>MAJOR</td></tr><tr><td>DriveDown</td><td>Drive down</td><td>Contact the Customer Success Team to check whether the drive requires replacement.</td><td>MINOR</td></tr><tr><td>DriveEndurancePercentageUsed</td><td>Drive exceeds its life expectancy</td><td>Replace the specified drive before it fails.</td><td>MAJOR</td></tr><tr><td>DriveEnduranceSparesRemaining</td><td>Drive internal spares are running too low</td><td>Replace the specified drive before it fails.</td><td>MAJOR</td></tr><tr><td>DriveNVKVRunningLow</td><td>Drive is nearing exhaustion of an internal resource</td><td>Contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>DriveNeedsPhaseout</td><td>A drive has too many errors</td><td>Deactivate the drive by running <code>weka cluster drive deactivate</code>, and replace it if necessary.</td><td>MAJOR</td></tr><tr><td>FaultsEnabled</td><td>Faults are enabled</td><td>Contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>FilesystemKMSError</td><td>Filesystem KMS error</td><td>Review the filesystem's KMS customization and the KMS configuration and connectivity.</td><td>DEBUG</td></tr><tr><td>FilesystemsThinProvisioningLowSpace</td><td>Low space for thinly provisioned filesystems</td><td>Thinly provisioned filesystems are nearing their capacity limits, which can lead to storage shortages. Consider adding more SSD capacity to the organization containing these filesystems.</td><td>WARNING</td></tr><tr><td>FilesystemsThinProvisioningReserveReached</td><td>Filesystems thin provisioning capacity reserve reached</td><td>The reserved capacity for thin provisioning is exhausted. 
Create a new filesystem or expand the filesystem's capacity using the reserved capacity.</td><td>DEBUG</td></tr><tr><td>HangingCacheSync</td><td>Cache sync is hanging</td><td>Consider using <code>weka debug fs drop-dirty-cache</code> to drop the cache and enable other clients to access the file (unsynchronized writes will be lost).</td><td>MINOR</td></tr><tr><td>HangingClusterTasks</td><td>Cluster background task progress is hanging</td><td>This alert can trigger when a task that is expected to progress or complete within a certain timeframe stops progressing. Possible causes include resource contention, system errors, or issues with the task itself. Contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>HangingIos</td><td>Some IOs are not responding</td><td>I/O operations have stopped responding on specific nodes, which can be caused by storage issues, network problems, resource exhaustion, or process deadlocks. Ensure the compute processes are up, running, and connected. If a backend object store is configured, ensure it is connected and responsive. If the issue is not resolved, contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>HighDrivesCapacity</td><td>SSD capacity overflow</td><td>The used SSD capacity has reached a critical level, exceeding a predefined threshold of internal reserves. Free up space on the SSDs or add more SSDs to the cluster. 
Refer to the "Expand Specific Resources of a Container" topic in the documentation.</td><td>MAJOR</td></tr><tr><td>HighLevelOfUnreclaimedCapacityInObjectStore</td><td>High level of unreclaimed space in an object store</td><td></td><td>DEBUG</td></tr><tr><td>HighSSDToRAMRatio</td><td>High SSD-to-RAM ratio</td><td>Consider increasing RAM cluster-wide or removing unneeded drives to meet the filesystem (RAID) requirements and lower the SSD-to-RAM ratio.</td><td>DEBUG</td></tr><tr><td>HotspotInodes</td><td>Some files have a long waiting queue for IOs</td><td>Contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>IBNotEnhanced</td><td>Enhanced IB mode disabled</td><td>Contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>ImbalancedCpuUsage</td><td>Imbalanced CPU usage detected in cluster processes</td><td>Examine the system configuration for abnormalities that may be causing the CPU usage imbalance.</td><td>DEBUG</td></tr><tr><td>JumboConnectivity</td><td>A container cannot send jumbo frames</td><td>Check the container's network settings and the switch it is connected to, and ensure jumbo frames are enabled. Enabling jumbo frames improves performance.</td><td>WARNING</td></tr><tr><td>KMSError</td><td>KMS error</td><td>Review the KMS configuration and connectivity.</td><td>MAJOR</td></tr><tr><td>LeaderPreparedForUpgrade</td><td>Leader prepared for upgrade</td><td>After the upgrade, the leader state automatically returns to normal. 
If this alert persists, contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>LegacyManualOverridesActive</td><td>Legacy manual overrides are active</td><td>Contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>LicenseError</td><td>License error</td><td>Ensure the cluster uses the correct license, the license has not expired, and the allocated space does not exceed the license limits.</td><td>WARNING</td></tr><tr><td>LocalTLSCertificateExpired</td><td>Local TLS certificate expired</td><td>Update the local certificate.</td><td>DEBUG</td></tr><tr><td>LocalTLSCertificateExpiringSoon</td><td>Local TLS certificate is expiring soon</td><td>Update the local certificate.</td><td>DEBUG</td></tr><tr><td>LocalTLSConnectivityToNeighbors</td><td>Outgoing TLS connectivity to backends is down</td><td>Fix the TLS issue. One possible cause is an error in the local cacert in <code>/etc/wekaio/certs</code>.</td><td>DEBUG</td></tr><tr><td>LowDiskSpace</td><td>Low disk space</td><td>See the event details in the System Events.</td><td>MINOR</td></tr><tr><td>ManualOverridesActive</td><td>Manual overrides are active</td><td>Contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>ManualOverridesForced</td><td>Manual overrides are forced</td><td>Contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>MismatchedDriveFailureDomain</td><td>A drive failure domain does not match the failure domain of its attached container</td><td>One or more SSD drives in the system have failed, reducing the total available capacity below the provisioned level. When the remaining capacity becomes too low, it can lead to hanging I/O operations. Check for down drives.</td><td>MAJOR</td></tr><tr><td>MismatchedJoinSecrets</td><td>Backend containers do not have the same join secrets</td><td>This may cause problems when rejoining or reforming the cluster. 
Make sure all backend containers have the same join secrets.</td><td>DEBUG</td></tr><tr><td>NegativeUnprovisionedCapacity</td><td>Negative unprovisioned capacity</td><td>Resize one or more of the filesystems to reclaim capacity. For more information, contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>NetworkFailedToStartPorts</td><td>Network ports failed to start</td><td>Run <code>weka debug net ports $NODE</code> to see the current status.</td><td>DEBUG</td></tr><tr><td>NetworkInterfaceLinkDown</td><td>Network interface link status down</td><td>Check the connectivity to the specified network interface and verify that nothing is blocking it.</td><td>MINOR</td></tr><tr><td>NfsLocksDisabled</td><td>NFS locks disabled</td><td>Configure the config filesystem by running <code>weka nfs global-config set --config-fs=</code>.</td><td>INFO</td></tr><tr><td>NfsServiceDownAlert</td><td>NFS service down</td><td>If the services remain down, contact the Customer Success Team.</td><td>MAJOR</td></tr><tr><td>NoCgroupsConfigured</td><td>No cgroups configured</td><td>Disabled or improperly configured cgroups can cause system instability and performance degradation. Enable and configure cgroups (v1/v2) following the cgroups configuration section at https://docs.weka.io.</td><td>WARNING</td></tr><tr><td>NoClusterLicense</td><td>No license assigned</td><td>Obtain and install a license from get.weka.io.</td><td>WARNING</td></tr><tr><td>NodeBlacklisted</td><td>A process cannot rejoin the cluster</td><td>To enable the process to rejoin the cluster, whitelist it by running <code>weka debug blacklist disable</code>.</td><td>DEBUG</td></tr><tr><td>NodeDisconnected</td><td>Process disconnected</td><td>Check network connectivity to ensure the processes can communicate with the cluster.</td><td>MINOR</td></tr><tr><td>NodeNetworkUnstable</td><td>A process with an unstable network detected</td><td>Ensure proper network connectivity in the cluster. 
If the problem is not resolved, contact the Customer Success Team.</td><td>WARNING</td></tr><tr><td>NodeRDMANotActive</td><td>RDMA support for the process is inactive</td><td>Ensure that at least one RDMA-capable device exists.</td><td>DEBUG</td></tr><tr><td>NodeTieringConnectivity</td><td>A process cannot connect to an object store</td><td>A process cannot connect to the object store due to network connectivity problems, node or process health, or the object store vendor's equipment itself. Check the connectivity with the object store and ensure the process can communicate with it.</td><td>MAJOR</td></tr><tr><td>NonTlsApisAllowed</td><td>Non-TLS APIs are allowed</td><td>Update the TLS strictness setting to enforce TLS-encrypted APIs over HTTP.</td><td>DEBUG</td></tr><tr><td>NotEnoughActiveDrives</td><td>Reduced data protection</td><td>Check the connectivity and server status. Activate drives in more failure domains (FDs).</td><td>MAJOR</td></tr><tr><td>NotEnoughMemoryForFilesystemOperation</td><td>Insufficient cluster-wide RAM for proper filesystem operation</td><td>Increase RAM cluster-wide to meet the filesystem (RAID) RAM requirements, or remove drives that contribute to SSD capacity.</td><td>DEBUG</td></tr><tr><td>NotEnoughSSDCapacity</td><td>Some provisioned capacity is unavailable due to failed drives</td><td>Check for down drives.</td><td>MAJOR</td></tr><tr><td>NotificationQueueHighLoad</td><td>S3 notification queue reached high watermark</td><td>The S3 notifications Kafka queue on the {hostIds} containers is {HighThreshold}% full. Action is required.</td><td>MINOR</td></tr><tr><td>NotificationSendFailure</td><td>S3 notification send failure</td><td>Failures occurred when sending S3 notifications to Kafka on the following hosts during the past {windowMinutes} minutes: {hostIds}. 
Check the system logs for details and restore Kafka service availability.</td><td>MAJOR</td></tr><tr><td>PartialConnectivityTrackingDisabled</td><td>Partial connectivity tracking is disabled</td><td>Contact the Customer Success Team to turn on the grim reaper.</td><td>DEBUG</td></tr><tr><td>PartialHugepageAllocation</td><td>Not enough memory</td><td>Check the memory and hugepage allocation on the affected server.</td><td>MAJOR</td></tr><tr><td>PartiallyConnectedNode</td><td>A partially connected process detected</td><td>Ensure proper network connectivity in the cluster. If the problem is not resolved, contact the Customer Success Team.</td><td>MINOR</td></tr><tr><td>PassedClientsAvailabilityThreshold</td><td>Reached the connected clients limit</td><td>Add more backend containers or servers to the cluster, check whether the backends are down, or disconnect some clients.</td><td>DEBUG</td></tr><tr><td>PathsDegraded</td><td>Degraded paths</td><td>Contact the Customer Success Team to review path connectivity.</td><td>MINOR</td></tr><tr><td>PerformanceDegradedLowRAM</td><td>Low server RAM</td><td>Add more servers to the cluster, add RAM to the backend servers, or increase the memory allocation to the compute processes.</td><td>MAJOR</td></tr><tr><td>QuotasHardLimitReached</td><td>Directory quota hard limit exceeded</td><td>Run the <code>weka fs quota list</code> command to list the directories exceeding their hard quota limits. Clear some space in these directories or increase their hard quota limit.</td><td>WARNING</td></tr><tr><td>QuotasSoftLimitReached</td><td>Directory quota soft limit exceeded</td><td>Run the <code>weka fs quota list</code> command to list the directories exceeding their soft quota limits. 
Clear some space in these directories or increase their soft quota limit.</td><td>INFO</td></tr><tr><td>RAIDCapacityExhaustion</td><td>RAID capacity exhaustion</td><td>If this situation does not resolve itself within a short period (about 5 minutes), contact the Customer Success Team.</td><td>MAJOR</td></tr><tr><td>RequestedActionFailure</td><td>Requested action failure</td><td>Check the logs for more information.</td><td>DEBUG</td></tr><tr><td>ResourcesNotApplied</td><td>Resource changes are not applied</td><td>Apply the resource changes by running <code>weka cluster container apply</code>.</td><td>DEBUG</td></tr><tr><td>SSDCapacityDiscrepancy</td><td>Mismatch between the actual SSD capacity usage and the expected range</td><td>The discrepancy could be caused by misconfiguration, inefficient tiering, data overgrowth, or other underlying issues. Monitor the stability of the compute processes and contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>SSDCapacityTooHigh</td><td>Available capacity cannot be fully utilized</td><td>The SSD capacity is underutilized because too few WEKA buckets are configured, so only a percentage of the available SSD space is usable. The alert message includes the following values: <code>usable_capacity</code> (the SSD space that can currently be used), <code>percentage</code> (the percentage of the total SSD capacity available for use), and <code>full_capacity</code> (the total SSD capacity that would be available if fully configured). 
Contact the Customer Success Team for assistance in optimizing SSD capacity usage.</td><td>INFO</td></tr><tr><td>SystemDefinedTLS</td><td>TLS certificate is not user-defined</td><td>Replace the auto-generated self-signed certificate with a user-defined certificate by running <code>weka security tls set</code>.</td><td>INFO</td></tr><tr><td>TLSCertificateExpired</td><td>TLS certificate expired</td><td>Replace the existing certificate by running <code>weka security tls set</code>.</td><td>MAJOR</td></tr><tr><td>TLSCertificateExpiresSoon</td><td>TLS certificate is about to expire</td><td>Replace the existing certificate by running <code>weka security tls set</code>.</td><td>MAJOR</td></tr><tr><td>TelemetryStatusFault</td><td>Telemetry status is not streaming</td><td>Check the configuration of your telemetry sinks.</td><td>DEBUG</td></tr><tr><td>TieredFilesystemOverfillingSSD</td><td>SSD capacity of tiered filesystems is overfilling</td><td>A tiered filesystem has exceeded a predefined SSD usage threshold. In a tiered system, data is expected to be offloaded (tiered) from the SSDs to object storage as SSD capacity fills up. Resolve any tiering connectivity problems or increase the upload bandwidth.</td><td>WARNING</td></tr><tr><td>TooManyPendingClusterwideJobs</td><td>Too many pending cluster-wide jobs</td><td>Consider changing the policy configuration.</td><td>DEBUG</td></tr><tr><td>TraceDumperDown</td><td>Trace dumper is down</td><td>Contact the Customer Success Team to restart the trace dumper.</td><td>DEBUG</td></tr><tr><td>TracesDisabled</td><td>Traces are disabled</td><td>To turn on cluster traces, run <code>weka debug traces start</code>. 
For more information, see the Traces management topic in the documentation.</td><td>DEBUG</td></tr><tr><td>TracesFreezePeriodActive</td><td>A traces freeze period is active</td><td>If the problem persists after the case is resolved, contact the Customer Success Team.</td><td>DEBUG</td></tr><tr><td>UdpModePerformanceWarning</td><td>A backend container is configured in UDP mode</td><td>If this is a misconfiguration, add network devices to the specified backend container by running <code>weka cluster container net add</code>.</td><td>DEBUG</td></tr><tr><td>UnwritableDisksConfigured</td><td>A drive is set to unwritable</td><td>If the drive remains unwritable after maintenance, contact the Customer Success Team.</td><td>DEBUG</td></tr></tbody></table>
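
When several alerts are active at once, it can help to work through them from the most to the least severe. The following Python sketch filters and orders alerts by the Severity column. It assumes the ranking DEBUG < INFO < WARNING < MINOR < MAJOR < CRITICAL, and the `Alert` class and `triage` function are hypothetical helpers for illustration, not part of any WEKA API. Collecting the active alerts from the cluster is out of scope here.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed ranking of the six severity levels used in the table, least to most severe.
SEVERITY_ORDER = ["DEBUG", "INFO", "WARNING", "MINOR", "MAJOR", "CRITICAL"]
SEVERITY_RANK = {level: rank for rank, level in enumerate(SEVERITY_ORDER)}


@dataclass(frozen=True)
class Alert:
    """Hypothetical record holding an alert name and its severity."""
    name: str
    severity: str


def triage(alerts: Iterable[Alert], min_severity: str = "MINOR") -> List[Alert]:
    """Return alerts at or above min_severity, most severe first.

    sorted() is stable (also with reverse=True), so alerts with the same
    severity keep their input order.
    """
    threshold = SEVERITY_RANK[min_severity]
    selected = [a for a in alerts if SEVERITY_RANK[a.severity] >= threshold]
    return sorted(selected, key=lambda a: SEVERITY_RANK[a.severity], reverse=True)


if __name__ == "__main__":
    # Sample alerts; names and severities are taken from the table above.
    active = [
        Alert("ClockSkew", "MINOR"),
        Alert("TracesDisabled", "DEBUG"),
        Alert("BucketUnresponsive", "CRITICAL"),
        Alert("DriveDown", "MINOR"),
        Alert("KMSError", "MAJOR"),
    ]
    for alert in triage(active):
        print(f"{alert.severity:<8} {alert.name}")
```

With the default threshold of MINOR, this prints BucketUnresponsive, KMSError, ClockSkew, and DriveDown in that order and skips the DEBUG-level TracesDisabled alert. Lowering `min_severity` to `"DEBUG"` includes every alert.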

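The corrective action for BackendNumaBalancingEnabled disables automatic NUMA balancing through `/proc/sys/kernel/numa_balancing`. You can read the same file to check the current setting before or after running that command. The following Python sketch performs a read-only check; the function name `numa_balancing_enabled` is hypothetical, and the sketch assumes a Linux kernel where `0` means disabled and any non-zero value means a balancing mode is active.

```python
from pathlib import Path
from typing import Optional, Union

NUMA_BALANCING_PATH = Path("/proc/sys/kernel/numa_balancing")


def numa_balancing_enabled(path: Union[str, Path] = NUMA_BALANCING_PATH) -> Optional[bool]:
    """Report whether automatic NUMA balancing is enabled (read-only check).

    Returns True for any non-zero value, False for 0, and None if the file
    does not exist (for example, on a kernel without NUMA balancing support).
    """
    try:
        value = Path(path).read_text().strip()
    except FileNotFoundError:
        return None
    # 0 means disabled; any other value means some balancing mode is on.
    return value != "0"


if __name__ == "__main__":
    state = numa_balancing_enabled()
    if state is None:
        print("NUMA balancing setting not found on this server")
    elif state:
        print("NUMA balancing is enabled; the BackendNumaBalancingEnabled corrective action applies")
    else:
        print("NUMA balancing is disabled")
```

The `path` parameter exists only so the check can be pointed at a test file; on a backend server, call it without arguments.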