# List of alerts and corrective actions

| Alert name | Description | Corrective actions | Severity |
| --- | --- | --- | --- |
| AdminDefaultPassword | Default admin password in use | Change the admin user password to ensure only authorized users can access the cluster. | INFO |
| AgentNotRunning | The local agent is not running | Restart the local agent on the specified server using the command `service weka-agent start`. | DEBUG |
| ApproachingClientsUnavailability | Approaching connected clients limit | Ensure all backend containers are up, or expand the cluster with more backend containers or servers. | DEBUG |
| ApproachingSystemLimit | Approaching a system limit | Follow the information specified in the {action_item}. | MAJOR |
| AutoRemoveTimeoutTooLow | Stateless client auto-remove timeout too low | Remount the host with a higher auto-remove timeout value. | WARNING |
| AvailableMemory | Not enough available memory | Check your system. | MAJOR |
| BackendNumaBalancingEnabled | NUMA balancing is enabled on a backend server | Disable automatic NUMA balancing by running `echo 0 > /proc/sys/kernel/numa_balancing` on the backend server. | WARNING |
| BackendVersionsMismatch | Backends mismatch cluster version | Upgrade all the backends to match the cluster's version. | WARNING |
| BadDisksCapacityRatio | Large discrepancy between the smallest and largest drive | There is a large discrepancy between the sizes of the smallest and largest drives in the system. Replace drives with ones of comparable capacity to minimize the size differences. | MAJOR |
| BlockedJrpcMethod | JRPC method is blocked | Unblock the JRPC method by running the `blocked_jrpc_methods_remove` or `blocked_jrpc_methods_clear` manhole command. | DEBUG |
| BondInterfaceCompromised | Network high availability interface compromised | Ensure the network configuration, cables, and NICs operate properly. | MINOR |
| BucketCapacityExhausting | Buckets are nearing exhaustion of their maximum capacity | Consider migration to a cluster with a higher number of buckets. | DEBUG |
| BucketHasNoQuorum | Too many compute processes are down | The number of inactive compute processes exceeds the threshold required for the bucket to function properly, resulting in the bucket being unavailable. Ensure the compute processes on the containers {hosts} are up and running and connected. If the issue is not resolved, contact the Customer Success Team. | DEBUG |
| BucketUnresponsive | Compute resource failure | Check the connectivity and status of the drives of the leader container, and ensure the compute processes are running and connected. If the issue is not resolved, contact the Customer Success Team. | CRITICAL |
| CPUFrequentStarvation | CPU frequent starvation detected in the last minute | Check the logs of the relevant containers for potential hardware or core allocation problems. | DEBUG |
| CPUStarvation | CPU starvation detected in the last minute | Check the logs of the relevant containers for potential hardware problems. For a specific hang address, run `/weka/weka_addr2line` on the address within the reported WEKA container to convert it into a symbol name. | DEBUG |
| CWTaskAbortionStuck | CWTask stuck in aborting state | Start IO to allow the task to finish aborting. | DEBUG |
| ChokingDetected | High congestion level | In some situations, the system may slow down IOs when reaching some limits (or even block new IOs at higher limits) until the congested resource is relieved. Such situations may be transient, and the issue resolves on its own after a short time. However, some cases suggest an issue that needs to be addressed, such as a workload maxing out the cluster's resources. In such cases, the cluster resources must be expanded, as described in "Expanding & Shrinking Cluster Resources". Contact the Customer Success Team. | DEBUG |
| ClientVersionsMismatch | Clients mismatch cluster version | Upgrade the clients to the same version as the cluster by running `weka local upgrade` locally. | INFO |
| ClockSkew | Clock skew on server | Ensure NTP is configured correctly on the containers and that their clocks are synchronized. | MINOR |
| CloudHealth | WEKA Home disconnected | Check that the server has Internet connectivity and is connected to WEKA Home. See the "Weka Home - The Weka support cloud" topic in the documentation. | MINOR |
| CloudStatsError | Statistics upload failed | See the event details in the System Events. | DEBUG |
| ClusterInitializationError | Cluster initialization error | Search for the underlying problem causing the error and act accordingly to start IO operations. To clear this alert, run `weka cluster stop-io`. | INFO |
| ClusterIsUpgrading | Cluster is upgrading | If the upgrade doesn't finish successfully, contact the Customer Success Team. | DEBUG |
| ConfigOverridesActive | Config overrides are active | Contact the Customer Success Team. | DEBUG |
| CoreOverlapping | Core Overlapping | Contact the Customer Success Team. | MAJOR |
| DataIntegrity | Data integrity problem found | A scan identified data integrity problems, such as data corruption or inconsistencies, that need immediate investigation and resolution. Contact the Customer Success Team. | CRITICAL |
| DataProtection | Partial data protection | The cluster's data protection status has changed, often due to failing containers or drives. The system's redundancy is compromised and requires immediate attention to restore full data protection. If the cluster is still resilient to one failure, this is acceptable, but you must still check which process, container, or drive is down and act accordingly. | MINOR |
| DedicatedWatchdog | A dedicated server requires the installation of a hardware watchdog driver | Ensure a hardware watchdog driver is available at `/dev/watchdog`. For details, search the Knowledge Base in the WEKA support portal. | DEBUG |
| DrainingStuck | Stuck draining | Check the host status and logs for more information. | MINOR |
| DriveCriticalWarnings | Drive critical warnings | Deactivate the drive using the command `weka cluster drive deactivate` and replace it. | MAJOR |
| DriveDown | Drive down | Contact the Customer Success Team to check if the drive requires a replacement. | MINOR |
| DriveEndurancePercentageUsed | Drive exceeds its life expectancy | Replace the specified drive before it fails. | MAJOR |
| DriveEnduranceSparesRemaining | Drive internal spares run too low | Replace the specified drive before it fails. | MAJOR |
| DriveNVKVRunningLow | Drive nearing exhaustion of internal resource | Contact the Customer Success Team. | DEBUG |
| DriveNeedsPhaseout | A drive has too many errors | Deactivate the drive using the command `weka cluster drive deactivate`. The drive probably needs to be replaced. | MAJOR |
| FaultsEnabled | Faults are enabled | Contact the Customer Success Team. | DEBUG |
| FilesystemKMSError | Filesystem KMS Error | Review the filesystem's KMS customization and the KMS configuration and connectivity. | DEBUG |
| FilesystemsThinProvisioningLowSpace | Filesystems thin provisioning low space | Thinly provisioned filesystems are nearing capacity limits, potentially leading to storage shortages. Consider adding more SSD capacity to the organization containing these filesystems. | WARNING |
| FilesystemsThinProvisioningReserveReached | Filesystems thin provisioning capacity reserve reached | The reserved capacity for thin provisioning is exhausted. Create a new filesystem or expand the filesystem's capacity using the reserved capacity. | DEBUG |
| HangingCacheSync | Cache sync is hanging | Consider using `weka debug fs drop-dirty-cache` to drop the cache and enable other clients to access the file (unsynchronized writes will be lost). | MINOR |
| HangingClusterTasks | Cluster background task progress is hanging | This alert is triggered when a task that is expected to show progress or complete within a certain timeframe stops progressing. Possible causes include resource contention, system errors, or issues with the task itself. Contact the Customer Success Team. | DEBUG |
| HangingIos | Some IOs stop responding | I/O operations have stopped responding on specific nodes, which could be due to storage issues, network problems, resource exhaustion, or process deadlocks. Ensure the compute processes are up and running and connected. If a backend object store is configured, ensure it is connected and responsive. If the issue is not resolved, contact the Customer Success Team. | DEBUG |
| HighDrivesCapacity | SSD capacity overflow | The used SSD capacity has reached a critical level, exceeding a predefined threshold of internal reserves. Free up space on the SSDs or add more SSDs to the cluster. Refer to the "Expand Specific Resources of a Container" topic in the documentation. | MAJOR |
| HighLevelOfUnreclaimedCapacityInObjectStore | High level of unreclaimed space in an object store |  | DEBUG |
| HighSSDToRAMRatio | High SSD to RAM Ratio | Consider increasing RAM cluster-wide or removing unneeded drives to meet the Filesystems (RAID) requirements and lower the SSD-to-RAM ratio. | DEBUG |
| HotspotInodes | Some files have a long waiting queue for IOs | Contact the Customer Success Team. | DEBUG |
| IBNotEnhanced | Enhanced IB mode disabled | Contact the Customer Success Team. | DEBUG |
| ImbalancedCpuUsage | Imbalanced CPU usage detected in cluster processes | Examine the system configuration for abnormalities that may be causing the CPU usage imbalance. | DEBUG |
| JumboConnectivity | A container cannot send jumbo frames | Check the container network settings and the switch to which the container is connected, and ensure jumbo frames are enabled. This setting improves performance. | WARNING |
| KMSError | KMS Error | Review the KMS configuration and connectivity. | MAJOR |
| LeaderPreparedForUpgrade | Leader prepared for upgrade | After the upgrade, the leader state automatically returns to normal. If this alert persists, contact the Customer Success Team. | DEBUG |
| LegacyManualOverridesActive | Legacy manual overrides are active | Contact the Customer Success Team. | DEBUG |
| LicenseError | License error | Ensure the cluster uses the correct license, the license has not expired, and the allocated space does not exceed the license limits. | WARNING |
| LocalTLSCertificateExpired | Local TLS certificate expired | Update the local certificate. | DEBUG |
| LocalTLSCertificateExpiringSoon | Local TLS certificate is expiring soon | Update the local certificate. | DEBUG |
| LocalTLSConnectivityToNeighbors | Outgoing TLS connectivity to backends is down | Fix the TLS issue. One possible cause is errors in the local cacert in `/etc/wekaio/certs`. | DEBUG |
| LowDiskSpace | Low disk space | See the event details in the System Events. | MINOR |
| ManualOverridesActive | Manual overrides are active | Contact the Customer Success Team. | DEBUG |
| ManualOverridesForced | Manual overrides are forced | Contact the Customer Success Team. | DEBUG |
| MismatchedDriveFailureDomain | A drive failure domain does not match the failure domain of its attached container | One or more SSD drives in the system fail, reducing the total available capacity below the provisioned level. When the remaining capacity becomes too low, it can lead to hanging I/O operations. Check for down drives. | MAJOR |
| MismatchedJoinSecrets | Backend containers do not have the same join secrets | This may create problems rejoining or reforming the cluster. Make sure all backend containers have the same join secrets. | DEBUG |
| NegativeUnprovisionedCapacity | Negative unprovisioned capacity | Resize one or more of the filesystems to reclaim capacity. For more information, contact the Customer Success Team. | DEBUG |
| NetworkFailedToStartPorts | Network ports failed to start | Run `weka debug net ports $NODE` to see the current status. | DEBUG |
| NetworkInterfaceLinkDown | Network interface link status down | Check the connectivity to the specified network interface. Verify that nothing blocks it. | MINOR |
| NfsLocksDisabled | NFS Locks disabled | Configure the config filesystem using `weka nfs global-config set --config-fs=`. | INFO |
| NfsServiceDownAlert | NFS Service Down | If down services persist, contact the Customer Success Team. | MAJOR |
| NoCgroupsConfigured | No cgroups configured warnings | Disabled or improperly configured cgroups can cause system instability and performance degradation. Enable and configure cgroups (v1/v2) following the cgroups configuration section at https://docs.weka.io. | WARNING |
| NoClusterLicense | No license assigned | Obtain and install a license from get.weka.io. | WARNING |
| NodeBlacklisted | A process cannot rejoin the cluster | To enable the process to rejoin the cluster, whitelist it by running the command `weka debug blacklist disable`. | DEBUG |
| NodeDisconnected | Process disconnected | Check network connectivity to ensure the processes can communicate with the cluster. | MINOR |
| NodeNetworkUnstable | A process with an unstable network detected | Ensure proper network connectivity in the cluster. If the problem is not resolved, contact the Customer Success Team. | WARNING |
| NodeRDMANotActive | RDMA support for process is inactive | Ensure that at least one RDMA-capable device exists. | DEBUG |
| NodeTieringConnectivity | A process cannot connect to an object store | A process cannot connect to the object store due to network connectivity, node or process health, or the object store vendor equipment itself. Check the connectivity with the object store and ensure the process communicates with it. | MAJOR |
| NonTlsApisAllowed | Non-TLS APIs are allowed | Update the TLS strictness to enforce encrypted (TLS) APIs rather than plain HTTP. | DEBUG |
| NotEnoughActiveDrives | Reduced data protection | Check the connectivity and server status. Activate drives in more failure domains (FDs). | MAJOR |
| NotEnoughMemoryForFilesystemOperation | Insufficient cluster-wide RAM for proper filesystem operation | Increase RAM cluster-wide to meet the Filesystems (RAID) requirements for RAM, or remove drives contributing to SSD capacity. | DEBUG |
| NotEnoughSSDCapacity | Some provisioned capacity is unavailable due to failed drives | Check for down drives. | MAJOR |
| NotificationQueueHighLoad | S3 Notification Queue Reached High Watermark | The S3 notifications Kafka queue on containers {hostIds} is {HighThreshold}% full. Action required. | MINOR |
| NotificationSendFailure | S3 Notification Send Failure | Failures occurred when sending S3 notifications to Kafka on the following hosts during the past {windowMinutes} minutes: {hostIds}. Check system logs for details and restore Kafka service availability. | MAJOR |
| PartialConnectivityTrackingDisabled | Partial connectivity tracking is disabled | Contact the Customer Success Team to turn on the grim reaper. | DEBUG |
| PartialHugepageAllocation | Not enough memory | Check your system. | MAJOR |
| PartiallyConnectedNode | A partially connected process detected | Ensure proper network connectivity in the cluster. If the problem is not resolved, contact the Customer Success Team. | MINOR |
| PassedClientsAvailabilityThreshold | Reached connected clients limit | Add more backend containers or servers to the cluster, check whether the backends are down, or disconnect some clients. | DEBUG |
| PathsDegraded | Degraded Paths | Contact the Customer Success Team to review path connectivity. | MINOR |
| PerformanceDegradedLowRAM | Low Server RAM | Add more servers to the cluster, add RAM to the backend servers, or increase the memory allocation to the compute processes. | MAJOR |
| QuotasHardLimitReached | Directory quota hard limit exceeded | Run the `weka fs quota list` cluster command to get the list of directories exceeding their hard quota limits. Clear some space for these directories or increase their hard quota limit. | WARNING |
| QuotasSoftLimitReached | Directory quota soft limit exceeded | Run the `weka fs quota list` cluster command to get the list of directories exceeding their soft quota limits. Clear some space for these directories or increase their soft quota limit. | INFO |
| RAIDCapacityExhaustion | RAID capacity exhaustion | If this situation does not resolve itself within a short period of time (about 5 minutes), contact the Customer Success Team. | MAJOR |
| RequestedActionFailure | Requested action failure | Check the logs for more information. | DEBUG |
| ResourcesNotApplied | Resource changes are not applied | Apply the resource changes by running the command `weka cluster container apply`. | DEBUG |
| SSDCapacityDiscrepancy | Mismatch between the actual SSD capacity usage and the expected range | The discrepancy could be caused by misconfiguration, inefficient tiering, data overgrowth, or other underlying issues. Monitor the compute processes' stability and contact the Customer Success Team. | DEBUG |
| SSDCapacityTooHigh | Available capacity cannot be fully utilized | The SSD capacity is being underutilized due to an insufficient number of configured WEKA buckets. As a result, only a percentage of the available SSD space is usable. The message dynamically includes `usable_capacity` (the amount of SSD space that can currently be utilized), `percentage` (the percentage of the total SSD capacity that is available for use), and `full_capacity` (the total SSD capacity that is theoretically available if fully configured). Contact the Customer Success Team for assistance in optimizing the SSD capacity usage. | INFO |
| SystemDefinedTLS | TLS certificate is not user-defined | Replace the auto-generated self-signed certificate with a user-defined certificate by running the command `weka security tls set`. | INFO |
| TLSCertificateExpired | TLS certificate expired | Replace the existing certificate by running the command `weka security tls set`. | MAJOR |
| TLSCertificateExpiresSoon | TLS certificate is about to expire | Replace the existing certificate by running the command `weka security tls set`. | MAJOR |
| TelemetryStatusFault | Telemetry status is not streaming | Check your telemetry sinks configuration. | DEBUG |
| TieredFilesystemOverfillingSSD | Tiered filesystems' SSD capacity overfilling | A tiered filesystem exceeds a predefined threshold of SSD usage. In a tiered system, data should be offloaded (tiered) from the SSDs to object storage when SSD capacity starts to fill up. Resolve tiering connectivity problems or increase the upload bandwidth. | WARNING |
| TooManyPendingClusterwideJobs | Too many pending cluster-wide jobs | Consider changing the policy configuration. | DEBUG |
| TraceDumperDown | Trace dumper is down | Contact the Customer Success Team to restart the trace dumper. | DEBUG |
| TracesDisabled | Traces are disabled | To turn on the cluster traces, run the command `weka debug traces start`. For more information, see the Traces management topic in the documentation. | DEBUG |
| TracesFreezePeriodActive | Freeze traces is active | If the problem persists after the case is resolved, contact the Customer Success Team. | DEBUG |
| UdpModePerformanceWarning | A backend container is configured in UDP mode | If this is a misconfiguration, add network devices to the specified backend container using the command `weka cluster container net add`. | DEBUG |
| UnwritableDisksConfigured | A drive is set to unwritable | If the drive remains unwritable after maintenance, contact the Customer Success Team. | DEBUG |
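
The Severity column in the table can be used to decide which active alerts to look at first. The following is a minimal Python sketch of that idea, not a WEKA tool. The `Alert` class, the `triage` function, and in particular the severity ordering in `SEVERITY_ORDER` are assumptions made for illustration; this page does not state how the severity levels rank against each other.

```python
from dataclasses import dataclass

# ASSUMPTION: ordering from least to most severe. This page lists the
# severity names but does not define their relative order.
SEVERITY_ORDER = ["DEBUG", "INFO", "WARNING", "MINOR", "MAJOR", "CRITICAL"]
SEVERITY_RANK = {name: rank for rank, name in enumerate(SEVERITY_ORDER)}


@dataclass(frozen=True)
class Alert:
    """Hypothetical container for one active alert (name and severity)."""
    name: str
    severity: str


def triage(alerts: list[Alert]) -> list[Alert]:
    """Return alerts sorted from most to least severe, then by name."""
    return sorted(alerts, key=lambda a: (-SEVERITY_RANK[a.severity], a.name))


if __name__ == "__main__":
    # Alert names and severities taken from the table on this page.
    active = [
        Alert("ClockSkew", "MINOR"),
        Alert("DataIntegrity", "CRITICAL"),
        Alert("TracesDisabled", "DEBUG"),
        Alert("KMSError", "MAJOR"),
    ]
    for alert in triage(active):
        print(f"{alert.severity:<8} {alert.name}")
```

Sorting by a numeric rank instead of by the severity string avoids an accidental alphabetical order, in which `CRITICAL` would sort before `DEBUG` and `MAJOR` before `MINOR`.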

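Two corrective actions in the table refer to local files on a backend server: BackendNumaBalancingEnabled (`/proc/sys/kernel/numa_balancing`) and DedicatedWatchdog (`/dev/watchdog`). The sketch below shows one way to check both from Python on a Linux host. The function names are hypothetical and not part of any WEKA tool, and the rule that any non-zero value means NUMA balancing is enabled is an assumption based on the `echo 0` command in the table.

```python
import os
from pathlib import Path

# Paths taken from the corrective actions in the table on this page.
NUMA_BALANCING_PATH = "/proc/sys/kernel/numa_balancing"
WATCHDOG_PATH = "/dev/watchdog"


def numa_balancing_enabled(path: str = NUMA_BALANCING_PATH) -> bool:
    """Return True if the file at `path` holds a non-zero value.

    ASSUMPTION: any value other than "0" means balancing is enabled.
    A missing or unreadable file is reported as not enabled.
    """
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return False
    return value not in ("", "0")


def watchdog_available(path: str = WATCHDOG_PATH) -> bool:
    """Return True if a watchdog device node exists at `path`."""
    return os.path.exists(path)


if __name__ == "__main__":
    print("NUMA balancing enabled:", numa_balancing_enabled())
    print("Watchdog device present:", watchdog_available())
```

Both functions take the path as a parameter so the checks can be exercised against temporary files without touching the real system files.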