List of alerts and corrective actions

Check WEKA system alerts and take necessary actions based on severity and nature.

Alert name	Description	Corrective actions
AdminDefaultPassword	Default admin password in use	Change the admin user password to ensure only authorized users can access the cluster.
AgentNotRunning	The local agent is not running	Restart the local agent on the specified server using the command 'service weka-agent start'.
ApproachingClientsUnavailability	Approaching connected clients limit	Ensure all backend containers are up or expand the cluster with more backend containers or servers.
AutoRemoveTimeoutTooLow	Stateless Client auto-remove timeout too low	Remount the host with a higher auto-remove timeout value.
BackendNumaBalancingEnabled	NUMA balancing is enabled on a backend server	Disable automatic NUMA balancing by running the command 'echo 0 > /proc/sys/kernel/numa_balancing' on the backend server.
BackendVersionsMismatch	Backends mismatch cluster version	Upgrade all the backends to match the cluster's version.
BlockedJrpcMethod	JRPC method is blocked	Unblock the JRPC method by running the 'blocked_jrpc_methods_remove' or 'blocked_jrpc_methods_clear' manhole.
BondInterfaceCompromised	Network high availability interface compromised	Ensure proper operation of the network configuration, cables, and NICs.
BucketCapacityExhausting	Buckets are nearing exhaustion of their maximum capacity	Consider migration to a cluster with more buckets.
BucketHasNoQuorum	Too many compute processes are down	Ensure the compute processes on the containers {hosts} are up and running and connected. If the issue is not resolved, contact the Customer Success Team.
BucketUnresponsive	Compute resource failure	Check the connectivity and status of the drives of the container {leader_name} ({leader_nid}, {leader_hid}) and ensure the compute processes are running and connected. If the issue is not resolved, contact the Customer Success Team.
CPUFrequentStarvation	Frequent CPU starvation was detected in the last minute	Check the logs of the relevant containers for potential hardware or core allocation problems.
CPUStarvation	CPU starvation was detected in the last minute	Check the logs of the relevant containers for potential hardware problems.
ChokingDetected	High congestion level	For more information, see the System congestion topic in the documentation.
ClientNumaBalancingEnabled	NUMA balancing is enabled on a client	Disable automatic NUMA balancing by running the command 'echo 0 > /proc/sys/kernel/numa_balancing' on the client.
ClientVersionsMismatch	Clients mismatch cluster version	Upgrade the clients to the same version as the cluster by running 'weka local upgrade' locally.
ClockSkew	Clock skew on server	Ensure the NTP is configured correctly on the containers and that their clocks are synchronized.
CloudHealth	Weka Home disconnected	Check that the server has Internet connectivity and is connected to the Weka Home. See the Weka Home - The Weka support cloud topic in the documentation.
CloudStatsError	Statistics upload failed	See the event details in the System Events.
ClusterInitializationError	Cluster initialization error	Search for the underlying problem causing the error and act accordingly to start IO operations. To clear this alert, run 'weka cluster stop-io'.
ClusterIsUpgrading	Cluster is upgrading	If the upgrade doesn't finish successfully, contact the Customer Success Team.
CoreOverlapping	Core Overlapping	Contact the Customer Success Team.
DataIntegrity	Data integrity problem found	Contact the Customer Success Team.
DataProtection	Partial data protection	Check which process, container, or drive is down and act accordingly.
DedicatedWatchdog	A dedicated server requires the installation of a hardware watchdog driver.	Ensure a hardware watchdog driver is available at /dev/watchdog. For details, search the Knowledge Base in the Weka support portal.
DriveCriticalWarnings	Drive critical warnings	Deactivate the drive using the command 'weka cluster drive deactivate' and replace it.
DriveDown	Drive down	Contact the Customer Success Team to check if the drive requires a replacement.
DriveEndurancePercentageUsed	Drive exceeds its life expectancy	Replace the specified drive before it fails.
DriveEnduranceSparesRemaining	Drive internal spares run too low	Replace the specified drive before it fails.
DriveNVKVRunningLow	Drive nearing exhaustion of internal resources	Contact the Customer Success Team.
DriveNeedsPhaseout	A drive has too many errors	Deactivate the drive using the command 'weka cluster drive deactivate' and, if necessary, replace it.
ExampleAlert	Example Alert	Disable this alert by running the set_example_alert_off manhole.
FaultsEnabled	Faults are enabled	Contact the Customer Success Team.
FilesystemHasTooManyFiles	Too many files in a filesystem	Increase the filesystem 'max-files' value. If required, decrease the 'max-files' value of another filesystem or expand the memory.
FilesystemsThinProvisioningLowSpace	Filesystems thin provisioning low space	Consider adding SSD capacity to the organization containing these filesystems.
FilesystemsThinProvisioningReserveReached	Filesystems thin provisioning capacity reserve reached	You can create a filesystem or expand the filesystem capacity using the reserved capacity.
HangingCacheSync	Cache sync is stopped	Reboot the server or remove it from the cluster.
HangingIos	Some IOs stop responding	Ensure the compute processes are up and running and connected. If a backend object store is configured, ensure it is connected and responsive. If the issue is not resolved, contact the Customer Success Team.
HighDrivesCapacity	SSD capacity overflow	Free up space on the SSDs or add more SSDs to the cluster. To add SSDs, see the Expand specific resources of a container topic in the documentation.
HighLevelOfUnreclaimedCapacityInObjectStore	High level of unreclaimed space in an object store.
JumboConnectivity	A container cannot send jumbo frames	Check the network settings of the container and of the switch it is connected to, and ensure jumbo frames are enabled. This setting improves performance.
KMSError	KMS Error	Review the KMS configuration and connectivity.
LeaderPreparedForUpgrade	Leader prepared for upgrade	After the upgrade, the leader state automatically returns to normal. If this alert persists, contact the Customer Success Team.
LegacyManualOverridesActive	Legacy manual overrides are active	Contact the Customer Success Team.
LicenseError	License error	Ensure the cluster uses the correct license, the license has not expired, and the allocated space does not exceed the license limits.
LowDiskSpace	Low disk space	See the event details in the System Events.
ManualOverridesActive	Manual overrides are active	Contact the Customer Success Team.
ManualOverridesForced	Manual overrides are forced	Contact the Customer Success Team.
MismatchedDriveFailureDomain	A drive failure domain does not match the failure domain of its attached container	Do one of the following: a) Connect the mismatched drive to a container with a matching failure domain. b) Re-provision the drive to erase its failure domain.
NegativeUnprovisionedCapacity	Negative unprovisioned capacity	Resize one or more of the filesystems to reclaim capacity. For more information, contact the Customer Success Team.
NetworkInterfaceLinkDown	Network interface link status down	Check the connectivity to the specified network interface. Verify that nothing blocks it.
NoClusterLicense	No license assigned	Obtain and install a license from get.weka.io.
NodeBlacklisted	A process cannot rejoin the cluster	To enable the process to rejoin the cluster, whitelist it by running the command 'weka debug blacklist disable'.
NodeDisconnected	Process disconnected	Check network connectivity to ensure the processes can communicate with the cluster.
NodeNetworkUnstable	A process with an unstable network detected	Ensure proper network connectivity in the cluster. If the problem is not resolved, contact the Customer Success Team.
NodeRDMANotActive	A process with supported RDMA is inactive	Ensure Mellanox OFED version 4.6 or later is installed on the server and at least one RDMA-capable device exists.
NodeTieringConnectivity	A process cannot connect to an object store	Check the connectivity with the object store and ensure the process communicates with it.
NotEnoughActiveDrives	Reduced data protection	Check the connectivity and server status. Replace failed drives and expand the cluster with new failure domains.
PartialConnectivityTrackingDisabled	Partial connectivity tracking is disabled	To turn on the Grim Reaper, contact the Customer Success Team.
PartiallyConnectedNode	A partially connected process detected	Ensure proper network connectivity in the cluster. If the problem is not resolved, contact the Customer Success Team.
PassedClientsAvailabilityThreshold	Reached connected clients limit	Add more backend containers or servers to the cluster, check whether the backends are down, or disconnect some clients.
PerformanceDegradedLowRAM	Server low RAM	Ensure all the compute processes are up. Add more servers to the cluster or add RAM to the backend servers.
QuotasHardLimitReached	Directory quota hard limit exceeded	Run 'weka fs quota list' to get the list of directories exceeding their hard quota limits. Clear some space for these directories or increase their hard quota limit.
QuotasSoftLimitReached	Directory quota soft limit exceeded	Run 'weka fs quota list' to get the list of directories exceeding their soft quota limits. Clear some space for these directories or increase their soft quota limit.
RAIDCapacityExhaustion	RAID capacity exhaustion	If the situation is not resolved within minutes, contact the Customer Success Team.
ResourcesNotApplied	Resource changes are not applied	Apply the resource changes by running the command 'weka cluster container apply '.
S3EtcdMigrationAlert	S3 etcd migration	Contact the Customer Success Team to migrate this cluster configuration storage from etcd to the new built-in Weka solution.
SSDCapacityDiscrepancy	Used SSD capacity mismatches the expected range	Monitor the compute processes' stability and contact the Customer Success Team.
SSDCapacityTooHigh	Available capacity cannot be fully utilized	For improved SSD capacity usage, contact the Customer Success Team for assistance.
SystemDefinedTLS	TLS certificate is not user-defined	Replace the auto-generated self-signed certificate with a user-defined certificate by running the command 'weka security tls set'.
TLSCertificateExpired	TLS certificate expired	Replace the existing certificate by running the command 'weka security tls set'.
TLSCertificateExpiresSoon	TLS certificate is about to expire	Replace the existing certificate by running the command 'weka security tls set'.
TieredFilesystemOverfillingSSD	Tiered filesystems' SSD capacity overfilling	Consider expanding the filesystem size or removing data and directories. Identify and resolve connectivity problems with the configured Object Store. Increase the upload bandwidth if required.
TraceDumperDown	Trace dumper is down	Contact the Customer Success Team to restart the trace dumper.
TracesDisabled	Traces are disabled	To turn on the cluster traces, run the command 'weka debug traces start'. For more information, see the Traces management topic in the documentation.
TracesFreezePeriodActive	Freeze traces feature is active	If the problem persists after the case is resolved, contact the Customer Success Team.
UdpModePerformanceWarning	A backend container is configured in UDP mode	If this is a misconfiguration, add network devices to the specified backend container using the command 'weka cluster container net add'.
UnwritableDisksConfigured	A drive is set to unwritable	If the drive remains unwritable after maintenance, contact the Customer Success Team.
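
For the BackendNumaBalancingEnabled and ClientNumaBalancingEnabled alerts in the table above, the corrective action writes 0 to a kernel control file. The following is a minimal Python sketch of checking and applying that setting. The function names are hypothetical and are not part of any WEKA tooling. The sketch assumes the control file exists on the server (it is present only on kernels built with NUMA balancing support) and that the script runs with root privileges when writing.

```python
from pathlib import Path

# Kernel control file named in the corrective action for the NUMA balancing alerts.
NUMA_BALANCING_PATH = Path("/proc/sys/kernel/numa_balancing")


def numa_balancing_enabled(path: Path = NUMA_BALANCING_PATH) -> bool:
    """Return True if the control file holds a non-zero value (balancing is on)."""
    value = path.read_text().strip()
    return int(value) != 0


def disable_numa_balancing(path: Path = NUMA_BALANCING_PATH) -> None:
    """Write 0 to the control file, like 'echo 0 > /proc/sys/kernel/numa_balancing'.

    Writing to the real file requires root privileges.
    """
    path.write_text("0\n")
```

A script could call numa_balancing_enabled() first and call disable_numa_balancing() only when it returns True. A value written under /proc does not survive a reboot, so a persistent sysctl setting is likely needed as well.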


