List of alerts and corrective actions
This page lists all the alerts generated by the WEKA system and possible actions to take.
Name
Description
Actions
AdminDefaultPassword
The admin password is still set to the factory default.
Change the admin user password to ensure only authorized users can access the cluster.
AgentNotRunning
The WEKA local control agent is not running on a server.
Restart the agent with service weka-agent start.
ApproachingClientsUnavailability
Approaching the maximum number of clients that can connect with the current cluster resources.
Make sure all backend containers are up or expand the cluster with more backend containers or servers.
AutoRemoveTimeoutTooLow
Stateless Client auto-remove timeout too low.
Remount the stateless client with a higher auto-remove timeout value.
BackendNumaBalancingEnabled
A server has automatic NUMA balancing enabled, which can negatively impact performance.
To disable NUMA balancing, on the backend server, run echo 0 > /proc/sys/kernel/numa_balancing.
BackendVersionsMismatch
There are mismatching versions of backend containers in the cluster.
Upgrade all the backend containers to match the cluster's version.
BondInterfaceCompromised
The server is configured to work with a highly available network but has lost its connectivity redundancy. A single network failure can disconnect the server from the cluster, which results in data being unavailable to the server (for a client) or in reduced data protection redundancy (for a backend).
To resolve the issue, check the network configuration, cables, and NICs.
BucketHasNoQuorum
Too many compute processes are down, causing the bucket compute resource to be unavailable.
Check that the compute processes and their containers are up and running and fully connected. If the issue is not resolved, contact the Customer Success Team.
BucketUnresponsive
A compute resource has failed, causing system unavailability.
Check that the compute processes and their containers are up and running and fully connected. If the issue is not resolved, contact the Customer Success Team.
ChokingDetected
High congestion level detected in the cluster.
For more information, see System Congestion.
ClientNumaBalancingEnabled
A server has automatic NUMA balancing enabled, which can negatively impact performance.
To disable NUMA balancing, on the client, run echo 0 > /proc/sys/kernel/numa_balancing.
ClientVersionsMismatch
There are clients with a version that does not match the cluster version. Some features may not be available until all the clients are upgraded.
Upgrade the clients to the same version as the cluster by locally running weka local upgrade.
ClockSkew
The clock of a server is skewed in relation to the cluster leader by more than the permitted maximum of 30 seconds.
Make sure NTP is configured correctly on the servers and that their dates are synchronized.
CloudHealth
A server cannot upload events to the Weka cloud.
Check that the server has Internet connectivity and is connected to the Weka cloud as explained in the Weka Support Cloud section.
CloudStatsError
Statistics upload to Weka cloud failed.
Check that the server has Internet connectivity and is connected to the Weka cloud as explained in the Weka Support Cloud section.
ClusterInitializationError
The cluster has encountered an error while initializing.
Fix the underlying problem causing the error to successfully start IO operations.
ClusterIsUpgrading
Cluster is upgrading.
If the upgrade doesn't finish normally, contact Weka support for assistance.
CPUFrequentStarvation
CPU frequent starvation detected in the last minute.
Check the relevant server logs for potential hardware problems or core allocation issues.
CPUStarvation
Weka processes are experiencing long CPU stalls.
Check the relevant server logs for potential hardware problems.
DataIntegrity
Data integrity issue found.
Contact the Weka support team.
DataProtection
Some of the system's data is not fully redundant.
Check which process, container, or drive is down and act accordingly.
DedicatedWatchdog
A dedicated Weka server requires the installation of a watchdog driver.
Make sure a watchdog is available at /dev/watchdog. For more information, search the Weka knowledge-base in the Weka support portal.
DriveCriticalWarnings
Drive critical warnings.
Deactivate the drive using the command weka cluster drive deactivate and replace it.
DriveDown
A drive is not responding.
Contact the Customer Success Team to check if the drive requires replacement.
DriveEndurancePercentageUsed
Drive exceeding its life expectancy.
It is recommended to replace the drive before it fails.
DriveEnduranceSparesRemaining
Drive internal spares running too low.
It is recommended to replace the drive before it fails.
DriveNeedsPhaseout
A drive has too many errors.
Deactivate the drive and consider replacing it.
FilesystemHasTooManyFiles
The filesystem has exceeded, or is about to exceed, its configured limit for file and directory entries.
Increase the max-files for the filesystem.
FilesystemSquashPending
A filesystem squash task is pending.
The filesystem is pending squash. The squash background task begins automatically. No corrective action is required.
FilesystemsThinProvisioningLowSpace
There are thinly provisioned filesystems that are running low on free capacity.
Consider adding more SSD capacity to the organization containing these filesystems.
FilesystemsThinProvisioningReserveReached
The requested reserved capacity (for filesystem creation or expansion) is available.
The reserved capacity can now be used for filesystem creation or expansion.
HangingCacheSync
Cache sync is stopped.
A stopped cache sync can prevent other clients from accessing some files. To resolve this issue, reboot the server or remove it from the cluster. Data that is not synced with the cluster may be lost.
HangingIOs
Some IOs are hanging on the container acting as a driver/NFS/backend.
Check that the compute processes and their containers are up and running and fully connected. Also, if a backend object store is configured, check that it is connected and responsive. Contact the Weka Support Team if the issue is not resolved.
HighDrivesCapacity
The average capacity of the SSDs is too high.
Free up space on the SSDs or add more SSDs to the cluster. To add SSDs, see Expansion of specific resources.
HighLevelOfUnreclaimedCapacityInObjectStore
High level of unreclaimed space in object store.
Check object store connectivity and the progress of deletion operations. Validate authorization of deletion operations on the object store. Run weka fs tier capacity for details.
JumboConnectivity
A server cannot send jumbo frames to any of its cluster peers.
Check the server network settings and the switch to which it is connected. Do this even if Weka seems to be functional, since resolving the issue improves performance.
KmsError
KMS error.
Review the KMS credentials, permissions, and configuration, as suggested in KMS management.
LicenseError
A license conflict exists.
Make sure the cluster is using the correct license, the license has not expired, and the cluster allocated space does not exceed the license.
LowDiskSpace
The server has low disk space (for the /opt/weka directory), which can affect some Weka reporting services.
Free up space on the server, or contact the Weka Support Team.
ManualOverridesActive
Manual overrides are active.
Please contact the Weka Support Team.
MismatchedDriveFailureDomain
The drive failure domain does not match the failure domain of its attached container.
Either connect the mismatched drive to a container with a matching failure domain, or re-provision the drive to erase its failure domain.
NegativeUnprovisionedCapacity
Weka capacity usage changes detected due to cluster upgrade.
One or more of the filesystems need to be resized in order to reclaim capacity. Contact the Weka Support Team.
NetworkInterfaceLinkDown
A network interface has a link down status.
Check the connectivity to the down interface and see if there is anything blocking it.
NoClusterLicense
No license is assigned to the cluster.
Obtain and install a license from get.weka.io.
NodeBlacklisted
There is a blacklisted process in the cluster.
Use weka debug blacklist disable to whitelist processes so they can rejoin the cluster.
NodeDisconnected
A process is disconnected from the cluster.
Check network connectivity to make sure the processes can communicate with the cluster.
NodeNetworkUnstable
A process seems to have an unstable network. As a consequence, it has been fenced by the system and does not contribute resources to the Weka cluster.
Make sure there is no network connectivity issue in the cluster. Contact the Weka Support Team if the issue is not resolved.
NodeRDMANotActive
RDMA is supported on the server but it is inactive.
Make sure Mellanox OFED version 4.6 or higher is properly installed on the server and there is at least one RDMA capable device.
NodeTieringConnectivity
A process cannot connect to an object-store.
Check connectivity with the object store and make sure the server can communicate with it.
NotEnoughActiveDrives
Reduced data protection.
Check connectivity and server status. Replace problematic drives and expand the cluster with new failure domains.
PartialConnectivityTrackingDisabled
The cluster's partial connectivity tracking mechanism is disabled, affecting the cluster's self-healing capabilities.
Contact the Customer Success Team.
PartiallyConnectedNode
A process seems to be only partially connected.
Make sure there is no network connectivity issue. If the issue is not resolved, contact the Customer Success Team.
PassedClientsAvailabilityThreshold
The clients limit has been reached.
Add more backend containers or servers to the cluster, check whether backends are down, or disconnect some clients.
PerformanceDegradedLowRAM
The server is running low on RAM. Additional metadata entries are swapped to the SSD. This might impact performance.
Make sure all the compute processes are up. Add more servers to the Weka cluster, or increase the configured RAM of the cluster backend servers.
QuotasHardLimitReached
There are directory quotas that have reached their hard limit.
Run weka fs quota list to see which directory quotas have reached their hard limit.
QuotasSoftLimitReached
There are directory quotas that have reached their soft limit.
Run weka fs quota list to see which directory quotas have reached their soft limit.
ResourcesNotApplied
There are changes to container resources that are not applied in the Weka cluster.
To apply the changes, run weka cluster container apply <container_id>.
SSDCapacityDiscrepancy
The used SSD capacity does not match the expected range.
Monitor the stability of the compute processes and contact the Weka Support Team.
SystemDefinedTLS
The Weka cluster uses an auto-generated self-signed certificate.
Run weka security tls set to replace the auto-generated certificate with your own certificate for cluster TLS use.
TLSCertificateExpired
The TLS certificate has expired.
Replace the current certificate using weka security tls set.
TLSCertificateExpiresSoon
The TLS certificate is about to expire.
Replace the current certificate using weka security tls set.
TieredFilesystemOverfillingSSD
The SSD capacity of tiered filesystems is overfilling.
Resolve tiering connectivity issues or increase the upload bandwidth.
TraceDumperDown
The trace dumper is down.
Contact the Weka Support Team to restart the trace dumper.
TracesDisabled
Traces are disabled.
To turn them back on, contact the Weka Support Team.
TracesFreezePeriodActive
A trace freeze period is active.
Some traces can be protected from rotating for a period of time to debug the system. This is done by the Weka Support Team when needed. If the alert persists after the case has been resolved, contact the Weka Support Team.
UdpModePerformanceWarning
The backend server is configured in UDP mode.
If this is a misconfiguration, use weka cluster container net add to add network devices to this backend server.
UnwritableDisksConfigured
A drive is set to unwritable.
If the drive remains unwritable after maintenance completes, contact the Customer Success Team.
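The corrective action for BackendNumaBalancingEnabled and ClientNumaBalancingEnabled writes 0 to /proc/sys/kernel/numa_balancing. Before changing the setting, it can be useful to confirm its current state. The following is a minimal sketch, not part of the WEKA tooling: the function name numa_balancing_status and its optional path argument are hypothetical, and the sketch assumes any non-zero value in the file means balancing is enabled.

```shell
#!/bin/sh
# Hypothetical helper (not a WEKA command): report whether automatic
# NUMA balancing is enabled on this server.
# The optional first argument overrides the procfs path; it exists only
# so the function can be exercised against a scratch file.
numa_balancing_status() {
    file="${1:-/proc/sys/kernel/numa_balancing}"
    # The entry is absent on kernels built without NUMA balancing.
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 2
    fi
    value=$(cat "$file")
    # Assumption: 0 means disabled, any other value means enabled.
    if [ "$value" = "0" ]; then
        echo "disabled"
        return 0
    fi
    echo "enabled"
    return 1
}
```

If the function prints enabled, run the command from the alert's action as root on that server: echo 0 > /proc/sys/kernel/numa_balancing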