List of alerts and corrective actions
Check WEKA system alerts and take necessary actions based on severity and nature.
AdminDefaultPassword
Default admin password in use
Change the admin user password to ensure only authorized users can access the cluster.
INFO
AgentNotRunning
The local agent is not running
Restart the local agent on the specified server using the command 'service weka-agent start'.
DEBUG
ApproachingClientsUnavailability
Approaching connected clients limit
Ensure all backend containers are up or expand the cluster with more backend containers or servers.
DEBUG
ApproachingSystemLimit
Approaching a system limit
See the required action details in the System Alerts.
MAJOR
AutoRemoveTimeoutTooLow
Stateless Client auto-remove timeout too low
Remount the host with a higher auto-remove timeout value.
WARNING
AvailableMemory
Not enough available memory
Check your system.
MAJOR
BackendNumaBalancingEnabled
NUMA balancing is enabled on a backend server
Disable automatic NUMA balancing by running the command 'echo 0 > /proc/sys/kernel/numa_balancing' on the backend server.
WARNING
BackendVersionsMismatch
Backends mismatch cluster version
Upgrade all the backends to match the cluster's version.
WARNING
BadDisksCapacityRatio
Bad ratio between smallest and biggest drive
Replace drives to reduce the size difference between the smallest and biggest drives.
MAJOR
BlockedJrpcMethod
JRPC method is blocked
Unblock the JRPC method by running the 'blocked_jrpc_methods_remove' or 'blocked_jrpc_methods_clear' manhole.
DEBUG
BondInterfaceCompromised
Network high availability interface compromised
Ensure proper operation of the network configuration, cables, and NICs.
MINOR
BucketCapacityExhausting
Buckets are nearing exhaustion of their maximum capacity
Consider migrating to a cluster with a higher number of buckets.
DEBUG
BucketHasNoQuorum
Too many compute processes are down
Ensure the compute processes on the containers {hosts} are up and running and connected. If the issue is not resolved, contact the Customer Success Team.
DEBUG
BucketUnresponsive
Compute resource failure
Check the connectivity and status of the drives of the leader container, and ensure the compute processes are running and connected. If the issue is not resolved, contact the Customer Success Team.
CRITICAL
CPUFrequentStarvation
CPU frequent starvation detected in the last minute
Check the logs of the relevant containers for potential hardware or core allocation problems.
DEBUG
CPUStarvation
CPU starvation detected in the last minute
Check the logs of the relevant containers for potential hardware problems. To convert specific hang addresses into symbol names, run /weka/weka_addr2line on the addresses within the reported WEKA container.
DEBUG
CWTaskAbortionStuck
CWTask abortion stuck
Start IO to allow the task to complete aborting.
DEBUG
ChokingDetected
High congestion level
For more information, see the System congestion topic in the documentation.
DEBUG
ClientVersionsMismatch
Clients mismatch cluster version
Upgrade the clients to the same version as the cluster by running 'weka local upgrade' locally.
INFO
ClockSkew
Clock skew on server
Ensure the NTP is configured correctly on the containers and that their clocks are synchronized.
MINOR
CloudHealth
Weka Home disconnected
Check that the server has Internet connectivity and is connected to the Weka Home. See the Weka Home - The Weka support cloud topic in the documentation.
MINOR
CloudStatsError
Statistics upload failed
See the event details in the System Events.
DEBUG
ClusterInitializationError
Cluster initialization error
Search for the underlying problem causing the error and act accordingly to start IO operations. To clear this alert, run 'weka cluster stop-io'.
INFO
ClusterIsUpgrading
Cluster is upgrading
If the upgrade doesn't finish successfully, contact the Customer Success Team.
DEBUG
CoreOverlapping
Core Overlapping
Contact the Customer Success Team.
MAJOR
DataIntegrity
Data integrity problem found
Contact the Customer Success Team.
CRITICAL
DataProtection
Partial data protection
Check which process, container, or drive is down and act accordingly.
MINOR
DedicatedWatchdog
A dedicated server requires the installation of a hardware watchdog driver.
Ensure a hardware watchdog driver is available at /dev/watchdog. For details, search the Knowledge Base in the Weka support portal.
DEBUG
DrainingStuck
Stuck draining
Check the host status and logs for more information.
MINOR
DriveCriticalWarnings
Drive critical warnings
Deactivate the drive using the command 'weka cluster drive deactivate' and replace it.
MAJOR
DriveDown
Drive down
Contact the Customer Success Team to check if the drive requires a replacement.
MINOR
DriveEndurancePercentageUsed
Drive exceeds its life expectancy
Replace the specified drive before it fails.
MAJOR
DriveEnduranceSparesRemaining
Drive internal spares run too low
Replace the specified drive before it fails.
MAJOR
DriveNVKVRunningLow
Drive nearing exhaustion of internal resource
Contact the Customer Success Team.
DEBUG
DriveNeedsPhaseout
A drive has too many errors
Deactivate the drive using the command 'weka cluster drive deactivate'. The drive probably needs to be replaced.
MAJOR
ExampleAlert
Example Alert
Disable this alert by running the 'set_example_alert_off' manhole.
DEBUG
ExceptionsDuringAlertsEvaluation
Exceptions thrown during alerts evaluation
Check the Assertion failures events, which may reveal the source of the problem. Contact the Customer Success Team if help is required.
DEBUG
FaultsEnabled
Faults are enabled
Contact the Customer Success Team.
DEBUG
FilesystemHasTooManyFiles
Insufficient SSD capacity for metadata on filesystem
To address this issue, consider expanding the filesystem size or removing data and directories. If you have previously configured max-files settings, contact the Customer Success Team for assistance.
MAJOR
FilesystemKMSError
Filesystem KMS Error
Review the filesystem's KMS customization and the KMS configuration and connectivity.
DEBUG
FilesystemsThinProvisioningLowSpace
Filesystems thin provisioning low space
Consider adding SSD capacity to the organization containing these filesystems.
WARNING
FilesystemsThinProvisioningReserveReached
Filesystems thin provisioning capacity reserve reached
You can create a filesystem or expand the filesystem capacity using the reserved capacity.
DEBUG
HangingCacheSync
Cache sync is hanging
Consider using 'weka debug fs drop-dirty-cache' to drop the cache and enable other clients to access the file (unsynchronized writes will be lost).
MINOR
HangingClusterTasks
Cluster background task progress is hanging
Contact the Customer Success Team.
DEBUG
HangingIos
Some IOs stop responding
Ensure the compute processes are up and running and connected. If a backend object store is configured, ensure it is connected and responsive. If the issue is not resolved, contact the Customer Success Team.
DEBUG
HighDrivesCapacity
SSD capacity overflow
Free up space on the SSDs or add more SSDs to the cluster. To add SSDs, see the Expand specific resources of a container topic in the documentation.
MAJOR
HighLevelOfUnreclaimedCapacityInObjectStore
High level of unreclaimed space in an object store
DEBUG
HighSSDToRAMRatio
High SSD to RAM Ratio
Consider increasing RAM cluster-wide or removing unneeded drives to meet the filesystem (RAID) requirements and lower the SSD to RAM ratio.
DEBUG
HotspotInodes
Some files have a long waiting queue for IOs
Contact the Customer Success Team to help with resolution.
DEBUG
IBNotEnhanced
Enhanced IB mode disabled
Contact the Customer Success Team to correct this issue.
DEBUG
ImbalancedCpuUsage
Imbalanced CPU usage detected in cluster processes
Examine the system configuration for abnormalities that may be causing the CPU usage imbalance.
DEBUG
JumboConnectivity
A container cannot send jumbo frames
Check the container network settings and the switch to which the container is connected, and ensure jumbo frames are enabled. This setting improves performance.
WARNING
KMSError
KMS Error
Review the KMS configuration and connectivity.
MAJOR
LeaderPreparedForUpgrade
Leader prepared for upgrade
After the upgrade, the leader state automatically returns to normal. If this alert persists, contact the Customer Success Team.
DEBUG
LegacyManualOverridesActive
Legacy manual overrides are active
Contact the Customer Success Team.
DEBUG
LicenseError
License error
Ensure the cluster uses the correct license, the license has not expired, and the allocated space does not exceed the license limits.
WARNING
LocalTLSCertificateExpired
Local TLS certificate expired
Update the local certificate.
DEBUG
LocalTLSCertificateExpiringSoon
Local TLS certificate is expiring soon
Update the local certificate.
DEBUG
LocalTLSConnectivityToNeighbors
Outgoing TLS connectivity to backends is down
Fix the TLS issue. One possible cause is an error in the local CA certificate in /etc/wekaio/certs.
DEBUG
LowDiskSpace
Low disk space
See the event details in the System Events.
MINOR
ManualOverridesActive
Manual overrides are active
Contact the Customer Success Team.
DEBUG
ManualOverridesForced
Manual overrides are forced
Contact the Customer Success Team.
DEBUG
MismatchedDriveFailureDomain
A drive failure domain does not match the failure domain of its attached container
Do one of the following: a) Connect the mismatched drive to a container with a matching failure domain. b) Re-provision the drive to erase its failure domain.
MAJOR
MismatchedJoinSecrets
Backend containers do not have the same join secrets
This may create problems rejoining or reforming the cluster. Make sure all backend containers have the same join secret.
DEBUG
NegativeUnprovisionedCapacity
Negative unprovisioned capacity
Resize one or more of the filesystems to reclaim capacity. For more information, contact the Customer Success Team.
DEBUG
NetworkFailedToStartPorts
Network ports failed to start
Run 'weka debug net ports $NODE' to see the current status.
DEBUG
NetworkInterfaceLinkDown
Network interface link status down
Check the connectivity to the specified network interface. Verify that nothing blocks it.
MINOR
NfsLocksDisabled
NFS Locks disabled
Configure the config filesystem using 'weka nfs global-config set --config-fs='.
INFO
NfsServiceDownAlert
NFS Service Down
If down services persist, contact the Customer Success Team.
MAJOR
NoCgroupsConfigured
No cgroups configured warnings
Disabled or improperly configured Cgroups can cause system instability and performance degradation. Enable and configure Cgroups (v1/v2) following the Cgroups configuration section at https://docs.weka.io.
WARNING
NoClusterLicense
No license assigned
Obtain and install a license from get.weka.io.
WARNING
NodeBlacklisted
A process cannot rejoin the cluster
To enable the process to rejoin the cluster, whitelist it by running the command 'weka debug blacklist disable'.
DEBUG
NodeDisconnected
Process disconnected
Check network connectivity to ensure the processes can communicate with the cluster.
MINOR
NodeNetworkUnstable
A process with an unstable network detected
Ensure proper network connectivity in the cluster. If the problem is not resolved, contact the Customer Success Team.
WARNING
NodeRDMANotActive
RDMA support for the process is inactive
Ensure that at least one RDMA-capable device exists.
DEBUG
NodeTieringConnectivity
A process cannot connect to an object store
Check the connectivity with the object store and ensure the process communicates with it.
MAJOR
NonTlsApisAllowed
Non-TLS APIs are allowed
Update TLS strictness to enforce encrypted TLS APIs over HTTP.
DEBUG
NotEnoughActiveDrives
Reduced data protection
Check the connectivity and server status. Replace failed drives and expand the cluster with new failure domains.
CRITICAL
NotEnoughMemoryForFilesystemOperation
Insufficient cluster-wide RAM for proper filesystem operation
Increase RAM cluster-wide to meet Filesystems (RAID) requirements for RAM or remove drives contributing to SSD capacity.
DEBUG
NotEnoughSSDCapacity
Some provisioned capacity is unavailable due to failed drives
Check for down drives.
MAJOR
PartialConnectivityTrackingDisabled
Partial connectivity tracking is disabled
To turn on the Grim Reaper, contact the Customer Success Team.
DEBUG
PartialHugepageAllocation
Not enough memory
Check your system.
MAJOR
PartiallyConnectedNode
A partially connected process detected
Ensure proper network connectivity in the cluster. If the problem is not resolved, contact the Customer Success Team.
MINOR
PassedClientsAvailabilityThreshold
Reached connected clients limit
Add more backend containers or servers to the cluster, check whether the backends are down, or disconnect some clients.
DEBUG
PathsDegraded
Degraded Paths
Contact the Customer Success Team to review path connectivity.
MINOR
PerformanceDegradedLowRAM
Server low RAM
Ensure all the compute processes are up. Add more servers to the cluster or add RAM to the backend servers.
MAJOR
QuotasHardLimitReached
Directory quota hard limit exceeded
Run 'weka fs quota list' to get the list of directories exceeding their hard quota limits. Clear some space for these directories or increase their hard quota limit.
WARNING
QuotasSoftLimitReached
Directory quota soft limit exceeded
Run 'weka fs quota list' to get the list of directories exceeding their soft quota limits. Clear some space for these directories or increase their soft quota limit.
INFO
RAIDCapacityExhaustion
RAID capacity exhaustion
If the situation is not resolved within minutes, contact the Customer Success Team.
MAJOR
RequestedActionFailure
Requested action failure
Check the logs for more information.
DEBUG
ResourcesNotApplied
Resource changes are not applied
Apply the resource changes by running the command 'weka cluster container apply '.
DEBUG
SSDCapacityDiscrepancy
Used SSD capacity mismatches the expected range
Monitor the compute processes' stability and contact the Customer Success Team.
DEBUG
SSDCapacityTooHigh
Available capacity cannot be fully utilized
For improved SSD capacity usage, contact the Customer Success Team for assistance.
INFO
SystemDefinedTLS
TLS certificate is not user-defined
Replace the auto-generated self-signed certificate with a user-defined certificate by running the command 'weka security tls set'.
INFO
TLSCertificateExpired
TLS certificate expired
Replace the existing certificate by running the command 'weka security tls set'.
MAJOR
TLSCertificateExpiresSoon
TLS certificate is about to expire
Replace the existing certificate by running the command 'weka security tls set'.
MAJOR
TelemetryStatusFault
Telemetry status is not streaming
Check your telemetry sinks configuration.
DEBUG
TieredFilesystemOverfillingSSD
Tiered filesystems' SSD capacity overfilling
To address this issue, consider expanding the filesystem size or removing data and directories. Identify and resolve connectivity problems with the configured Object Store and increase the upload bandwidth if required.
WARNING
TooManyPendingClusterwideJobs
Too many pending cluster-wide jobs
Consider changing the policy configuration.
DEBUG
TraceDumperDown
Trace dumper is down
Contact the Customer Success Team to restart the trace dumper.
DEBUG
TracesDisabled
Traces are disabled
To turn on the cluster traces, run the command 'weka debug traces start'. For more information, see the Traces management topic in the documentation.
DEBUG
TracesFreezePeriodActive
Freeze traces is active
If the problem persists after the case is resolved, contact the Customer Success Team.
DEBUG
UdpModePerformanceWarning
A backend container is configured in UDP mode
If this is a misconfiguration, add network devices to the specified backend container using the command 'weka cluster container net add'.
DEBUG
UnstableHosts
Host unstable during upgrade
Check the host status and logs for more information.
DEBUG
UnwritableDisksConfigured
A drive is set to unwritable
If the drive remains unwritable after maintenance, contact the Customer Success Team.
DEBUG
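
The corrective action for the BackendNumaBalancingEnabled alert writes 0 to /proc/sys/kernel/numa_balancing. Before or after applying it, the current setting can be read back from the same file. The following is a minimal sketch, not part of the WEKA tooling: the function name numa_balancing_enabled and its path parameter are hypothetical, and it assumes the standard Linux behavior where the file holds 0 when automatic NUMA balancing is disabled and a non-zero value when it is enabled.

```python
from pathlib import Path

# Kernel file named in the BackendNumaBalancingEnabled corrective action.
NUMA_BALANCING_PATH = Path("/proc/sys/kernel/numa_balancing")


def numa_balancing_enabled(path: Path = NUMA_BALANCING_PATH) -> bool:
    """Return True if the kernel reports automatic NUMA balancing as enabled.

    Assumption: the file contains a single integer, where 0 means disabled
    and any other value means enabled. FileNotFoundError is raised if the
    file does not exist (for example, a kernel without NUMA balancing).
    """
    return int(path.read_text().strip()) != 0


if __name__ == "__main__":
    try:
        state = "enabled" if numa_balancing_enabled() else "disabled"
    except FileNotFoundError:
        state = "not supported by this kernel"
    print(f"Automatic NUMA balancing: {state}")
```

Run the script on the backend server named in the alert. The path parameter exists only so the check can be exercised against a test file instead of the live kernel setting.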