List of alerts and corrective actions
Check WEKA system alerts and take necessary actions based on severity and nature.
Alert name | Description | Corrective actions |
---|---|---|
AdminDefaultPassword | Default admin password in use | Change the admin user password to ensure only authorized users can access the cluster. |
AgentNotRunning | The local agent is not running | Restart the local agent on the specified server using the command 'service weka-agent start'. |
ApproachingClientsUnavailability | Approaching connected clients limit | Ensure all backend containers are up or expand the cluster with more backend containers or servers. |
AutoRemoveTimeoutTooLow | Stateless Client auto-remove timeout too low | Remount the host with a higher auto-remove timeout value. |
BackendNumaBalancingEnabled | NUMA balancing is enabled on a backend server | Disable automatic NUMA balancing by running the command 'echo 0 > /proc/sys/kernel/numa_balancing' on the backend server. |
BackendVersionsMismatch | Backends mismatch cluster version | Upgrade all the backends to match the cluster's version. |
BlockedJrpcMethod | JRPC method is blocked | Unblock the JRPC method by running the 'blocked_jrpc_methods_remove' or 'blocked_jrpc_methods_clear' manhole. |
BondInterfaceCompromised | Network high availability interface compromised | Ensure the network configuration, cables, and NICs operate properly. |
BucketCapacityExhausting | Buckets are nearing exhaustion of their maximum capacity | Consider migration to a cluster with more buckets. |
BucketHasNoQuorum | Too many compute processes are down | Ensure the compute processes on the containers {hosts} are up and running and connected. If the issue is not resolved, contact the Customer Success Team. |
BucketUnresponsive | Compute resource failure | Check the connectivity and status of the drives of the container {leader_name} ({leader_nid}, {leader_hid}) and ensure the compute processes are running and connected. If the issue is not resolved, contact the Customer Success Team. |
CPUFrequentStarvation | Frequent CPU starvation was detected in the last minute | Check the logs of the relevant containers for potential hardware or core allocation problems. |
CPUStarvation | CPU starvation was detected in the last minute | Check the logs of the relevant containers for potential hardware problems. |
ChokingDetected | High congestion level | For more information, see the System congestion topic in the documentation. |
ClientNumaBalancingEnabled | NUMA balancing is enabled on a client | Disable automatic NUMA balancing by running the command 'echo 0 > /proc/sys/kernel/numa_balancing' on the client. |
ClientVersionsMismatch | Clients mismatch cluster version | Upgrade the clients to the same version as the cluster by running 'weka local upgrade' locally. |
ClockSkew | Clock skew on server | Ensure the NTP is configured correctly on the containers and that their clocks are synchronized. |
CloudHealth | Weka Home disconnected | Check that the server has Internet connectivity and is connected to the Weka Home. See the Weka Home - The Weka support cloud topic in the documentation. |
CloudStatsError | Statistics upload failed | See the event details in the System Events. |
ClusterInitializationError | Cluster initialization error | Search for the underlying problem causing the error and act accordingly to start IO operations. To clear this alert, run 'weka cluster stop-io'. |
ClusterIsUpgrading | Cluster is upgrading | If the upgrade doesn't finish successfully, contact the Customer Success Team. |
CoreOverlapping | Core Overlapping | Contact the Customer Success Team. |
DataIntegrity | Data integrity problem found | Contact the Customer Success Team. |
DataProtection | Partial data protection | Check which process, container, or drive is down and act accordingly. |
DedicatedWatchdog | A dedicated server requires the installation of a hardware watchdog driver. | Ensure a hardware watchdog driver is available at /dev/watchdog. For details, search the Knowledge Base in the Weka support portal. |
DriveCriticalWarnings | Drive critical warnings | Deactivate the drive using the command 'weka cluster drive deactivate' and replace it. |
DriveDown | Drive down | Contact the Customer Success Team to check if the drive requires a replacement. |
DriveEndurancePercentageUsed | Drive exceeds its life expectancy | Replace the specified drive before it fails. |
DriveEnduranceSparesRemaining | Drive internal spares run too low | Replace the specified drive before it fails. |
DriveNVKVRunningLow | Drive nearing exhaustion of internal resources | Contact the Customer Success Team. |
DriveNeedsPhaseout | A drive has too many errors | Deactivate the drive using the command 'weka cluster drive deactivate' and replace it if necessary. |
ExampleAlert | Example Alert | Disable this alert by running the set_example_alert_off manhole. |
FaultsEnabled | Faults are enabled | Contact the Customer Success Team. |
FilesystemHasTooManyFiles | Too many files in a filesystem | Increase the filesystem 'max-files' value. If required, decrease the 'max-files' value of another filesystem or expand the memory. |
FilesystemsThinProvisioningLowSpace | Filesystems thin provisioning low space | Consider adding SSD capacity to the organization containing these filesystems. |
FilesystemsThinProvisioningReserveReached | Filesystems thin provisioning capacity reserve reached | You can create a filesystem or expand the filesystem capacity using the reserved capacity. |
HangingCacheSync | Cache sync is stopped | Reboot the server or remove it from the cluster. |
HangingIos | Some IOs stop responding | Ensure the compute processes are up and running and connected. If a backend object store is configured, ensure it is connected and responsive. If the issue is not resolved, contact the Customer Success Team. |
HighDrivesCapacity | SSD capacity overflow | Free up space on the SSDs or add more SSDs to the cluster. To add SSDs, see the Expand specific resources of a container topic in the documentation. |
HighLevelOfUnreclaimedCapacityInObjectStore | High level of unreclaimed space in an object store. | |
JumboConnectivity | A container cannot send jumbo frames | Check the network settings of the container and of the switch it is connected to, and ensure jumbo frames are enabled. This setting improves performance. |
KMSError | KMS Error | Review the KMS configuration and connectivity. |
LeaderPreparedForUpgrade | Leader prepared for upgrade | After the upgrade, the leader state automatically returns to normal. If this alert persists, contact the Customer Success Team. |
LegacyManualOverridesActive | Legacy manual overrides are active | Contact the Customer Success Team. |
LicenseError | License error | Ensure the cluster uses the correct license, the license has not expired, and the allocated space does not exceed the license limits. |
LowDiskSpace | Low disk space | See the event details in the System Events. |
ManualOverridesActive | Manual overrides are active | Contact the Customer Success Team. |
ManualOverridesForced | Manual overrides are forced | Contact the Customer Success Team. |
MismatchedDriveFailureDomain | A drive failure domain does not match the failure domain of its attached container | Do one of the following: a) Connect the mismatched drive to a container with a matching failure domain. b) Re-provision the drive to erase its failure domain. |
NegativeUnprovisionedCapacity | Negative unprovisioned capacity | Resize one or more of the filesystems to reclaim capacity. For more information, contact the Customer Success Team. |
NetworkInterfaceLinkDown | Network interface link status down | Check the connectivity to the specified network interface. Verify that nothing blocks it. |
NoClusterLicense | No license assigned | Obtain and install a license from get.weka.io. |
NodeBlacklisted | A process cannot rejoin the cluster | To enable the process to rejoin the cluster, whitelist it by running the command 'weka debug blacklist disable'. |
NodeDisconnected | Process disconnected | Check network connectivity to ensure the processes can communicate with the cluster. |
NodeNetworkUnstable | A process with an unstable network detected | Ensure proper network connectivity in the cluster. If the problem is not resolved, contact the Customer Success Team. |
NodeRDMANotActive | A process with supported RDMA is inactive | Ensure Mellanox OFED version 4.6 or later is installed on the server and at least one RDMA-capable device exists. |
NodeTieringConnectivity | A process cannot connect to an object store | Check the connectivity with the object store and ensure the process communicates with it. |
NotEnoughActiveDrives | Reduced data protection | Check the connectivity and server status. Replace failed drives and expand the cluster with new failure domains. |
PartialConnectivityTrackingDisabled | Partial connectivity tracking is disabled | To turn on the Grim Reaper, contact the Customer Success Team. |
PartiallyConnectedNode | A partially connected process detected | Ensure proper network connectivity in the cluster. If the problem is not resolved, contact the Customer Success Team. |
PassedClientsAvailabilityThreshold | Reached connected clients limit | Add more backend containers or servers to the cluster, check whether the backends are down, or disconnect some clients. |
PerformanceDegradedLowRAM | Server low RAM | Ensure all the compute processes are up. Add more servers to the cluster or add RAM to the backend servers. |
QuotasHardLimitReached | Directory quota hard limit exceeded | Run 'weka fs quota list' to get the list of directories exceeding their hard quota limits. Clear some space for these directories or increase their hard quota limit. |
QuotasSoftLimitReached | Directory quota soft limit exceeded | Run 'weka fs quota list' to get the list of directories exceeding their soft quota limits. Clear some space for these directories or increase their soft quota limit. |
RAIDCapacityExhaustion | RAID capacity exhaustion | If the situation is not resolved within minutes, contact the Customer Success Team. |
ResourcesNotApplied | Resource changes are not applied | Apply the resource changes by running the command 'weka cluster container apply '. |
S3EtcdMigrationAlert | S3 etcd migration | Contact the Customer Success Team to migrate this cluster configuration storage from etcd to the new built-in Weka solution. |
SSDCapacityDiscrepancy | Used SSD capacity mismatches the expected range | Monitor the compute processes' stability and contact the Customer Success Team. |
SSDCapacityTooHigh | Available capacity cannot be fully utilized | For improved SSD capacity usage, contact the Customer Success Team for assistance. |
SystemDefinedTLS | TLS certificate is not user-defined | Replace the auto-generated self-signed certificate with a user-defined certificate by running the command 'weka security tls set'. |
TLSCertificateExpired | TLS certificate expired | Replace the existing certificate by running the command 'weka security tls set'. |
TLSCertificateExpiresSoon | TLS certificate is about to expire | Replace the existing certificate by running the command 'weka security tls set'. |
TieredFilesystemOverfillingSSD | Tiered filesystems' SSD capacity overfilling | Consider expanding the filesystem size or removing data and directories. Identify and resolve connectivity problems with the configured Object Store. Increase the upload bandwidth if required. |
TraceDumperDown | Trace dumper is down | Contact the Customer Success Team to restart the trace dumper. |
TracesDisabled | Traces are disabled | To turn on the cluster traces, run the command 'weka debug traces start'. For more information, see the Traces management topic in the documentation. |
TracesFreezePeriodActive | Freeze traces feature is active | If the problem persists after the case is resolved, contact the Customer Success Team. |
UdpModePerformanceWarning | A backend container is configured in UDP mode | If this is a misconfiguration, add network devices to the specified backend container using the command 'weka cluster container net add'. |
UnwritableDisksConfigured | A drive is set to unwritable | If the drive remains unwritable after maintenance, contact the Customer Success Team. |
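
Several rows in the table reduce to either a single command or an escalation to the Customer Success Team, so a runbook can encode them as a lookup. The following is a minimal sketch, not part of the WEKA tooling: the names `ALERT_COMMANDS`, `ESCALATE_ONLY`, and `suggest_action` are hypothetical, and only a subset of alerts is covered. The command strings are copied from the table as written; any arguments they need (for example, which drive or container to act on) are not listed there and must be supplied by the operator. The helper only returns text and never runs a command.

```python
# Hypothetical runbook helper: maps a few alert names from the table
# to the command or escalation listed in their "Corrective actions" cell.

# Alerts whose corrective action names a command (copied from the table;
# required arguments are not shown in the table and are omitted here).
ALERT_COMMANDS = {
    "AgentNotRunning": "service weka-agent start",
    "BackendNumaBalancingEnabled": "echo 0 > /proc/sys/kernel/numa_balancing",
    "ClientNumaBalancingEnabled": "echo 0 > /proc/sys/kernel/numa_balancing",
    "ClientVersionsMismatch": "weka local upgrade",
    "DriveCriticalWarnings": "weka cluster drive deactivate",
    "NodeBlacklisted": "weka debug blacklist disable",
    "QuotasHardLimitReached": "weka fs quota list",
    "QuotasSoftLimitReached": "weka fs quota list",
    "SystemDefinedTLS": "weka security tls set",
    "TLSCertificateExpired": "weka security tls set",
    "TLSCertificateExpiresSoon": "weka security tls set",
    "TracesDisabled": "weka debug traces start",
}

# Alerts whose only listed action is to contact the Customer Success Team.
ESCALATE_ONLY = {
    "CoreOverlapping",
    "DataIntegrity",
    "DriveNVKVRunningLow",
    "FaultsEnabled",
    "LegacyManualOverridesActive",
    "ManualOverridesActive",
    "ManualOverridesForced",
}


def suggest_action(alert_name: str) -> str:
    """Return a short hint for an alert; never executes anything."""
    if alert_name in ESCALATE_ONLY:
        return "Contact the Customer Success Team."
    command = ALERT_COMMANDS.get(alert_name)
    if command is not None:
        return f"Start from: {command}"
    # Anything not encoded above needs the full table entry.
    return "See the table for the corrective action."


if __name__ == "__main__":
    for name in ("TracesDisabled", "DataIntegrity", "ClockSkew"):
        print(f"{name}: {suggest_action(name)}")
```

Keeping the helper read-only is deliberate: several of these commands change cluster state, so the lookup is meant to point the operator at the right row, not to automate the fix.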