List of alerts and corrective actions

Check WEKA system alerts and take necessary actions based on severity and nature.

Alert name
Description
Corrective actions
Severity

AdminDefaultPassword

Default admin password in use

Change the admin user password to ensure only authorized users can access the cluster.

INFO

AgentNotRunning

The local agent is not running

Restart the local agent on the specified server using the command ‘service weka-agent start’.

DEBUG

ApproachingClientsUnavailability

Approaching connected clients limit

Ensure all backend containers are up or expand the cluster with more backend containers or servers.

DEBUG

ApproachingSystemLimit

Approaching a system limit

See the required action details in the System Alerts.

MAJOR

AutoRemoveTimeoutTooLow

Stateless Client auto-remove timeout too low

Remount the host with a higher auto-remove timeout value.

WARNING

AvailableMemory

Not enough available memory

Check your system.

MAJOR

BackendNumaBalancingEnabled

NUMA balancing is enabled on a backend server

Disable automatic NUMA balancing by running the command 'echo 0 > /proc/sys/kernel/numa_balancing' on the backend server.

WARNING

BackendVersionsMismatch

Backends mismatch cluster version

Upgrade all the backends to match the cluster's version.

WARNING

BadDisksCapacityRatio

Bad ratio between smallest and biggest drive

Replace drives to reduce the size difference between the smallest and largest drives.

MAJOR

BlockedJrpcMethod

JRPC method is blocked

Unblock the JRPC method by running the 'blocked_jrpc_methods_remove' or 'blocked_jrpc_methods_clear' manhole.

DEBUG

BondInterfaceCompromised

Network high availability interface compromised

Ensure a proper operation of the network configuration, cables, and NICs.

MINOR

BucketCapacityExhausting

Buckets are nearing exhaustion of their maximum capacity

Consider migrating to a cluster with a higher number of buckets.

DEBUG

BucketHasNoQuorum

Too many compute processes are down

Ensure the compute processes on the containers {hosts} are up and running and connected. If the issue is not resolved, contact the Customer Success Team.

DEBUG

BucketUnresponsive

Compute resource failure

Check the connectivity and status of the drives of the leader container, and ensure the compute processes are running and connected. If the issue is not resolved, contact the Customer Success Team.

CRITICAL

CPUFrequentStarvation

CPU frequent starvation detected in the last minute

Check the logs of the relevant containers for potential hardware or core allocation problems.

DEBUG

CPUStarvation

CPU starvation detected in the last minute

Check the logs of the relevant containers for potential hardware problems. To convert specific hang addresses into symbol names, run /weka/weka_addr2line on the addresses within the reported WEKA container.

DEBUG

CWTaskAbortionStuck

CWTask abortion stuck

Start IO to allow the task to finish aborting.

DEBUG

ChokingDetected

High congestion level

For more information, see the System congestion topic in the documentation.

DEBUG

ClientVersionsMismatch

Clients mismatch cluster version

Upgrade the clients to the same version as the cluster by running 'weka local upgrade' locally.

INFO

ClockSkew

Clock skew on server

Ensure the NTP is configured correctly on the containers and that their clocks are synchronized.

MINOR

CloudHealth

Weka Home disconnected

Check that the server has Internet connectivity and is connected to the Weka Home. See the Weka Home - The Weka support cloud topic in the documentation.

MINOR

CloudStatsError

Statistics upload failed

See the event details in the System Events.

DEBUG

ClusterInitializationError

Cluster initialization error

Search for the underlying problem causing the error and act accordingly to start IO operations. To clear this alert, run 'weka cluster stop-io'.

INFO

ClusterIsUpgrading

Cluster is upgrading

If the upgrade doesn't finish successfully, contact the Customer Success Team.

DEBUG

CoreOverlapping

Core Overlapping

Contact the Customer Success Team.

MAJOR

DataIntegrity

Data integrity problem found

Contact the Customer Success Team.

CRITICAL

DataProtection

Partial data protection

Check which process, container, or drive is down and act accordingly.

MINOR

DedicatedWatchdog

A dedicated server requires the installation of a hardware watchdog driver.

Ensure a hardware watchdog driver is available at /dev/watchdog. For details, search the Knowledge Base in the Weka support portal.

DEBUG

DrainingStuck

Stuck draining

Check the host status and logs for more information.

MINOR

DriveCriticalWarnings

Drive critical warnings

Deactivate the drive using the command 'weka cluster drive deactivate' and replace it.

MAJOR

DriveDown

Drive down

Contact the Customer Success Team to check if the drive requires a replacement.

MINOR

DriveEndurancePercentageUsed

Drive exceeds its life expectancy

Replace the specified drive before it fails.

MAJOR

DriveEnduranceSparesRemaining

Drive internal spares run too low

Replace the specified drive before it fails.

MAJOR

DriveNVKVRunningLow

Drive nearing exhaustion of internal resource

Contact the Customer Success Team.

DEBUG

DriveNeedsPhaseout

A drive has too many errors

Deactivate the drive using the command 'weka cluster drive deactivate', and consider replacing it.

MAJOR

ExampleAlert

Example Alert

Disable this alert by running the 'set_example_alert_off' manhole.

DEBUG

ExceptionsDuringAlertsEvaluation

Exceptions thrown during alerts evaluation

Check for Assertion failure events, which may reveal the source of the problem. Contact the Customer Success Team if help is required.

DEBUG

FaultsEnabled

Faults are enabled

Contact the Customer Success Team.

DEBUG

FilesystemHasTooManyFiles

Insufficient SSD capacity for metadata on filesystem

To address this issue, consider expanding the filesystem size or removing data and directories. If you have previously configured max-files settings, contact the Customer Success Team for assistance.

MAJOR

FilesystemKMSError

Filesystem KMS Error

Review the filesystem's KMS customization and the KMS configuration and connectivity.

DEBUG

FilesystemsThinProvisioningLowSpace

Filesystems thin provisioning low space

Consider adding SSD capacity to the organization containing these filesystems.

WARNING

FilesystemsThinProvisioningReserveReached

Filesystems thin provisioning capacity reserve reached

You can create a filesystem or expand the filesystem capacity using the reserved capacity.

DEBUG

HangingCacheSync

Cache sync is hanging

Consider using 'weka debug fs drop-dirty-cache' to drop the cache and enable other clients to access the file. Unsynchronized writes will be lost.

MINOR

HangingClusterTasks

Cluster background task progress is hanging

Contact the Customer Success Team.

DEBUG

HangingIos

Some IOs stop responding

Ensure the compute processes are up and running and connected. If a backend object store is configured, ensure it is connected and responsive. If the issue is not resolved, contact the Customer Success Team.

DEBUG

HighDrivesCapacity

SSD capacity overflow

Free up space on the SSDs or add more SSDs to the cluster. To add SSDs, see the Expand specific resources of a container topic in the documentation.

MAJOR

HighLevelOfUnreclaimedCapacityInObjectStore

High level of unreclaimed space in an object store

DEBUG

HighSSDToRAMRatio

High SSD to RAM Ratio

Consider increasing RAM cluster-wide or removing unneeded drives to meet Filesystems (RAID) requirements and lower the SSD-to-RAM ratio.

DEBUG

HotspotInodes

Some files have a long waiting queue for IOs

Contact the Customer Success Team to help with resolution.

DEBUG

IBNotEnhanced

Enhanced IB mode disabled

Contact the Customer Success Team to correct this issue.

DEBUG

ImbalancedCpuUsage

Imbalanced CPU usage detected in cluster processes

Examine the system configuration for abnormalities that may be causing the CPU usage imbalance.

DEBUG

JumboConnectivity

A container cannot send jumbo frames

Check the container network settings and the switch to which the container is connected, and ensure jumbo frames are enabled. This setting improves performance.

WARNING

KMSError

KMS Error

Review the KMS configuration and connectivity.

MAJOR

LeaderPreparedForUpgrade

Leader prepared for upgrade

After the upgrade, the leader state automatically returns to normal. If this alert persists, contact the Customer Success Team.

DEBUG

LegacyManualOverridesActive

Legacy manual overrides are active

Contact the Customer Success Team.

DEBUG

LicenseError

License error

Ensure the cluster uses the correct license, the license has not expired, and the allocated space does not exceed the license limits.

WARNING

LocalTLSCertificateExpired

Local TLS certificate expired

Update the local certificate.

DEBUG

LocalTLSCertificateExpiringSoon

Local TLS certificate is expiring soon

Update the local certificate.

DEBUG

LocalTLSConnectivityToNeighbors

Outgoing TLS connectivity to backends is down

Fix the TLS issue. One possible cause is an error in the local CA certificate in /etc/wekaio/certs.

DEBUG

LowDiskSpace

Low disk space

See the event details in the System Events.

MINOR

ManualOverridesActive

Manual overrides are active

Contact the Customer Success Team.

DEBUG

ManualOverridesForced

Manual overrides are forced

Contact the Customer Success Team.

DEBUG

MismatchedDriveFailureDomain

A drive failure domain does not match the failure domain of its attached container

Do one of the following: a) Connect the mismatched drive to a container with a matching failure domain. b) Re-provision the drive to erase its failure domain.

MAJOR

MismatchedJoinSecrets

Backend containers do not have the same join secrets

This may create problems rejoining or reforming the cluster. Make sure all backend containers have the same join secret.

DEBUG

NegativeUnprovisionedCapacity

Negative unprovisioned capacity

Resize one or more of the filesystems to reclaim capacity. For more information, contact the Customer Success Team.

DEBUG

NetworkFailedToStartPorts

Network ports failed to start

Run 'weka debug net ports $NODE' to see the current status.

DEBUG

NetworkInterfaceLinkDown

Network interface link status down

Check the connectivity to the specified network interface. Verify that nothing blocks it.

MINOR

NfsLocksDisabled

NFS Locks disabled

Configure the config filesystem using the command 'weka nfs global-config set --config-fs='.

INFO

NfsServiceDownAlert

NFS Service Down

If down services persist, contact the Customer Success Team.

MAJOR

NoCgroupsConfigured

No cgroups configured warnings

Disabled or improperly configured Cgroups can cause system instability and performance degradation. Enable and configure Cgroups (v1/v2) following the Cgroups configuration section at https://docs.weka.io.

WARNING

NoClusterLicense

No license assigned

Obtain and install a license from get.weka.io.

WARNING

NodeBlacklisted

A process cannot rejoin the cluster

To enable the process to rejoin the cluster, whitelist it by running the command ‘weka debug blacklist disable’.

DEBUG

NodeDisconnected

Process disconnected

Check network connectivity to ensure the processes can communicate with the cluster.

MINOR

NodeNetworkUnstable

A process with an unstable network detected

Ensure proper network connectivity in the cluster. If the problem is not resolved, contact the Customer Success Team.

WARNING

NodeRDMANotActive

RDMA support for the process is inactive

Ensure that at least one RDMA-capable device exists.

DEBUG

NodeTieringConnectivity

A process cannot connect to an object store

Check the connectivity with the object store and ensure the process communicates with it.

MAJOR

NonTlsApisAllowed

Non-TLS APIs are allowed

Update TLS strictness to enforce encrypted TLS APIs over HTTP.

DEBUG

NotEnoughActiveDrives

Reduced data protection

Check the connectivity and server status. Replace failed drives and expand the cluster with new failure domains.

CRITICAL

NotEnoughMemoryForFilesystemOperation

Insufficient cluster-wide RAM for proper filesystem operation

Increase RAM cluster-wide to meet Filesystems (RAID) requirements for RAM or remove drives contributing to SSD capacity.

DEBUG

NotEnoughSSDCapacity

Some provisioned capacity is unavailable due to failed drives

Check for down drives.

MAJOR

PartialConnectivityTrackingDisabled

Partial connectivity tracking is disabled

To turn on the Grim Reaper, contact the Customer Success Team.

DEBUG

PartialHugepageAllocation

Not enough memory

Check your system.

MAJOR

PartiallyConnectedNode

A partially connected process detected

Ensure proper network connectivity in the cluster. If the problem is not resolved, contact the Customer Success Team.

MINOR

PassedClientsAvailabilityThreshold

Reached connected clients limit

Add more backend containers or servers to the cluster, check whether the backends are down, or disconnect some clients.

DEBUG

PathsDegraded

Degraded Paths

Contact the Customer Success Team to review path connectivity.

MINOR

PerformanceDegradedLowRAM

Server low RAM

Ensure all the compute processes are up. Add more servers to the cluster or add RAM to the backend servers.

MAJOR

QuotasHardLimitReached

Directory quota hard limit exceeded

Run 'weka fs quota list' to get the list of directories exceeding their hard quota limits. Clear some space for these directories or increase their hard quota limit.

WARNING

QuotasSoftLimitReached

Directory quota soft limit exceeded

Run 'weka fs quota list' to get the list of directories exceeding their soft quota limits. Clear some space for these directories or increase their soft quota limit.

INFO

RAIDCapacityExhaustion

RAID capacity exhaustion

If the situation is not resolved within minutes, contact the Customer Success Team.

MAJOR

RequestedActionFailure

Requested action failure

Check the logs for more information.

DEBUG

ResourcesNotApplied

Resource changes are not applied

Apply the resource changes by running the command 'weka cluster container apply '.

DEBUG

SSDCapacityDiscrepancy

Used SSD capacity mismatches the expected range

Monitor the compute processes' stability and contact the Customer Success Team.

DEBUG

SSDCapacityTooHigh

Available capacity cannot be fully utilized

For improved SSD capacity usage, contact the Customer Success Team for assistance.

INFO

SystemDefinedTLS

TLS certificate is not user-defined

Replace the auto-generated self-signed certificate with a user-defined certificate by running the command 'weka security tls set'.

INFO

TLSCertificateExpired

TLS certificate expired

Replace the existing certificate by running the command 'weka security tls set'.

MAJOR

TLSCertificateExpiresSoon

TLS certificate is about to expire

Replace the existing certificate by running the command 'weka security tls set'.

MAJOR

TelemetryStatusFault

Telemetry status is not streaming

Check your telemetry sinks configuration.

DEBUG

TieredFilesystemOverfillingSSD

Tiered filesystems' SSD capacity overfilling

To address this issue, consider expanding the filesystem size or removing data and directories. Identify and resolve connectivity problems with the configured Object Store and increase the upload bandwidth if required.

WARNING

TooManyPendingClusterwideJobs

Too many pending cluster-wide jobs

Consider changing the policy configuration.

DEBUG

TraceDumperDown

Trace dumper is down

Contact the Customer Success Team to restart the trace dumper.

DEBUG

TracesDisabled

Traces are disabled

To turn on the cluster traces, run the command 'weka debug traces start'. For more information, see the Traces management topic in the documentation.

DEBUG

TracesFreezePeriodActive

Freeze traces is active

If the problem persists after the case is resolved, contact the Customer Success Team.

DEBUG

UdpModePerformanceWarning

A backend container is configured in UDP mode

If this is a misconfiguration, add network devices to the specified backend container using the command ‘weka cluster container net add’.

DEBUG

UnstableHosts

Host unstable during upgrade

Check the host status and logs for more information.

DEBUG

UnwritableDisksConfigured

A drive is set to unwritable

If the drive remains unwritable after maintenance, contact the Customer Success Team.

DEBUG
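
When several alerts are active at once, handling them by severity can be sketched as below. This is a minimal illustration, not part of the WEKA tooling: the `triage` function and the sample input are hypothetical, and the ranking in `SEVERITY_ORDER` is an assumption, because the table lists the severity names without stating their relative order.

```python
from collections import defaultdict

# Assumed ordering, most urgent first. The table uses these severity names
# but does not rank them, so adjust this list to match your own policy.
SEVERITY_ORDER = ["CRITICAL", "MAJOR", "MINOR", "WARNING", "INFO", "DEBUG"]


def triage(alerts):
    """Group (alert_name, severity) pairs by severity, most urgent first."""
    grouped = defaultdict(list)
    for name, severity in alerts:
        grouped[severity.upper()].append(name)
    # Severities missing from SEVERITY_ORDER sort last instead of being dropped.
    rank = {sev: i for i, sev in enumerate(SEVERITY_ORDER)}
    ordered = sorted(grouped, key=lambda sev: (rank.get(sev, len(rank)), sev))
    return [(sev, sorted(grouped[sev])) for sev in ordered]


if __name__ == "__main__":
    # Hypothetical set of active alerts, using names and severities from the table.
    active = [
        ("ClockSkew", "MINOR"),
        ("DataIntegrity", "CRITICAL"),
        ("KMSError", "MAJOR"),
        ("AdminDefaultPassword", "INFO"),
        ("BucketUnresponsive", "CRITICAL"),
    ]
    for severity, names in triage(active):
        print(f"{severity}: {', '.join(names)}")
```

Each group can then be matched against the corrective actions in the table, starting with the first group returned.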