
Diagnostics data management

Manage diagnostics data for clusters, servers, and containers with CLI commands.

The diagnostics CLI commands enable managing diagnostics data associated with clusters, servers, and containers. The Customer Success Team then analyzes this diagnostics data to assist in troubleshooting. There are two options available for managing diagnostics:

  • Cluster-wide diagnostics commands: Use the command weka diags for cluster-wide diagnostics management from any server within the cluster.

  • Local server diagnostics command: Use the command weka local diags for diagnostics management of a connected local server.

Cluster-wide diagnostics commands

Use the cluster-wide diagnostics commands to manage diagnostics data from any server in the cluster. These commands upload diagnostics data to WEKA Home, collect diagnostics data, clean up diagnostics files, and list the available diagnostics files.

Upload diagnostics data to WEKA Home

Command: weka diags upload

Use the following command to collect diagnostics information, save it, and upload it to WEKA Home (the WEKA support cloud):

weka diags upload [--core-limit core-limit] [--dump-id dump-id] [--container-id container-id]... [--clients] [--backends]

The command response includes an access identifier named Diags collection ID. Send this identifier to the Customer Success Team so they can retrieve the diagnostics data from WEKA Home.

When the command runs for all servers in the cluster, a local diagnostics file (dump) is created on each server in /opt/weka/diags/local. The local diagnostics files are then consolidated into a single diagnostics file in the /opt/weka/diags directory on the server where you run the command.

  • HTTPS access is required to upload the diagnostics to AWS S3 endpoints.

  • The upload process is asynchronous. Therefore, connectivity failure events are reflected in the events log even if the command exits successfully.

Example: collect and upload diagnostics from all backend containers
[root@wekaprod-0 ~] 2023-02-20 13:39:25 $ weka diags upload
Uploading diags from 5 hosts to the cloud
Cluster GUID: c0aca0f2-0d20-465e-9817-e747be811016
Diags collection ID: 1Ox5OYogdTP54Nah7o1cb
Cloud URL: https://api.home.weka.io

Collecting diags                             [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 5 / 5
Copying files for uploading                  [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 5 / 5
Uploading to cloud (this could take a while) [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 125 / 125


+------------------------+
| Diags upload completed |
+------------------------+

Parameters

| Name | Description | Default |
|---|---|---|
| core-limit | Limit the diagnostics collection process to use the specified core number. | 1 |
| dump-id | Uploads a pre-existing diagnostics file (dump) generated by the weka diags collect command with the specified dump ID. Using this option, the command exclusively uploads data and does not conduct data collection. Do not use this option for data previously generated by a weka diags upload command. The command evaluates the list of containers for dump upload, which may differ from those collected in the specified dump directory. Any container data not found in the collected dump is disregarded. | If no ID is provided, a new diagnostics file is generated. |
| container-id | A list of container ID numbers separated by commas for collecting and uploading diagnostics data. If specified, the --backends and --clients options are ignored. | |
| clients | Collect and upload diagnostics data only from client containers. | No data is collected for clients |
| backends | Collect and upload diagnostics data only from backend containers (same as if you are not specifying this option). To collect diagnostics for all client and backend containers, add both options --backends and --clients to the command. | Backends only |
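
As an illustration of how the scope options combine, the following sketch assembles a weka diags upload command line and prints it without running it. The function name diags_upload_cmd and its scope keywords (backends, clients, all) are hypothetical helpers, not part of the WEKA CLI. Only flags listed in the parameters above are used.

```shell
#!/bin/sh
# Hypothetical helper (not part of the WEKA CLI): prints a
# `weka diags upload` command line for the requested scope.
# Only the documented --clients and --backends flags are emitted.
diags_upload_cmd() {
  scope="$1"   # backends | clients | all
  case "$scope" in
    backends) echo "weka diags upload --backends" ;;
    clients)  echo "weka diags upload --clients" ;;
    all)      echo "weka diags upload --clients --backends" ;;
    *)        echo "unknown scope: $scope" >&2; return 1 ;;
  esac
}

# Print the command that covers both client and backend containers.
diags_upload_cmd all
```

Printing the command first makes it possible to review the scope before running it on a production cluster.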

Collect diagnostics data

Command: weka diags collect

Use the following command to create diagnostics information and save it without uploading it to WEKA Home. This command is useful when there is no connection to WEKA Home and you want to share the diagnostics file by other means.

weka diags collect [--id id] [--output-dir output-dir] [--core-limit core-limit] [--container-id container-id] [--clients] [--backends] [--tar]

If the command runs with the local keyword, information is collected only from the server on which the command is executed. Otherwise, information is collected from the whole cluster.

Example: collect diagnostics from all backends
[root@wekaprod-0 ~] 2023-02-20 13:38:58 $ weka diags collect
Downloading cluster diagnostics from 5 hosts to this host
Diags will be saved to: /opt/weka/diags/ody2uRl8xOfDESd6vkbYH4

Collecting diags      [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 5 / 5
Downloading artifacts [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 46.43 MiB / 46.43 MiB

CATEGORY   DIAG                       H0  H1  H2  H3  H4
host       uptime                     ●   ●   ●   ●   ●
host       date                       ●   ●   ●   ●   ●
host       uname                      ●   ●   ●   ●   ●
host       top                        ●   ●   ●   ●   ●
host       free_hugepages             ●   ●   ●   ●   ●
host       nr_hugepages               ●   ●   ●   ●   ●
host       nr_hugepages_mempolicy     ●   ●   ●   ●   ●
host       nr_overcommit_hugepages    ●   ●   ●   ●   ●
host       surplus_hugepages          ●   ●   ●   ●   ●
host       meminfo                    ●   ●   ●   ●   ●
host       cpuinfo                    ●   ●   ●   ●   ●
host       netstat                    ●   ●   ●   ●   ●
host       etc-hosts                  ●   ●   ●   ●   ●
host       ps                         ●   ●   ●   ●   ●
host       mount                      ●   ●   ●   ●   ●
host       rpcinfo                    ●   ●   ●   ●   ●
host       kernel_modules             ●   ●   ●   ●   ●
host       pci_devices                ●   ●   ●   ●   ●
host       pci_tree                   ●   ●   ●   ●   ●
host       ip_links                   ●   ●   ●   ●   ●
host       ip_routes                  ●   ●   ●   ●   ●
host       arp_table                  ●   ●   ●   ●   ●
host       iptables                   ●   ●   ●   ●   ●
host       resolv.conf                ●   ●   ●   ●   ●
host       root_disk_usage            ●   ●   ●   ●   ●
host       dmesg                      ●   ●   ●   ●   ●
host       dmidecode                  ●   ●   ●   ●   ●
host       network_manager            X   X   X   X   X
host       selinux                    ●   ●   ●   ●   ●
host       fstab                      ●   ●   ●   ●   ●
host       journalctl                 ●   ●   ●   ●   ●
host       systemctl                  ●   ●   ●   ●   ●
host       ip4addr                    ●   ●   ●   ●   ●
host       ip4link                    ●   ●   ●   ●   ●
host       syslog                     ●   ●   ●   ●   ●
host       boot_log                   -   -   -   -   -
host       ofed-version               -   -   -   -   -
host       ofed                       -   -   -   -   -
host       mlnx4_core                 -   -   -   -   -
host       mlnx5_core                 -   -   -   -   -
host       memtest                    ●   ●   ●   ●   ●
host       is-numa-balancing-active   ●   ●   ●   ●   ●
host       is-swap-on                 ●   ●   ●   ●   ●
host       lsblk                      ●   ●   ●   ●   ●
host       system-release             ●   ●   ●   ●   ●
host       os-release                 ●   ●   ●   ●   ●
host       lsb-release                -   -   -   -   -
host       redhat-release             ●   ●   ●   ●   ●
host       lscpu                      ●   ●   ●   ●   ●
host       ifconfig                   ●   ●   ●   ●   ●
host       ip_rule                    ●   ●   ●   ●   ●
host       weka_local_status          ●   ●   ●   ●   ●
weka_host  container_list             ●   ●   ●   ●   ●
weka_host  lstopo                     ●   ●   ●   ●   ●
weka_host  numactl                    ●   ●   ●   ●   ●
weka_host  ipmi-sel                   ●   ●   ●   ●   ●
weka_host  ipmi-sdr                   ●   ●   ●   ●   ●
weka_host  logs                       ●   ●   ●   ●   ●
weka_host  driver_queue               ●   ●   ●   ●   ●
weka_host  local_events               ●   ●   ●   ●   ●
weka_host  traces_analysis            ●   ●   ●   ●   ●
weka_host  resources_files            ●   ●   ●   ●   ●
weka_host  core_dumps                 ●   ●   ●   ●   ●
cluster    api-status                 ●
cluster    api-alerts                 ●
cluster    api-realtime_stats         ●
cluster    api-hosts                  ●
cluster    api-nodes                  ●
cluster    api-drives                 ●
cluster    api-failure_domains        ●
cluster    api-host_hardware          ●
cluster    api-filesystems            ●
cluster    api-filesystem_groups      ●
cluster    api-snapshots              ●
cluster    api-tiering                ●
cluster    api-users                  ●
cluster    api-default_data_net       ●
cluster    api-nfs_client_groups      ●
cluster    api-nfs_interface_groups   ●
cluster    api-nfs_permissions        ●
cluster    api-nfs_status             ●
cluster    cli-status                 ●
cluster    cli-filesystems            ●
cluster    cli-host                   ●
cluster    cli-net                    ●
cluster    cli-nodes                  ●
cluster    cli-drives                 ●
cluster    cli-buckets                ●
cluster    cli-cluster-tasks          ●
cluster    cli-alerts                 ●
cluster    cli-rebuild_status         ●
cluster    cli-filesystem_groups      ●
cluster    cli-kms                    ●
cluster    cli-fs_tier_s3             ●
cluster    cli-org                    ●
cluster    cli-realtime_stats         ●
cluster    cli-smb                    ●
cluster    cli-smb-cluster-status     ●
cluster    cli-smb-domain             ●
cluster    cli-smb-share              ●
cluster    cli-smb-share-lists-show   ●
cluster    cli-cloud                  ●
cluster    cli-buckets-dist           ●
cluster    cli-net-links              ●
cluster    cli-blacklist-list         ●
cluster    cli-manual-overrides-list  ●
cluster    cli-traces-status          ●
cluster    cli-traces-freeze          ●
cluster    cli-s3-cluster             ●
cluster    cli-s3-cluster-status      ●
cluster    cli-s3-bucket-list         -
cluster    cli-config-list-overrides  ●
cluster    api-cfgdump                ●
report     summary                    ●   ●   ●   ●   ●
report     errors                     ●   ●   ●   ●   ●

The following errors were found:

   h0:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly

   h1:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly

   h2:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly

   h3:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly

   h4:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly

Parameters

| Name | Description | Default |
|---|---|---|
| id | An optional identifier for this diagnostics file. If not specified, a random ID is generated. | Auto-generated |
| output-dir | The directory for saving the diagnostics file. | /opt/weka/diags |
| core-limit | Limit the diagnostics collection process to use the specified core number. | 1 |
| container-id | A list of container ID numbers separated by commas for collecting diagnostics data. If specified, the --backends and --clients options are ignored. | |
| clients | Collect diagnostics data only from client containers. | No data is collected for clients |
| backends | Collect diagnostics data only from backend containers (same as if you are not specifying this option). To collect diagnostics for all client and backend containers, add both options --backends and --clients to the command. | Backends only |
| tar | Package the collected diagnostics in a TAR file. | No TAR file is created |
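
The relationship between weka diags collect and the --dump-id option of weka diags upload can be sketched as a two-step flow: collect under a known ID while WEKA Home is unreachable, then upload that dump later. The helper name collect_then_upload_cmds and the example ID are hypothetical, and the sketch assumes that the value passed to --id is the same dump ID that --dump-id expects. The commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the WEKA CLI): prints the two commands
# for "collect now, upload later". The caller chooses the ID.
# Assumption: the --id given to collect is the dump ID accepted by --dump-id.
collect_then_upload_cmds() {
  dump_id="$1"
  if [ -z "$dump_id" ]; then
    echo "usage: collect_then_upload_cmds <dump-id>" >&2
    return 1
  fi
  # Step 1: collect diagnostics under a known ID, without uploading.
  echo "weka diags collect --id $dump_id"
  # Step 2: later, upload the existing dump without collecting again.
  echo "weka diags upload --dump-id $dump_id"
}

# Example with a made-up ID.
collect_then_upload_cmds case-1234
```

Choosing the ID up front avoids having to look up the auto-generated one before the upload step.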

Clean up the diagnostics files

The weka diags collect command consolidates diagnostics from various containers into a single dump directory. Conversely, weka diags upload saves diagnostics data on each container, distributing files across the cluster before uploading to WEKA Home.

Collecting diagnostics data generates individual files, consuming disk space. As a result, the system may accumulate numerous diagnostics files, especially after they have been uploaded to WEKA Home. To optimize disk space usage, perform a cleanup either on a specific diagnostics file or an entire directory containing multiple diagnostics files.

Cleanup procedure:

List diagnostics files

Command: weka diags list

Use the following command to list the collected diagnostics files:

weka diags list [--verbose] [<id>]...

Parameters

| Name | Description |
|---|---|
| id | The diagnostics file's ID or the path to the diagnostics file. If not specified, a list of all collected diagnostics files is displayed. |
| verbose | Displays the results of all the diagnostics files, including the successful ones. |

Delete specific diagnostic files

Command: weka diags rm

Use the following command to stop a running diagnostics instance, cancel its upload, and delete it from the disk:

weka diags rm [--all] [<id>]...

Parameters

| Name | Description |
|---|---|
| all | A flag to delete all the diagnostics files. |
| id* | The diagnostics file's ID or the path to the diagnostics files. If not specified, a list of all collected diagnostics files is displayed. This string is required unless the all option is specified. |
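
The list-then-delete cleanup can be sketched as follows. The helper diags_cleanup_cmds is hypothetical and only prints the commands. The ID in the example is the one shown in the weka diags collect sample output earlier on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of the WEKA CLI): prints the cleanup
# commands for the given dump IDs. With no IDs it prints only the
# listing command, so no delete command is produced by accident.
diags_cleanup_cmds() {
  # Step 1: list the collected diagnostics files and their IDs.
  echo "weka diags list"
  # Step 2: delete only the IDs that were explicitly passed in.
  if [ "$#" -gt 0 ]; then
    echo "weka diags rm $*"
  fi
}

# Example using the dump ID from the sample collect output.
diags_cleanup_cmds ody2uRl8xOfDESd6vkbYH4
```

The sketch deliberately never emits --all, so deleting every diagnostics file stays a manual decision.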

Local server diagnostics command

Collecting diagnostics data from a connected local server is valuable in various scenarios, such as:

  • The originating backend container, or the specified backend containers, have no functional management process.

  • The management process has no connectivity to the cluster leader.

  • The cluster has no leader.

  • The local container is offline.

  • The server cannot communicate with the leader, or the weka diags command fails.

Command: weka local diags

Use the following command to collect diagnostics from a connected local server:

weka local diags [--id id] [--output-dir output-dir] [--core-dump-limit core-dump-limit] [--collect-cluster-info] [--tar]

Parameters

| Name | Description | Default |
|---|---|---|
| id | A unique identifier for this diagnostics file. | Auto-generated |
| output-dir | The directory for saving the diagnostics file. | /opt/weka/diags |
| core-dump-limit | Limit the diagnostics collection process to use the specified core number. | 1 |
| collect-cluster-info | Collect diagnostics data related to the cluster. To prevent excessive load on the cluster, use this flag for one server at a time. | |
| tar | Package the collected diagnostics data in a TAR file. | No TAR file is created |
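
One way to apply the scenarios above is a wrapper that tries the cluster-wide command first and falls back to the local server command when it fails. This is a minimal sketch, not a WEKA-provided script: the function name collect_with_fallback and the WEKA_BIN override variable are assumptions introduced here, and the sketch assumes that weka diags collect returns a non-zero exit status on failure.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the WEKA CLI): try cluster-wide
# collection first and fall back to the local server command if it fails.
# WEKA_BIN is an assumed override hook for testing; it defaults to `weka`.
collect_with_fallback() {
  weka_bin="${WEKA_BIN:-weka}"
  if "$weka_bin" diags collect --tar; then
    echo "cluster-wide diagnostics collected"
  else
    echo "cluster-wide collection failed; collecting from the local server" >&2
    # Local collection does not depend on the cluster leader.
    "$weka_bin" local diags --tar
  fi
}
```

Call collect_with_fallback on the affected server. Because the fallback collects from that server only, repeat it on each server whose diagnostics are needed.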



Cleanup procedure steps:

  1. List diagnostics files: List the diagnostics files in the system, including their corresponding IDs. This step provides an overview of the available diagnostics files.

  2. Delete specific diagnostic files: Delete specific diagnostics files based on their IDs. This targeted cleanup helps efficiently manage disk space and ensures the removal of unnecessary diagnostics data.

The diagnostics files are essential for troubleshooting purposes. Delete these files only if you are certain they have been successfully uploaded to WEKA Home and are no longer needed. For further clarification, contact the Customer Success Team.