Diagnostics data management

Manage diagnostics data for clusters, servers, and containers with CLI commands.

The diagnostics CLI commands manage diagnostics data associated with clusters, servers, and containers. The Customer Success Team analyzes this data to assist in troubleshooting. Two options are available for managing diagnostics:

  • Cluster-wide diagnostics commands: Use the command weka diags for cluster-wide diagnostics management from any server within the cluster.

  • Local server diagnostics command: Use the command weka local diags for diagnostics management of a connected local server.

Cluster-wide diagnostics commands

Use the cluster-wide diagnostics commands to manage diagnostics data from any server in the cluster. These commands upload diagnostics data to WEKA Home, collect diagnostics data, clean up diagnostics files, and list available diagnostics files.

Upload diagnostics data to WEKA Home

Command: weka diags upload

Use the following command to collect diagnostics information, save it, and upload it to WEKA Home (the WEKA support cloud):

weka diags upload [--core-limit core-limit] [--dump-id dump-id] [--container-id container-id]... [--clients] [--backends]

The command response includes an access identifier, the Diags collection ID. Send this identifier to the Customer Success Team so they can retrieve the diagnostics data from WEKA Home.

When the command runs for all servers in the cluster, a local diagnostics file (dump) is created on each server in /opt/weka/diags/local. The local diagnostics files of all servers are then consolidated into a single diagnostics file in the /opt/weka/diags directory on the server where you run the command.

  • HTTPS access is required to upload the diagnostics to AWS S3 endpoints.

  • The upload process is asynchronous. Therefore, connectivity failure events are reflected in the events log even if the command exits successfully.

Example: collect and upload diagnostics from all backend containers
[root@wekaprod-0 ~] 2023-02-20 13:39:25 $ weka diags upload
Uploading diags from 5 hosts to the cloud
Cluster GUID: c0aca0f2-0d20-465e-9817-e747be811016
Diags collection ID: 1Ox5OYogdTP54Nah7o1cb
Cloud URL: https://api.home.weka.io

Collecting diags                             [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 5 / 5
Copying files for uploading                  [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 5 / 5
Uploading to cloud (this could take a while) [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 125 / 125


+------------------------+
| Diags upload completed |
+------------------------+
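When the upload is scripted, the Diags collection ID can be captured from the command output so it can be forwarded to the Customer Success Team. The following is a minimal sketch, assuming the output keeps the "Diags collection ID: <id>" line format shown in the example above. The function name extract_collection_id is hypothetical and is not part of the weka CLI.

```shell
# Hypothetical helper (not part of the weka CLI): read the output of
# `weka diags upload` on stdin and print the value of the first
# "Diags collection ID:" line. The line format is assumed from the
# example output above.
extract_collection_id() {
    sed -n 's/^Diags collection ID:[[:space:]]*//p' | head -n 1
}

# Usage (requires a running cluster and access to WEKA Home):
#   collection_id=$(weka diags upload | extract_collection_id)
#   echo "Send this ID to the Customer Success Team: $collection_id"
```

If the line is missing from the output, the helper prints nothing, so a calling script can treat an empty result as a failed or incomplete upload.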

Parameters

| Name | Description | Default |
|------|-------------|---------|
| core-limit | Limit the diagnostics collection process to use the specified core number. | 1 |
| dump-id | Uploads a pre-existing diagnostics file (dump) generated by the weka diags collect command with the specified dump ID. Using this option, the command exclusively uploads data and does not conduct data collection. Do not use this option for data previously generated by a weka diags upload command. The command evaluates the list of containers for dump upload, which may differ from those collected in the specified dump directory. Any container data not found in the collected dump is disregarded. | If no ID is provided, a new diagnostics file is generated. |
| container-id | A list of container ID numbers separated by commas for collecting and uploading diagnostics data. If specified, the --backends and --clients options are ignored. | |
| clients | Collect and upload diagnostics data only from client containers. | No data is collected for clients |
| backends | Collect and upload diagnostics data only from backend containers (same as if you are not specifying this option). To collect diagnostics for all client and backend containers, add both options --backends and --clients to the command. | Backends only |
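The interaction between --backends and --clients described above can be summarized in a small lookup. This is only an illustrative sketch: the function name diags_scope_flags and the scope keywords (backends, clients, all) are hypothetical, while the flags themselves are the ones documented in the table.

```shell
# Hypothetical helper: map a scope keyword to the documented flags.
#   backends -> backend containers only (same as passing no scope flag)
#   clients  -> client containers only
#   all      -> both backend and client containers
diags_scope_flags() {
    case "$1" in
        backends) echo "--backends" ;;
        clients)  echo "--clients" ;;
        all)      echo "--backends --clients" ;;
        *)        echo "unknown scope: $1" >&2; return 1 ;;
    esac
}

# Usage (the unquoted substitution is intentional so that the two flags
# are passed as separate arguments):
#   weka diags upload $(diags_scope_flags all)
```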

Collect diagnostics data

Command: weka diags collect

Use the following command to create diagnostics information and save it without uploading it to WEKA Home. This command is useful when there is no connection to WEKA Home, and you want to share the diagnostics file using other options.

weka diags collect [--id id] [--output-dir output-dir] [--core-limit core-limit] [--container-id container-id] [--clients] [--backends] [--tar]

If the command runs with the local keyword, information is collected only from the server on which the command is executed. Otherwise, information is collected from the whole cluster.

Example: collect diagnostics from all backends
[root@wekaprod-0 ~] 2023-02-20 13:38:58 $ weka diags collect
Downloading cluster diagnostics from 5 hosts to this host
Diags will be saved to: /opt/weka/diags/ody2uRl8xOfDESd6vkbYH4

Collecting diags      [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 5 / 5
Downloading artifacts [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 46.43 MiB / 46.43 MiB

CATEGORY   DIAG                       H0  H1  H2  H3  H4
host       uptime                     ●   ●   ●   ●   ●
host       date                       ●   ●   ●   ●   ●
host       uname                      ●   ●   ●   ●   ●
host       top                        ●   ●   ●   ●   ●
host       free_hugepages             ●   ●   ●   ●   ●
host       nr_hugepages               ●   ●   ●   ●   ●
host       nr_hugepages_mempolicy     ●   ●   ●   ●   ●
host       nr_overcommit_hugepages    ●   ●   ●   ●   ●
host       surplus_hugepages          ●   ●   ●   ●   ●
host       meminfo                    ●   ●   ●   ●   ●
host       cpuinfo                    ●   ●   ●   ●   ●
host       netstat                    ●   ●   ●   ●   ●
host       etc-hosts                  ●   ●   ●   ●   ●
host       ps                         ●   ●   ●   ●   ●
host       mount                      ●   ●   ●   ●   ●
host       rpcinfo                    ●   ●   ●   ●   ●
host       kernel_modules             ●   ●   ●   ●   ●
host       pci_devices                ●   ●   ●   ●   ●
host       pci_tree                   ●   ●   ●   ●   ●
host       ip_links                   ●   ●   ●   ●   ●
host       ip_routes                  ●   ●   ●   ●   ●
host       arp_table                  ●   ●   ●   ●   ●
host       iptables                   ●   ●   ●   ●   ●
host       resolv.conf                ●   ●   ●   ●   ●
host       root_disk_usage            ●   ●   ●   ●   ●
host       dmesg                      ●   ●   ●   ●   ●
host       dmidecode                  ●   ●   ●   ●   ●
host       network_manager            X   X   X   X   X
host       selinux                    ●   ●   ●   ●   ●
host       fstab                      ●   ●   ●   ●   ●
host       journalctl                 ●   ●   ●   ●   ●
host       systemctl                  ●   ●   ●   ●   ●
host       ip4addr                    ●   ●   ●   ●   ●
host       ip4link                    ●   ●   ●   ●   ●
host       syslog                     ●   ●   ●   ●   ●
host       boot_log                   -   -   -   -   -
host       ofed-version               -   -   -   -   -
host       ofed                       -   -   -   -   -
host       mlnx4_core                 -   -   -   -   -
host       mlnx5_core                 -   -   -   -   -
host       memtest                    ●   ●   ●   ●   ●
host       is-numa-balancing-active   ●   ●   ●   ●   ●
host       is-swap-on                 ●   ●   ●   ●   ●
host       lsblk                      ●   ●   ●   ●   ●
host       system-release             ●   ●   ●   ●   ●
host       os-release                 ●   ●   ●   ●   ●
host       lsb-release                -   -   -   -   -
host       redhat-release             ●   ●   ●   ●   ●
host       lscpu                      ●   ●   ●   ●   ●
host       ifconfig                   ●   ●   ●   ●   ●
host       ip_rule                    ●   ●   ●   ●   ●
host       weka_local_status          ●   ●   ●   ●   ●
weka_host  container_list             ●   ●   ●   ●   ●
weka_host  lstopo                     ●   ●   ●   ●   ●
weka_host  numactl                    ●   ●   ●   ●   ●
weka_host  ipmi-sel                   ●   ●   ●   ●   ●
weka_host  ipmi-sdr                   ●   ●   ●   ●   ●
weka_host  logs                       ●   ●   ●   ●   ●
weka_host  driver_queue               ●   ●   ●   ●   ●
weka_host  local_events               ●   ●   ●   ●   ●
weka_host  traces_analysis            ●   ●   ●   ●   ●
weka_host  resources_files            ●   ●   ●   ●   ●
weka_host  core_dumps                 ●   ●   ●   ●   ●
cluster    api-status                 ●
cluster    api-alerts                 ●
cluster    api-realtime_stats         ●
cluster    api-hosts                  ●
cluster    api-nodes                  ●
cluster    api-drives                 ●
cluster    api-failure_domains        ●
cluster    api-host_hardware          ●
cluster    api-filesystems            ●
cluster    api-filesystem_groups      ●
cluster    api-snapshots              ●
cluster    api-tiering                ●
cluster    api-users                  ●
cluster    api-default_data_net       ●
cluster    api-nfs_client_groups      ●
cluster    api-nfs_interface_groups   ●
cluster    api-nfs_permissions        ●
cluster    api-nfs_status             ●
cluster    cli-status                 ●
cluster    cli-filesystems            ●
cluster    cli-host                   ●
cluster    cli-net                    ●
cluster    cli-nodes                  ●
cluster    cli-drives                 ●
cluster    cli-buckets                ●
cluster    cli-cluster-tasks          ●
cluster    cli-alerts                 ●
cluster    cli-rebuild_status         ●
cluster    cli-filesystem_groups      ●
cluster    cli-kms                    ●
cluster    cli-fs_tier_s3             ●
cluster    cli-org                    ●
cluster    cli-realtime_stats         ●
cluster    cli-smb                    ●
cluster    cli-smb-cluster-status     ●
cluster    cli-smb-domain             ●
cluster    cli-smb-share              ●
cluster    cli-smb-share-lists-show   ●
cluster    cli-cloud                  ●
cluster    cli-buckets-dist           ●
cluster    cli-net-links              ●
cluster    cli-blacklist-list         ●
cluster    cli-manual-overrides-list  ●
cluster    cli-traces-status          ●
cluster    cli-traces-freeze          ●
cluster    cli-s3-cluster             ●
cluster    cli-s3-cluster-status      ●
cluster    cli-s3-bucket-list         -
cluster    cli-config-list-overrides  ●
cluster    api-cfgdump                ●
report     summary                    ●   ●   ●   ●   ●
report     errors                     ●   ●   ●   ●   ●

The following errors were found:

   h0:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly

   h1:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly

   h2:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly

   h3:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly

   h4:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly
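The result table can be long, so it may help to filter it down to the rows that need attention. The sketch below assumes that an X in a host column marks a diagnostic that reported an error, which matches the network_manager rows and the error list in the example but is not a documented legend. The function name list_failed_diags is hypothetical.

```shell
# Hypothetical helper: read `weka diags collect` output on stdin and print
# "<category> <diag>" for every table row that has an X in at least one
# host column. Treating X as "error reported" is an assumption based on
# the example output above.
list_failed_diags() {
    awk '{
        for (i = 3; i <= NF; i++) {
            if ($i == "X") { print $1, $2; break }
        }
    }'
}

# Usage:
#   weka diags collect | list_failed_diags
```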

Parameters

| Name | Description | Default |
|------|-------------|---------|
| id | An optional identifier for this diagnostics file. If not specified, a random ID is generated. | Auto-generated |
| output-dir | The directory for saving the diagnostics file. | /opt/weka/diags |
| core-limit | Limit the diagnostics collection process to use the specified core number. | 1 |
| container-id | A list of container ID numbers separated by commas for collecting diagnostics data. If specified, the --backends and --clients options are ignored. | |
| clients | Collect diagnostics data only from client containers. | No data is collected for clients |
| backends | Collect diagnostics data only from backend containers (same as if you are not specifying this option). To collect diagnostics for all client and backend containers, add both options --backends and --clients to the command. | Backends only |
| tar | Package the collected diagnostics in a TAR file. | No TAR file is created |
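A common pattern is to collect diagnostics while WEKA Home is unreachable and upload the same dump later. The following sketch assumes that the value passed to --id during collection is the dump ID that weka diags upload --dump-id expects; verify this on your cluster before relying on it. The wrapper names collect_named_dump and upload_named_dump are hypothetical.

```shell
# Hypothetical wrappers around the documented commands.

# Collect diagnostics under a caller-chosen ID.
#   $1 = dump ID, $2 = output directory (defaults to /opt/weka/diags)
collect_named_dump() {
    weka diags collect --id "$1" --output-dir "${2:-/opt/weka/diags}"
}

# Upload a previously collected dump without collecting again.
# Assumption: the collect --id value is accepted as the upload --dump-id.
upload_named_dump() {
    weka diags upload --dump-id "$1"
}

# Usage:
#   collect_named_dump case-1234
#   ... later, once WEKA Home is reachable ...
#   upload_named_dump case-1234
```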

Clean up the diagnostics files

The weka diags collect command consolidates diagnostics from various containers into a single dump directory. Conversely, weka diags upload saves diagnostics data on each container, distributing files across the cluster before uploading to WEKA Home.

Collecting diagnostics data generates individual files, consuming disk space. As a result, the system may accumulate numerous diagnostics files, especially after they have been uploaded to WEKA Home. To optimize disk space usage, perform a cleanup either on a specific diagnostics file or an entire directory containing multiple diagnostics files.

Cleanup procedure:

  1. List diagnostics files: List the diagnostics files in the system, including their corresponding IDs. This step provides an overview of the available diagnostics files.

  2. Delete specific diagnostics files: Delete diagnostics files based on their IDs. This targeted cleanup frees disk space by removing diagnostics data that is no longer needed.

The diagnostics files are essential for troubleshooting purposes. Delete these files only if you are certain they have been successfully uploaded to WEKA Home and are no longer needed. For further clarification, contact the Customer Success Team.

List diagnostics files

Command: weka diags list

Use the following command to list the collected diagnostics files:

weka diags list [--verbose] [<id>]...

Parameters

| Name | Description |
|------|-------------|
| id | The diagnostics file's ID or the path to the diagnostics file. If not specified, a list of all collected diagnostics files is displayed. |
| verbose | Displays the results of all the diagnostics files, including the successful ones. |

Delete specific diagnostics files

Command: weka diags rm

Use the following command to stop a running diagnostics instance, cancel its upload, and delete it from the disk:

weka diags rm [--all] [<id>]...

Parameters

| Name | Description |
|------|-------------|
| all | A flag to delete all the diagnostics files. |
| id* | The ID of, or the path to, each diagnostics file to delete. Required unless the all option is specified. |
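The cleanup procedure (list first, then delete by ID) can be sketched as a small interactive helper. This is an illustration only: the function name remove_dump_with_confirm and the confirmation prompt are hypothetical additions, and only weka diags list and weka diags rm come from the documentation above.

```shell
# Hypothetical helper: show one diagnostics file, then delete it only after
# the operator confirms. $1 is the diagnostics file ID (or path).
remove_dump_with_confirm() {
    # Step 1: list the diagnostics file so the operator can review it.
    weka diags list "$1" || return 1

    # Step 2: ask for confirmation before deleting anything.
    printf 'Delete diagnostics file %s? [y/N] ' "$1"
    read -r answer
    case "$answer" in
        y|Y) weka diags rm "$1" ;;
        *)   echo "Skipped $1" ;;
    esac
}

# Usage:
#   remove_dump_with_confirm ody2uRl8xOfDESd6vkbYH4
```

Defaulting to "no" keeps an accidental Enter keypress from removing a dump that has not yet been uploaded to WEKA Home.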

Local server diagnostics command

Collecting diagnostics data from a connected local server is valuable in various scenarios, such as:

  • The originating backend container or the specified backend containers have no functional management process.

  • There is no connectivity between the management process and the cluster leader.

  • The cluster has no leader.

  • The local container is offline.

  • The server cannot communicate with the leader, or the weka diags command fails.

Command: weka local diags

Use the following command to collect diagnostics from a connected local server:

weka local diags [--id id] [--output-dir output-dir] [--core-dump-limit core-dump-limit] [--collect-cluster-info] [--tar]

Parameters

| Name | Description | Default |
|------|-------------|---------|
| id | A unique identifier for this diagnostics file. | Auto-generated |
| output-dir | The directory for saving the diagnostics file. | /opt/weka/diags |
| core-dump-limit | Limit the diagnostics collection process to use the specified core number. | 1 |
| collect-cluster-info | Collect diagnostics data related to the cluster. To prevent excessive load on the cluster, use this flag for one server at a time. | |
| tar | Package the collected diagnostics data in a TAR file. | No TAR file is created |
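One of the scenarios listed earlier in this section is a failure of the cluster-wide weka diags command. A script can handle that case by falling back to local collection. The sketch below assumes that weka diags collect returns a non-zero exit status when it fails; the function name collect_with_local_fallback is hypothetical.

```shell
# Hypothetical wrapper: try cluster-wide collection first. If it fails
# (for example, because the cluster has no leader), collect diagnostics
# from this server only and package them in a TAR file.
# Assumption: `weka diags collect` exits non-zero on failure.
collect_with_local_fallback() {
    if weka diags collect; then
        return 0
    fi
    echo "Cluster-wide collection failed; collecting local diagnostics" >&2
    weka local diags --tar
}

# Usage:
#   collect_with_local_fallback
```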
