Diagnostics data management

Manage diagnostics data for clusters, servers, and containers with CLI commands.

The diagnostics CLI commands manage the diagnostics data associated with clusters, servers, and containers. The Customer Success Team analyzes this data to assist in troubleshooting. Two options are available for managing diagnostics:

  • Cluster-wide diagnostics commands: Use the command weka diags for cluster-wide diagnostics management from any server within the cluster.

  • Local server diagnostics command: Use the command weka local diags for diagnostics management of a connected local server.

Cluster-wide diagnostics commands

Use the cluster-wide diagnostics commands to manage diagnostics data from any server in the cluster. These commands upload diagnostics data to WEKA Home, collect diagnostics data, list the available diagnostics files, and clean up diagnostics files.

Upload diagnostics data to WEKA Home

Command: weka diags upload

Use the following command to collect diagnostics information, save it, and upload it to WEKA Home (the WEKA support cloud):

weka diags upload [--core-limit core-limit] [--dump-id dump-id] [--container-id container-id]... [--clients] [--backends]

The command output includes an access identifier, labeled Diags collection ID. Send this identifier to the Customer Success Team so they can retrieve the diagnostics data from WEKA Home.

When the command runs for all servers in the cluster, a local diagnostics file (dump) is created on each server in /opt/weka/diags/local. The local diagnostics files from all servers are then consolidated into a single diagnostics file in the /opt/weka/diags directory on the server where you run the command.

  • HTTPS access is required to upload the diagnostics to AWS S3 endpoints.

  • The upload process is asynchronous. Therefore, connectivity failure events are reflected in the events log even if the command exits successfully.

Example: collect and upload diagnostics from all backend containers
[root@wekaprod-0 ~] 2023-02-20 13:39:25 $ weka diags upload
Uploading diags from 5 hosts to the cloud
Cluster GUID: c0aca0f2-0d20-465e-9817-e747be811016
Diags collection ID: 1Ox5OYogdTP54Nah7o1cb
Cloud URL: https://api.home.weka.io

Collecting diags                             [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 5 / 5
Copying files for uploading                  [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 5 / 5
Uploading to cloud (this could take a while) [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 125 / 125


+------------------------+
| Diags upload completed |
+------------------------+
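
When the upload is scripted, the collection ID can be extracted from the command output so it is ready to send to the Customer Success Team. The following is a minimal sketch, assuming the output contains a line that starts with Diags collection ID: as in the example above. The function name extract_collection_id is hypothetical and is not part of the WEKA CLI.

```shell
# Hypothetical helper (not part of the WEKA CLI): reads `weka diags upload`
# output on stdin and prints the value of the first "Diags collection ID" line.
extract_collection_id() {
    sed -n 's/^Diags collection ID:[[:space:]]*//p' | head -n 1
}
```

A possible use is weka diags upload | extract_collection_id, which prints only the identifier if the line is present and nothing otherwise.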

Parameters

| Name | Description | Default |
| --- | --- | --- |
| core-limit | Limit the diagnostics collection process to use the specified core number. | 1 |
| dump-id | Uploads a pre-existing diagnostics file (dump) generated by the weka diags collect command with the specified dump ID. Using this option, the command exclusively uploads data and does not conduct data collection. Do not use this option for data previously generated by a weka diags upload command. The command evaluates the list of containers for dump upload, which may differ from those collected in the specified dump directory. Any container data not found in the collected dump is disregarded. | If no ID is provided, a new diagnostics file is generated. |
| container-id | A list of container ID numbers separated by commas for collecting and uploading diagnostics data. If specified, the --backends and --clients options are ignored. | |
| clients | Collect and upload diagnostics data only from client containers. | No data is collected for clients |
| backends | Collect and upload diagnostics data only from backend containers (same as if you are not specifying this option). To collect diagnostics for all client and backend containers, add both options --backends and --clients to the command. | Backends only |
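
The container selection options can be wrapped in small shell functions. This is a sketch only: the function names are hypothetical, and the container IDs in the usage note are placeholders. It relies solely on the --backends, --clients, and --container-id options described above, and joins the IDs with commas as the container-id description states.

```shell
# Hypothetical wrapper: upload diagnostics from both backend and client
# containers by passing both selection options.
upload_all_containers() {
    weka diags upload --backends --clients
}

# Hypothetical wrapper: upload diagnostics from specific containers only.
# Arguments: one or more container ID numbers.
# Note: per the parameter description, --backends and --clients are ignored
# when --container-id is specified.
upload_from_containers() {
    if [ "$#" -eq 0 ]; then
        echo "usage: upload_from_containers ID [ID...]" >&2
        return 2
    fi
    # Join the arguments with commas, for example "3,7,12".
    weka diags upload --container-id "$(IFS=,; echo "$*")"
}
```

For example, upload_from_containers 3 7 12 runs weka diags upload --container-id 3,7,12.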

Collect diagnostics data

Command: weka diags collect

Use the following command to create diagnostics information and save it without uploading it to WEKA Home. This command is useful when there is no connection to WEKA Home, and you want to share the diagnostics file using other options.

weka diags collect [--id id] [--output-dir output-dir] [--core-limit core-limit] [--container-id container-id] [--clients] [--backends] [--tar]

If the command runs with the local keyword, information is collected only from the server on which the command is executed. Otherwise, information is collected from the whole cluster.

Example: collect diagnostics from all backends
[root@wekaprod-0 ~] 2023-02-20 13:38:58 $ weka diags collect
Downloading cluster diagnostics from 5 hosts to this host
Diags will be saved to: /opt/weka/diags/ody2uRl8xOfDESd6vkbYH4

Collecting diags      [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 5 / 5
Downloading artifacts [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 46.43 MiB / 46.43 MiB

CATEGORY   DIAG                       H0  H1  H2  H3  H4
host       uptime                                 
host       date                                   
host       uname                                  
host       top                                    
host       free_hugepages                         
host       nr_hugepages                           
host       nr_hugepages_mempolicy                 
host       nr_overcommit_hugepages                
host       surplus_hugepages                      
host       meminfo                                
host       cpuinfo                                
host       netstat                                
host       etc-hosts                              
host       ps                                     
host       mount                                  
host       rpcinfo                                
host       kernel_modules                         
host       pci_devices                            
host       pci_tree                               
host       ip_links                               
host       ip_routes                              
host       arp_table                              
host       iptables                               
host       resolv.conf                            
host       root_disk_usage                        
host       dmesg                                  
host       dmidecode                              
host       network_manager            X   X   X   X   X
host       selinux                                
host       fstab                                  
host       journalctl                             
host       systemctl                              
host       ip4addr                                
host       ip4link                                
host       syslog                                 
host       boot_log                   -   -   -   -   -
host       ofed-version               -   -   -   -   -
host       ofed                       -   -   -   -   -
host       mlnx4_core                 -   -   -   -   -
host       mlnx5_core                 -   -   -   -   -
host       memtest                                
host       is-numa-balancing-active               
host       is-swap-on                             
host       lsblk                                  
host       system-release                         
host       os-release                             
host       lsb-release                -   -   -   -   -
host       redhat-release                         
host       lscpu                                  
host       ifconfig                               
host       ip_rule                                
host       weka_local_status                      
weka_host  container_list                         
weka_host  lstopo                                 
weka_host  numactl                                
weka_host  ipmi-sel                               
weka_host  ipmi-sdr                               
weka_host  logs                                   
weka_host  driver_queue                           
weka_host  local_events                           
weka_host  traces_analysis                        
weka_host  resources_files                        
weka_host  core_dumps                             
cluster    api-status                 
cluster    api-alerts                 
cluster    api-realtime_stats         
cluster    api-hosts                  
cluster    api-nodes                  
cluster    api-drives                 
cluster    api-failure_domains        
cluster    api-host_hardware          
cluster    api-filesystems            
cluster    api-filesystem_groups      
cluster    api-snapshots              
cluster    api-tiering                
cluster    api-users                  
cluster    api-default_data_net       
cluster    api-nfs_client_groups      
cluster    api-nfs_interface_groups   
cluster    api-nfs_permissions        
cluster    api-nfs_status             
cluster    cli-status                 
cluster    cli-filesystems            
cluster    cli-host                   
cluster    cli-net                    
cluster    cli-nodes                  
cluster    cli-drives                 
cluster    cli-buckets                
cluster    cli-cluster-tasks          
cluster    cli-alerts                 
cluster    cli-rebuild_status         
cluster    cli-filesystem_groups      
cluster    cli-kms                    
cluster    cli-fs_tier_s3             
cluster    cli-org                    
cluster    cli-realtime_stats         
cluster    cli-smb                    
cluster    cli-smb-cluster-status     
cluster    cli-smb-domain             
cluster    cli-smb-share              
cluster    cli-smb-share-lists-show   
cluster    cli-cloud                  
cluster    cli-buckets-dist           
cluster    cli-net-links              
cluster    cli-blacklist-list         
cluster    cli-manual-overrides-list  
cluster    cli-traces-status          
cluster    cli-traces-freeze          
cluster    cli-s3-cluster             
cluster    cli-s3-cluster-status      
cluster    cli-s3-bucket-list         -
cluster    cli-config-list-overrides  
cluster    api-cfgdump                
report     summary                                
report     errors                                 

The following errors were found:

   h0:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly

   h1:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly

   h2:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly

   h3:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly

   h4:
      network_manager:
         - The NetworkManager service is running on this host, it must be turned off for weka to run properly
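
On larger clusters, the error report at the end of the output can be condensed to one line per host and failed check. The sketch below assumes the indentation-based layout shown in the example above (a host line such as h0:, followed by a check name such as network_manager:). The function name summarize_diag_errors is hypothetical, and the real output format may differ between versions.

```shell
# Hypothetical helper (not part of the WEKA CLI): reads `weka diags collect`
# output on stdin and prints one "<host> <check>" pair per reported error.
# Assumes the layout of the "The following errors were found:" section
# shown in the example output.
summarize_diag_errors() {
    awk '
        /^The following errors were found:/ { in_errors = 1; next }
        !in_errors { next }                      # skip everything before the report
        /^ +h[0-9]+: *$/ { gsub(/[ :]/, ""); host = $0; next }
        /^ +[A-Za-z0-9_.-]+: *$/ { gsub(/[ :]/, ""); print host, $0 }
    '
}
```

A possible use is weka diags collect | summarize_diag_errors, which for the example above would print lines such as h0 network_manager.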

Parameters

| Name | Description | Default |
| --- | --- | --- |
| id | An optional identifier for this diagnostics file. If not specified, a random ID is generated. | Auto-generated |
| output-dir | The directory for saving the diagnostics file. | /opt/weka/diags |
| core-limit | Limit the diagnostics collection process to use the specified core number. | 1 |
| container-id | A list of container ID numbers separated by commas for collecting diagnostics data. If specified, the --backends and --clients options are ignored. | |
| clients | Collect diagnostics data only from client containers. | No data is collected for clients |
| backends | Collect diagnostics data only from backend containers (same as if you are not specifying this option). To collect diagnostics for all client and backend containers, add both options --backends and --clients to the command. | Backends only |
| tar | Package the collected diagnostics in a TAR file. | No TAR file is created |
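
When there is no connection to WEKA Home, a single TAR file is usually the easiest artifact to share by other means. The following sketch wraps weka diags collect with the --output-dir, --id, and --tar options described above. The function name collect_offline_bundle and the directory and ID in the usage note are hypothetical.

```shell
# Hypothetical wrapper: collect cluster diagnostics into a TAR file without
# uploading to WEKA Home.
# Arguments: $1 = output directory (required), $2 = dump ID (optional;
# if omitted, the CLI generates one).
collect_offline_bundle() {
    if [ -z "${1:-}" ]; then
        echo "usage: collect_offline_bundle OUTPUT_DIR [ID]" >&2
        return 2
    fi
    if [ -n "${2:-}" ]; then
        weka diags collect --id "$2" --output-dir "$1" --tar
    else
        weka diags collect --output-dir "$1" --tar
    fi
}
```

For example, collect_offline_bundle /mnt/share/diags case42 runs weka diags collect --id case42 --output-dir /mnt/share/diags --tar.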

Clean up the diagnostics files

The weka diags collect command consolidates diagnostics from various containers into a single dump directory. Conversely, weka diags upload saves diagnostics data on each container, distributing files across the cluster before uploading to WEKA Home.

Collecting diagnostics data generates individual files, consuming disk space. As a result, the system may accumulate numerous diagnostics files, especially after they have been uploaded to WEKA Home. To optimize disk space usage, perform a cleanup either on a specific diagnostics file or an entire directory containing multiple diagnostics files.
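
Before cleaning up, it can help to check how much space the diagnostics directory occupies on a server. This is a minimal sketch using the standard du utility, not a WEKA command. The function name diags_usage_kib is hypothetical, and the default path is the /opt/weka/diags directory mentioned in this document.

```shell
# Hypothetical helper: print the disk space, in KiB, used by a diagnostics
# directory. Defaults to /opt/weka/diags if no directory is given.
diags_usage_kib() {
    dir=${1:-/opt/weka/diags}
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # du -sk reports the total size of the directory tree in 1024-byte units.
    du -sk "$dir" | awk '{ print $1 }'
}
```

Running diags_usage_kib with no arguments reports the usage of the default directory on the local server only.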

Cleanup procedure:

  1. List diagnostics files: List the diagnostics files in the system, including their corresponding IDs. This step provides an overview of the available diagnostics files.

  2. Delete specific diagnostics files: Delete specific diagnostics files based on their IDs. This targeted cleanup frees disk space by removing only the diagnostics data that is no longer needed.

The diagnostics files are essential for troubleshooting purposes. Delete these files only if you are certain they have been successfully uploaded to WEKA Home and are no longer needed. For further clarification, contact the Customer Success Team.

List diagnostics files

Command: weka diags list

Use the following command to list the collected diagnostics files:

weka diags list [--verbose] [<id>]...

Parameters

| Name | Description |
| --- | --- |
| id | The diagnostics file's ID or the path to the diagnostics file. If not specified, a list of all collected diagnostics files is displayed. |
| verbose | Displays the results of all the diagnostics files, including the successful ones. |

Delete specific diagnostics files

Command: weka diags rm

Use the following command to stop a running diagnostics instance, cancel its upload, and delete it from the disk:

weka diags rm [--all] [<id>]...

Parameters

| Name | Description |
| --- | --- |
| all | A flag to delete all the diagnostics files. |
| id* | The diagnostics file's ID or the path to the diagnostics file. Required unless the --all option is specified. |
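
The two-step cleanup procedure (list, then delete by ID) can be sketched as a single shell function. The function name remove_dump is hypothetical. The sketch also assumes that weka diags list returns a non-zero exit status for an unknown ID, which this document does not confirm, so treat the guard as illustrative.

```shell
# Hypothetical helper: show a diagnostics file by ID, then delete it.
# Argument: $1 = diagnostics file ID or path.
# Assumption: `weka diags list <id>` exits non-zero when the ID is unknown;
# in that case the deletion is skipped.
remove_dump() {
    if [ -z "${1:-}" ]; then
        echo "usage: remove_dump ID" >&2
        return 2
    fi
    weka diags list "$1" || return 1
    weka diags rm "$1"
}
```

Delete a diagnostics file this way only after confirming it was uploaded to WEKA Home and is no longer needed.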

Local server diagnostics command

Collecting diagnostics data from a connected local server is valuable in various scenarios, such as:

  • The originating backend container, or the specified backend containers, have no functional management process.

  • There is no connectivity between the management process and the cluster leader.

  • The cluster has no leader.

  • The local container is offline.

  • The server cannot communicate with the leader, or the weka diags command fails.

Command: weka local diags

Use the following command to collect diagnostics from a connected local server:

weka local diags [--id id] [--output-dir output-dir] [--core-dump-limit core-dump-limit] [--collect-cluster-info] [--tar]

Parameters

| Name | Description | Default |
| --- | --- | --- |
| id | A unique identifier for this diagnostics file. | Auto-generated |
| output-dir | The directory for saving the diagnostics file. | /opt/weka/diags |
| core-dump-limit | Limit the diagnostics collection process to use the specified core number. | 1 |
| collect-cluster-info | Collect diagnostics data related to the cluster. To prevent excessive load on the cluster, use this flag for one server at a time. | |
| tar | Package the collected diagnostics data in a TAR file. | No TAR file is created |
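
The scenarios listed at the start of this section suggest a simple fallback: try the cluster-wide collection first and, if it fails, collect from the local server instead. The following is a sketch under that assumption. The function name collect_with_fallback is hypothetical, and it assumes that weka diags collect exits with a non-zero status on failure.

```shell
# Hypothetical helper: attempt a cluster-wide collection and fall back to a
# local collection on this server if the cluster-wide command fails.
# Assumption: `weka diags collect` exits non-zero on failure.
collect_with_fallback() {
    if weka diags collect --tar; then
        return 0
    fi
    echo "cluster-wide collection failed; collecting local diagnostics" >&2
    weka local diags --tar
}
```

The fallback collects data only from the server where it runs, so repeat it on each affected server as needed.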