W E K A
3.14
3.14
  • WEKA v3.14 Documentation
  • Weka System Overview
    • About the WEKA System
    • SSD Capacity Management
    • Filesystems, Object Stores & Filesystem Groups
    • Weka Networking
    • Data Lifecycle Management
    • Weka Client & Mount Modes
    • Glossary
  • Getting Started with Weka
    • Quick Install Guide
    • Managing the Weka System
    • CLI Overview
    • GUI Overview
    • Serving IOs with WekaFS
  • Planning & Installation
    • Prerequisites for Installation
    • Bare Metal Installation
      • Planning a Weka System Installation
      • Setting Up the Hosts
        • SR-IOV Enablement
      • Obtaining the Weka Install File
      • Weka System Installation Process Using the CLI
      • Adding Clients
    • AWS Installation
      • Self-Service Portal
      • CloudFormation Template Generator
      • Deployment Types
      • AWS Outposts Deployment
      • Supported EC2 Instance Types
      • Adding Clients
      • Auto Scaling Group
      • Troubleshooting
  • Performance
    • Testing Weka Performance
      • Test Environment Details
  • WekaFS Filesystems
    • Managing Filesystems, Object Stores & Filesystem Groups
      • Managing Object Stores
      • Managing Filesystem Groups
      • Managing Filesystems
      • Attaching/Detaching Object Stores to/from Filesystems
      • KMS Management
    • Advanced Data Lifecycle Management
      • Advanced Time-based Policies for Data Storage Location
      • Data Management in Tiered Filesystems
      • Transition Between Tiered and SSD-Only Filesystems
      • Manual fetch and release of data
    • Mounting Filesystems
    • Snapshots
    • Snap-To-Object
    • Quota Management
  • Additional Protocols
    • NFS
    • SMB
      • SMB Management Using CLIs
      • SMB Management Using the GUI
    • S3
      • S3 Cluster Management
      • S3 Buckets Management
      • S3 Users and Authentication
      • S3 Information Lifecycle Management
      • Audit S3 APIs
      • S3 Limitations
      • S3 Examples using boto3
  • Operation Guide
    • Alerts
      • List of Alerts
    • Events
      • List of Events
    • Statistics
      • List of Statistics
    • System Congestion
    • Security
      • User Management
      • Organizations
    • Expanding & Shrinking Cluster Resources
      • Expand & Shrink Overview
      • Stages in Adding a Backend Host
      • Expansion of Specific Resources
      • Shrinking a Cluster
    • Background Tasks
    • Upgrading Weka Versions
  • Billing & Licensing
    • License Overview
    • Classic License
    • Pay-As-You-Go License
  • Support
    • Prerequisites and Compatibility
    • Getting Support for Your Weka System
    • The Weka Support Cloud
    • Diagnostics CLI Command
  • Appendix
    • Weka CSI Plugin
    • External Monitoring
    • Snapshot Management
  • REST API
Powered by GitBook
On this page
  • Introduction
  • Installation Logs
  • AWS Account Limits
  • AWS Instance Launch Error
  • Launch in Placement Group Error
  • Instance Type Not Supported In AZ
  • ClusterBootCondition Timeout
  • Clients Failed to Join
  • Health Monitoring
  1. Planning & Installation
  2. AWS Installation

Troubleshooting

This page details common errors that can occur when deploying Weka in AWS using CloudFormation and what can be done to resolve them.

PreviousAuto Scaling GroupNextTesting Weka Performance

Last updated 3 years ago

Introduction

Using CloudFormation deployment saves a lot of potential errors that may occur during installation, especially in the configuration of security groups and other connectivity-related issues. However, the following errors related to the following subjects may occur during installation:

  • Installation logs

  • AWS account limits

  • AWS instance launch error

  • Launch in placement group error

  • Instance type not supported in AZ

  • ClusterBootCondition timeout

  • Clients failed to join cluster

Installation Logs

As explained in , each instance launched in a Weka CloudFormation template starts by installing Weka on itself. This is performed using a script named wekaio-instance-boot.sh and launched by cloud-init. All logs generated by this script are written to the instance’s Syslog.

Additionally, the is installed on each instance, dumping Syslog to CloudWatch under a log-group named/wekaio/<stack-name>. For example, if the stack is namedcluster1, a log-group named /wekaio/cluster1 should appear in CloudWatch a few moments after the template shows the instances have reached CREATE_COMPLETE state.

Under the log-group, there should be a log-stream for each instance Syslog matching the instance name in the CloudFormation template. For example, in a cluster with 6 backend instances, log-streams named Backend0-syslog through Backend5-syslog should be observed.

AWS Account Limits

AWS Instance Launch Error

If the error Instance i-0a41ba7327062338e failed to stabilize. Current state: shutting-down. Reason: Server.InternalError: Internal error on launch is received, one of the instances was unable to start. This is an internal AWS error and it is necessary to try to deploy the stack again.

Launch in Placement Group Error

If the error We currently do not have sufficient capacity to launch all of the additional requested instances into Placement Group 'PG' is received, it was not possible to place all the requested instances in one placement-group.

The CloudFormation template creates all instances in one placement-group to guarantee best performance. Consequently, if the deployment fails with this error, try to deploy in another AZ.

Instance Type Not Supported In AZ

If the error The requested configuration is currently not supported. Please check the documentation for supported configurations or Your requested instance type (T) is not supported in your requested Availability Zone (AZ). Please retry your request by not specifying an Availability Zone or choosing az1, az2, az3 is received, the instance type that you tried to provision is not supported in the specified AZ. Try selecting another subnet to deploy the cluster in, which will implicitly select another AZ.

ClusterBootCondition Timeout

When a ClusterBootCondition timeout occurs, there was a problem creating the initial Weka system cluster. To debug this error, look in the Backend0-syslog log-stream (as described above). The first backend instance is responsible for creating the cluster and therefore, its log should provide the information necessary to debug this error.

Clients Failed to Join

When the message Clients failed to join for uniqueId: ClientN is received while in the WaitCondition, one of the clients was unable to join the cluster. Look at the Syslog of the client specified in uniqueId as described above.

For Example: If the error message specifies that client 3 failed to join, a message ending with uniqueId: Client3 should be displayed. Look at the log-stream named Client3-syslog.

Health Monitoring

When deploying the stack, this error may be received in the description of a CREATE_FAILED event for one or more instances, indicating that more instances (N) have been requested than that permitted by the current instance limit of L for the specified instance type. To request an adjustment to this limit, go to to open a support case with AWS.

You can monitor the cluster instances by checking the cluster EC2 instances in the AWS EC2 service. You can set up as external monitoring to the cluster.

Connecting to the Weka system cluster GUI will provide the, where you can see if any component is not functioning well, view system , , and .

Self-Service Installation
CloudWatch Logs Agent
aws.amazon.com
Cloud Watch
Weka cluster health
alerts
events
statistics