Troubleshooting

This page details common errors that can occur when deploying WekaIO in AWS using CloudFormation and what can be done to resolve them.

Introduction

Using CloudFormation deployment saves a lot of potential errors that may occur during installation, especially in the configuration of security groups and other connectivity-related issues. However, the following errors related to the following subjects may occur during installation:

  • Installation logs

  • AWS account limits

  • AWS instance launch error

  • Launch in placement group error

  • Instance type not supported in AZ

  • ClusterBootCondition timeout

  • Clients failed to join cluster

Installation Logs

As explained in Self-Service Installation, each instance launched in a WekaIO CloudFormation template starts by installing WekaIO on itself. This is performed using a script named wekaio-instance-boot.sh and launched by cloud-init. All logs generated by this script are written to the instance’s Syslog.

Additionally, the CloudWatch Logs Agent is installed on each instance, dumping Syslog to CloudWatch under a log-group named/wekaio/<stack-name>. For example, if the stack is namedcluster1, a log-group named /wekaio/cluster1 should appear in CloudWatch a few moments after the template shows the instances have reached CREATE_COMPLETE state.

Under the log-group, there should be a log-stream for each instance Syslog matching the instance name in the CloudFormation template. For example, in a cluster with 6 backend instances, log-streams named Backend0-syslog through Backend5-syslog should be observed.

AWS Account Limits

When deploying the stack, this error may be received in the description of a CREATE_FAILED event for one or more instances, indicating that more instances (N) have been requested than that permitted by the current instance limit of L for the specified instance type. To request an adjustment to this limit, go to aws.amazon.com to open a support case with AWS.

AWS Instance Launch Error

If the error Instance i-0a41ba7327062338e failed to stabilize. Current state: shutting-down. Reason: Server.InternalError: Internal error on launch is received, one of the instances was unable to start. This is an internal AWS error and it is necessary to try to deploy the stack again.

Launch in Placement Group Error

If the error We currently do not have sufficient capacity to launch all of the additional requested instances into Placement Group 'PG' is received, it was not possible to place all the requested instances in one placement-group.

The CloudFormation template creates all instances in one placement-group to guarantee best performance. Consequently, if the deployment fails with this error, try to deploy in another AZ.

Instance Type Not Supported In AZ

If the error The requested configuration is currently not supported. Please check the documentation for supported configurations or Your requested instance type (T) is not supported in your requested Availability Zone (AZ). Please retry your request by not specifying an Availability Zone or choosing az1, az2, az3 is received, the instance type that you tried to provision is not supported in the specified AZ. Try selecting another subnet to deploy the cluster in, which will implicitly select another AZ.

ClusterBootCondition Timeout

When a ClusterBootCondition timeout occurs, there was a problem creating the initial Weka system cluster. To debug this error, look in the Backend0-syslog log-stream (as described above). The first backend instance is responsible for creating the cluster and therefore, its log should provide the information necessary to debug this error.

Clients Failed to Join

When the message Clients failed to join for uniqueId: ClientN is received while in the WaitCondition, one of the clients was unable to join the cluster. Look at the Syslog of the client specified in uniqueId as described above.

For Example: If the error message specifies that client 3 failed to join, a message ending with uniqueId: Client3 should be displayed. Look at the log-stream named Client3-syslog.