This page details common errors that can occur when deploying WEKA in AWS using CloudFormation and what can be done to resolve them.
Deploying with CloudFormation avoids many of the errors that may otherwise occur during installation, especially in the configuration of security groups and other connectivity-related issues. However, errors related to the following subjects may still occur during installation:
- Installation logs
- AWS account limits
- AWS instance launch error
- Launch in placement group error
- Instance type not supported in AZ
- ClusterBootCondition timeout
- Clients failed to join cluster
As explained in Self-Service Installation, each instance launched in a WEKA CloudFormation template starts by installing WEKA on itself. This is performed by a script named wekaio-instance-boot.sh, which is launched by cloud-init. All logs generated by this script are written to the instance's Syslog.
Additionally, the CloudWatch Logs Agent is installed on each instance, dumping Syslog to CloudWatch under a log-group named /wekaio/<stack-name>. For example, if the stack is named cluster1, a log-group named /wekaio/cluster1 should appear in CloudWatch a few moments after the template shows the instances have reached the CREATE_COMPLETE state.
Under the log-group, there should be a log-stream for each instance's Syslog, matching the instance name in the CloudFormation template. For example, in a cluster with 6 backend instances, log-streams named Backend0-syslog through Backend5-syslog should be observed.
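The naming scheme above can be sketched in Python as follows. This is a minimal illustration, not part of the WEKA tooling: the helper names are hypothetical, and `tail_syslog` assumes `logs_client` is a boto3 CloudWatch Logs client (for example `boto3.client("logs")`) with credentials that can read the log-group.

```python
def weka_log_group(stack_name):
    """Log-group name for a given CloudFormation stack name."""
    return f"/wekaio/{stack_name}"


def backend_log_streams(backend_count):
    """Expected backend log-stream names, numbered from 0."""
    return [f"Backend{i}-syslog" for i in range(backend_count)]


def tail_syslog(logs_client, stack_name, stream_name, limit=50):
    """Return the most recent Syslog messages of one instance.

    Assumption: logs_client is a boto3 CloudWatch Logs client.
    """
    response = logs_client.get_log_events(
        logGroupName=weka_log_group(stack_name),
        logStreamName=stream_name,
        limit=limit,
        startFromHead=False,  # read from the newest events
    )
    return [event["message"] for event in response["events"]]
```

For a stack named cluster1, `tail_syslog(client, "cluster1", "Backend0-syslog")` would read the Syslog of the first backend instance.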
When deploying the stack, an AWS account limit error may be received in the description of a CREATE_FAILED event for one or more instances. It indicates that more instances (N) have been requested than are permitted by the current instance limit (L) for the specified instance type. To request an adjustment to this limit, go to aws.amazon.com and open a support case with AWS.
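To find the CREATE_FAILED events and their descriptions without browsing the console, the stack events can be filtered programmatically. The sketch below is an illustration only: `failed_events` is a hypothetical helper, and it assumes the input is the `StackEvents` list returned by the boto3 CloudFormation `describe_stack_events` call.

```python
def failed_events(stack_events):
    """Return (logical resource ID, reason) for each CREATE_FAILED event."""
    return [
        (event["LogicalResourceId"], event.get("ResourceStatusReason", ""))
        for event in stack_events
        if event.get("ResourceStatus") == "CREATE_FAILED"
    ]


# Possible usage (needs boto3 and AWS credentials; "cluster1" is a placeholder):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   events = cfn.describe_stack_events(StackName="cluster1")["StackEvents"]
#   for resource, reason in failed_events(events):
#       print(resource, reason)
```

The reason string of each failed event is the description in which the error messages on this page appear.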
If the error "Instance i-0a41ba7327062338e failed to stabilize. Current state: shutting-down. Reason: Server.InternalError: Internal error on launch" is received, one of the instances was unable to start. This is an internal AWS error. To resolve it, deploy the stack again.
If the error "We currently do not have sufficient capacity to launch all of the additional requested instances into Placement Group 'PG'" is received, it was not possible to place all the requested instances in one placement-group.
The CloudFormation template creates all instances in one placement-group to guarantee best performance. Consequently, if the deployment fails with this error, try to deploy in another AZ.
If the error "The requested configuration is currently not supported. Please check the documentation for supported configurations" or the error "Your requested instance type (T) is not supported in your requested Availability Zone (AZ). Please retry your request by not specifying an Availability Zone or choosing az1, az2, az3" is received, the instance type that you tried to provision is not supported in the specified AZ. Try selecting another subnet in which to deploy the cluster, which implicitly selects another AZ.
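The launch errors above and their suggested actions can be summarized as a small lookup. This is a sketch for illustration only: the matched substrings are taken from the messages quoted on this page, while `REMEDIATIONS` and `suggest_fix` are hypothetical names, not part of any WEKA or AWS tool.

```python
# (substring of the error message, suggested action from this page)
REMEDIATIONS = [
    ("Server.InternalError",
     "Internal AWS error: deploy the stack again."),
    ("sufficient capacity to launch",
     "Placement group capacity: try deploying in another AZ."),
    ("not supported in your requested Availability Zone",
     "Instance type not supported in this AZ: select another subnet."),
    ("requested configuration is currently not supported",
     "Instance type not supported in this AZ: select another subnet."),
]


def suggest_fix(reason):
    """Return the suggested action for an error message, or None if unknown."""
    for needle, advice in REMEDIATIONS:
        if needle in reason:
            return advice
    return None
```

Messages that match none of the substrings return `None` and need to be investigated through the instance logs.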
When a ClusterBootCondition timeout occurs, there was a problem creating the initial WEKA system cluster. To debug this error, look in the Backend0-syslog log-stream (as described above). The first backend instance is responsible for creating the cluster, so its log should provide the information necessary to debug this error.
When the message "Clients failed to join for uniqueId: ClientN" is received while in the WaitCondition, one of the clients was unable to join the cluster. Look at the Syslog of the client specified in uniqueId, as described above.
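When such messages are processed by a script, the client identifier can be pulled out of the message with a regular expression. A minimal sketch, assuming the message has the form quoted above; `failed_client_id` is a hypothetical helper name.

```python
import re


def failed_client_id(message):
    """Extract the value following 'uniqueId:' from a failure message."""
    match = re.search(r"uniqueId:\s*(\S+)", message)
    return match.group(1) if match else None
```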
Example: If the error message specifies that client 3 failed to join, a message ending with uniqueId: Client3 should be displayed. Look at the log-stream named