Detailed deployment tutorial: WEKA on GCP using Terraform
Last updated
Deploying WEKA on GCP requires proficiency in several technologies, including GCP, Terraform, basic Linux operations, and the WEKA software itself. Not everyone responsible for this deployment is an expert in each of these areas, so this document provides comprehensive, end-to-end instructions. Readers with minimal prior knowledge should be able to use it to deploy a functional WEKA cluster on GCP.
This document specifically addresses the deployment of WEKA in a GCP environment using Terraform, applicable for both proof-of-concept (POC) and production settings. While no pre-existing GCP elements are necessary beyond an appropriate user account, the guide demonstrates the use of some pre-existing components, as many environments already have these in place.
The reader is guided through:
General GCP requirements.
Networking requirements to support WEKA.
Deployment of WEKA using Terraform.
Verification of a successful WEKA deployment.
HashiCorp Terraform is a powerful tool that allows you to define, provision, and manage infrastructure as code. You can specify your infrastructure setup in a configuration file using a declarative configuration language, HashiCorp Configuration Language (HCL), or JSON. Terraform then uses this file to automatically create, modify, or delete resources, ensuring consistent and predictable deployment of your infrastructure components such as servers, databases, and networks.
This document outlines the process for automating the deployment of the WEKA Data Platform on Google Cloud Platform (GCP) using Terraform. Terraform was chosen for its widespread adoption and prominence in the Infrastructure as Code (IaC) domain. Organizations of all sizes use it to deploy persistent infrastructure both on-premises and across public clouds such as AWS, Azure, and Google Cloud Platform.
To install Terraform, we recommend following the installation instructions provided by HashiCorp. Alternatively, you can run Terraform directly from GCP Cloud Shell, which comes with Terraform pre-installed. This guide uses that approach.
It is essential for the customer to understand their subscription structure for deployments within a WEKA customer environment. If you are deploying internally at WEKA and cannot locate an Account ID or have not been added to the appropriate account, contact the relevant cloud team for assistance.
Verify GCP IAM user permissions
Navigate to the GCP Management Console.
Log in using the account intended for the WEKA deployment.
In the GCP console, open the navigation menu and select IAM & Admin > IAM to access the Identity and Access Management dashboard.
Within the IAM dashboard, locate the relevant IAM user by searching for their account.
Click on the user's Security insights to review their permissions.
For a successful WEKA deployment on GCP using Terraform, ensure your GCP project has the necessary quotas for the required resources. When setting up compute instances, such as the c2 type for the WEKA backend cluster, manage quotas based on the CPU count for each compute instance type or family.
Before deploying WEKA, confirm that your compute instance's CPU sizing requirements (determined in partnership with WEKA) can be met within the existing quota limits. If not, increase the quotas in your GCP project before executing the Terraform commands detailed later in this document.
The required minimum quota is the total vCPU count for all instances. For example, deploying 10 c2-standard-8 instances requires 80 vCPUs for the backend cluster alone. Sufficient quotas prevent failures when running the Terraform commands described in subsequent sections.
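The quota arithmetic above can be sketched in Python. This is a minimal sketch, assuming that predefined machine type names end in their vCPU count (as `c2-standard-8` does). The instance counts are illustrative. Totals are grouped by machine family because GCP tracks CPU quotas per family.

```python
from collections import defaultdict


def vcpus_for_machine_type(machine_type: str) -> int:
    """Return the vCPU count encoded at the end of a predefined machine type name.

    Assumption: names follow "<family>-<series>-<vCPUs>", for example
    "c2-standard-8" -> 8. Custom machine types are not handled.
    """
    return int(machine_type.rsplit("-", 1)[1])


def vcpus_per_family(instances: dict[str, int]) -> dict[str, int]:
    """Sum required vCPUs per machine family (quotas are tracked per family)."""
    totals = defaultdict(int)
    for machine_type, count in instances.items():
        family = machine_type.split("-", 1)[0]  # "c2-standard-8" -> "c2"
        totals[family] += vcpus_for_machine_type(machine_type) * count
    return dict(totals)


if __name__ == "__main__":
    # Example from the text: 10 c2-standard-8 backend instances.
    print(vcpus_per_family({"c2-standard-8": 10}))  # {'c2': 80}
```

If you also plan to deploy clients or protocol gateways, add their machine types to the same dictionary. Each family's total is then the minimum quota to confirm for that family.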
On the Quotas page, search for CPU and select the compute instance type family, in this case, c2.
Locate the region where you intend to deploy WEKA and confirm that there are sufficient available CPUs of the specified family type. If not, adjust the quota accordingly.
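You can also check the remaining quota from Cloud Shell instead of the Quotas page. `gcloud compute regions describe <region> --format=json` returns the region's quotas, each with a `metric`, `limit`, and `usage`. The following sketch reads that JSON and reports whether enough capacity is left. The sample values are made up, and `C2_CPUS` is assumed to be the quota metric for the c2 family.

```python
import json

# Hypothetical, trimmed output of:
#   gcloud compute regions describe us-central1 --format=json
SAMPLE_REGION_JSON = """
{
  "name": "us-central1",
  "quotas": [
    {"metric": "CPUS", "limit": 200.0, "usage": 10.0},
    {"metric": "C2_CPUS", "limit": 100.0, "usage": 24.0}
  ]
}
"""


def available_quota(region_json: str, metric: str) -> float:
    """Return limit minus usage for one quota metric in a region description."""
    region = json.loads(region_json)
    for quota in region.get("quotas", []):
        if quota.get("metric") == metric:
            return quota["limit"] - quota["usage"]
    raise KeyError(f"quota metric {metric!r} not found in region description")


def has_capacity(region_json: str, metric: str, required_vcpus: int) -> bool:
    """True if the remaining quota covers the vCPUs the deployment needs."""
    return available_quota(region_json, metric) >= required_vcpus


if __name__ == "__main__":
    # 10 c2-standard-8 backends need 80 vCPUs; only 76 remain in this sample.
    print(available_quota(SAMPLE_REGION_JSON, "C2_CPUS"))  # 76.0
    print(has_capacity(SAMPLE_REGION_JSON, "C2_CPUS", 80))  # False
```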
The WEKA deployment incorporates several GCP components, including VPCs, Subnets, Security Groups, Endpoints, and others. These elements can either be generated by the WEKA Terraform modules or be created manually beforehand.
Four VPCs (Virtual Private Clouds), each with at least one Subnet and a Security Group, are required at minimum. This guide assumes that Terraform generates these items for WEKA within the environment.
The Terraform deployment can automatically establish VPC peering connections from the new VPCs to your current VPC, which is utilized by the application servers consuming WEKA storage. Considering how GCP handles compute instances with multiple vNICs, it is advisable to allow Terraform to create the required networking for WEKA.
This guide assumes an existing VPC to which four WEKA-specific VPCs are added. Four VPCs are needed because of a GCP networking constraint: each VM instance can have only one vNIC per VPC, and WEKA requires a minimum of four vNICs per instance. Plan the CIDR ranges for the four subnets in the new VPCs in advance. They must not conflict with each other or with ranges already in use.
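Before running Terraform, you can check the planned ranges with Python's standard `ipaddress` module. The sketch below uses made-up CIDR values. It flags any pair of the four new WEKA subnets that overlap each other, and any new subnet that overlaps a range used by the VPC to be peered.

```python
import ipaddress
from itertools import combinations


def find_cidr_conflicts(new_subnets: list[str], existing_ranges: list[str]) -> list[tuple[str, str]]:
    """Return every pair of ranges that overlap.

    ip_network() raises ValueError for malformed CIDRs or CIDRs with host bits set.
    """
    new_nets = [ipaddress.ip_network(cidr) for cidr in new_subnets]
    existing_nets = [ipaddress.ip_network(cidr) for cidr in existing_ranges]
    conflicts = []
    # The new WEKA subnets must not overlap one another...
    for a, b in combinations(new_nets, 2):
        if a.overlaps(b):
            conflicts.append((str(a), str(b)))
    # ...or any range already used by the VPC they will be peered with.
    for a in new_nets:
        for b in existing_nets:
            if a.overlaps(b):
                conflicts.append((str(a), str(b)))
    return conflicts


if __name__ == "__main__":
    weka_subnets = ["10.0.0.0/24", "10.1.0.0/24", "10.2.0.0/24", "10.3.0.0/24"]
    existing_vpc = ["10.3.0.0/16"]  # hypothetical application VPC range
    print(find_cidr_conflicts(weka_subnets, existing_vpc))
    # [('10.3.0.0/24', '10.3.0.0/16')]
```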
The WEKA Terraform modules establish peering connections between the newly created VPCs for WEKA and an existing VPC within the environment. Therefore, the initial networking step involves identifying the VPC to peer with.
VPC
Subnet (in VPC)
The WEKA user token provides access to the WEKA binaries and is used to access get.weka.io during installation.
Select the user's name located in the upper right-hand corner of the page.
From the column on the left-hand side of the page, select API Tokens. The user's API token is displayed. Note it for use later in the installation process.
The following demonstrates deploying WEKA into Virtual Private Clouds (VPCs) and Subnets without exposing the instances to Internet access.
IAM roles, networks, and security groups: These modules create the necessary IAM roles, networks, and security groups for WEKA deployment. If specific IDs for security groups or subnets are not provided, the modules generate them automatically.
Service account: This module automatically creates a service account that the WEKA deployment functions and services use.
Network: To deploy a private network with Network Address Translation (NAT), set the relevant variables: for example, set create_nat_gateway to true and provide a private CIDR range. To prevent instances from obtaining public IPs, set assign_public_ip to false.
Shared_VPCs and VPC_Peering: These modules handle the peering of VPCs to each other and existing VPCs if provided.
Clients (optional): This module automatically mounts clients to the WEKA cluster. Users can specify the number of clients to create along with optional variables such as instance type, number of network interfaces (NICs), and AMI ID.
Protocol_Gateways NFS/SMB (optional): Similar to the clients module, this module allows users to specify the number of protocol gateways per protocol. Additional configuration details, such as instance type and disk size, can also be provided.
Worker_Pool: This module creates a private worker pool used to build the GCP Cloud Functions.
Sign in to Google Cloud Platform and access the Cloud Shell.
If Cloud Shell is not associated with the project intended for the WEKA deployment, close it, switch to the correct project, and reopen it.
Create a directory specifically for the Terraform configuration files. Keep each Terraform deployment in its own directory so that its state information stays separate. To run additional deployments, repeat these instructions in a new directory with a unique name, such as deploy1.
Navigate to the created directory.
To display the deployment's output data on screen, create an output.tf file in the deploy directory and insert the following content:
Save the output.tf file.
Define the Terraform options by creating the main.tf file using your preferred text editor. Use the following template, replacing the placeholders in angle brackets (< >) with the values specific to your deployment environment:
After creating and saving the main.tf file, run the following command in the same directory. It initializes the working directory and downloads the required Terraform providers and modules.
Before applying or destroying a Terraform configuration file, it's recommended to run the following:
If GCP requires authentication, grant permission accordingly.
To initiate the deployment of WEKA in GCP, run the following command:
This command initiates the creation of GCP resources essential for WEKA. When prompted, confirm the deployment of resources by typing yes.
Upon completing the Terraform GCP resource deployment process, a summary of the outcome is displayed. In the event of an unsuccessful deployment, an error message indicating failure is shown instead.
Besides a command for looking up the WEKA admin password, the output includes commands for viewing information about the deployment or modifying it:
Get cluster status
Resize cluster command
Pre-terraform destroy, cluster terminate function
Get backend IPs
Run the get cluster status command to verify the state of the WEKA deployment.
The following is the output from the example deployment:
The section "io_status":"STARTED" shows that the cluster is fully up and running and ready for access.
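To script this check, the sketch below searches the status output for the `io_status` field. It assumes the command prints JSON and that `io_status` may appear at any nesting level. The exact structure of the real output is not reproduced here, so adjust the sketch to what your deployment returns.

```python
import json
from typing import Any


def find_key(obj: Any, key: str) -> Any:
    """Depth-first search for the first value stored under `key` in parsed JSON."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def cluster_is_ready(status_output: str) -> bool:
    """True when the status JSON reports io_status == "STARTED"."""
    return find_key(json.loads(status_output), "io_status") == "STARTED"


if __name__ == "__main__":
    # Hypothetical, heavily trimmed status output.
    sample = '{"status": {"io_status": "STARTED"}}'
    print(cluster_is_ready(sample))  # True
```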
The Terraform deployment simplifies the process of deploying additional compute instances to serve as protocol servers (protocol gateways) for NFS or SMB. These protocol gateways are separate from the instances designated for the WEKA backend cluster.
To deploy protocol gateways, add more information to the main.tf file.
The simplest approach is to specify the number of protocol gateways for each type (NFS and SMB) and use default settings for the other parameters. If you plan to distribute your NFS or SMB connections across multiple protocol servers manually, you can adjust these numbers according to your needs. For instance, if you have three projects, each requiring its own bandwidth for NFS, you can deploy three protocol gateways. Assign each project the IP address or DNS entry of its respective gateway to use as the NFS server.
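When the number of projects does not match the number of gateways, one simple option is to hand out gateways in round-robin order. This is an illustration only. The project names and gateway IP addresses below are hypothetical, and the Terraform module does not perform this assignment for you.

```python
from itertools import cycle


def assign_gateways(projects: list[str], gateway_addresses: list[str]) -> dict[str, str]:
    """Map each project to a protocol gateway IP address or DNS name, round-robin."""
    if not gateway_addresses:
        raise ValueError("at least one protocol gateway is required")
    # zip stops when the projects run out; cycle repeats gateways as needed.
    return dict(zip(projects, cycle(gateway_addresses)))


if __name__ == "__main__":
    gateways = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]  # hypothetical NFS gateway IPs
    for project, server in assign_gateways(["proj-a", "proj-b", "proj-c"], gateways).items():
        print(f"{project}: use NFS server {server}")
```

With three projects and three gateways this gives the one-to-one mapping described above. With more projects than gateways, some gateways are shared.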
Add the following before the last ‘}’ of the main.tf file.
To obtain the IP addresses of your WEKA cluster, follow these steps:
Visit the GCP Compute Engine VM instances dashboard.
Identify the WEKA backend servers. The instance names follow the format <cluster_name>-<Timestamp>, where <cluster_name> corresponds to the value specified in the main.tf file.
Select any WEKA backend instance and note the IP address of nic0.
The instance has a public IP address only if your subnet assigns one (that is, if the VM instance is configured accordingly). In either case, WEKA uses only private IPv4 addresses on all interfaces for communication.
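The same lookup can be done from Cloud Shell. `gcloud compute instances list --format=json` returns each instance's `name` and its `networkInterfaces`. The first interface is named `nic0` and carries its private address in `networkIP`. The sketch below filters that JSON by the `<cluster_name>-` prefix described above. The sample data is invented. Any non-backend instances that happen to share the prefix would also match.

```python
import json

# Hypothetical, trimmed output of: gcloud compute instances list --format=json
SAMPLE_INSTANCES_JSON = """
[
  {"name": "weka-demo-1700000000",
   "networkInterfaces": [
     {"name": "nic0", "networkIP": "10.0.0.5"},
     {"name": "nic1", "networkIP": "10.1.0.5"}]},
  {"name": "app-server-1",
   "networkInterfaces": [{"name": "nic0", "networkIP": "10.9.0.2"}]}
]
"""


def backend_nic0_ips(instances_json: str, cluster_name: str) -> dict[str, str]:
    """Return {instance name: nic0 private IP} for instances named <cluster_name>-..."""
    prefix = f"{cluster_name}-"
    ips = {}
    for instance in json.loads(instances_json):
        if not instance["name"].startswith(prefix):
            continue
        for nic in instance.get("networkInterfaces", []):
            if nic.get("name") == "nic0":
                ips[instance["name"]] = nic["networkIP"]
    return ips


if __name__ == "__main__":
    print(backend_nic0_ips(SAMPLE_INSTANCES_JSON, "weka-demo"))
    # {'weka-demo-1700000000': '10.0.0.5'}
```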
The WEKA cluster password is securely stored in the Google Cloud Platform (GCP) Secret Manager. You can retrieve it using the gcloud command from the Terraform output or through the GCP console. Follow these steps to access the password through the GCP console:
Open the GCP console and search for Secret Manager.
Navigate to the Secrets section within the Secret Manager.
Locate and select the secret named weka_<cluster_name>_password corresponding to your deployment.
Select the Actions option and select View secret value.
The system displays the randomly generated password assigned to the WEKA user admin.
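The secret can also be read from Cloud Shell with `gcloud secrets versions access`. The sketch below builds that command using the secret name format described above (`weka_<cluster_name>_password`). The cluster name and project ID are placeholders. The command printed in your Terraform output remains the authoritative one.

```python
import shlex
import subprocess


def secret_access_command(cluster_name: str, project_id: str) -> list[str]:
    """Build the gcloud command that prints the latest version of the admin password secret."""
    secret_name = f"weka_{cluster_name}_password"
    return [
        "gcloud", "secrets", "versions", "access", "latest",
        f"--secret={secret_name}",
        f"--project={project_id}",
    ]


def fetch_admin_password(cluster_name: str, project_id: str) -> str:
    """Run the command (requires gcloud and Secret Manager access) and return its output."""
    result = subprocess.run(
        secret_access_command(cluster_name, project_id),
        check=True, capture_output=True, text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder values; prints a command you can paste into Cloud Shell.
    print(shlex.join(secret_access_command("mycluster", "my-project")))
```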
You can access the WEKA cluster backend instances through SSH directly from the GCP browser window. This method allows you to run WEKA CLI commands and gather logs as needed.
Follow these steps to connect to the backend instances:
Open the GCP console.
Navigate to the Compute Engine section.
Select the instance you wish to access.
Select the SSH button to open a browser-based SSH session.
To access the WEKA GUI, use a jump host with a GUI deployed within the same VPC and subnet as the WEKA cluster.
In the following examples, a Windows 10 instance with a public IP address is deployed in the same VPC, subnet, and security group as the WEKA cluster. Network security group rules are added to explicitly allow RDP access to the Windows 10 system.
Open a browser in the Windows 10 jump box.
Visit https://<cluster-backend-ip>:14000. The WEKA GUI sign-in screen appears.
Sign in as user admin and use the password retrieved earlier (see Retrieve the WEKA cluster access password).
View the cluster GUI home screen.
Review the cluster backends. Check the status and details of the backend instances.
Review the clients, if any, attached to the cluster.
Review the filesystems.
The WEKA backend cluster can be dynamically scaled out and scaled in using API calls from GCP Cloud Shell. Terraform-created functions manage the process and trigger automatically when a new instance is started or retired. These functions run the automation needed to adjust the cluster's compute resources.
To scale out from the initial deployment, use the CLI command provided in the Terraform output.
Replace unhealthy instances:
Auto Scaling automatically initiates the replacement of instances that fail health checks. It launches new instances and incorporates them into the WEKA cluster, ensuring continuous availability and responsiveness.
This process mitigates the impact of instance failures by promptly integrating the new instance into the cluster and service.
Graceful scaling:
Auto-scaling configurations can be adjusted to perform scaling actions gradually rather than all at once.
This measured approach avoids abrupt changes in cluster capacity and minimizes application disruption. It keeps the environment balanced and stable as demand changes.
To validate the self-healing functionality of the WEKA cluster, you can decommission an old instance and allow the Auto Heal feature to launch a new one. Follow this brief guide:
Identify the old instance: Locate the GCP VM instance you want to decommission. This can be based on factors such as age, outdated configurations, or other criteria.
Terminate the old instance: Manually terminate the identified GCP VM instance using the GCP Management Console, gcloud CLI, or SDKs. This action triggers the Auto Heal process.
Verify the new instance: Ensure the new instance is successfully launched, passes the health checks, and joins the cluster. Confirm that the cluster's overall capacity remains unchanged.
Document and monitor: Record the decommissioning process and monitor the cluster to ensure it continues to operate smoothly with the new instance in place.
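One simple way to confirm the replacement is to compare the set of backend instance names before and after the old instance is deleted, for example from `gcloud compute instances list` output. The sketch below encodes that check with hypothetical instance names. It assumes the replacement appears under a new name and that the backend count should return to its original value. It checks only the VM level; confirm cluster membership separately in the WEKA GUI or CLI.

```python
def replacement_succeeded(before: set[str], after: set[str], terminated: str) -> bool:
    """Check that `terminated` is gone, a new backend appeared, and the count is unchanged."""
    if terminated not in before:
        raise ValueError(f"{terminated!r} was not a backend before the test")
    new_instances = after - before
    return (
        terminated not in after
        and len(new_instances) >= 1
        and len(after) == len(before)
    )


if __name__ == "__main__":
    before = {"weka-demo-1700000001", "weka-demo-1700000002", "weka-demo-1700000003"}
    after = {"weka-demo-1700000002", "weka-demo-1700000003", "weka-demo-1700009999"}
    print(replacement_succeeded(before, after, "weka-demo-1700000001"))  # True
```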
The following API services, including the Compute Engine and Workflows APIs, must be enabled in the GCP project:
artifactregistry.googleapis.com
cloudbuild.googleapis.com
cloudfunctions.googleapis.com
cloudresourcemanager.googleapis.com
cloudscheduler.googleapis.com
compute.googleapis.com
dns.googleapis.com
eventarc.googleapis.com
iam.googleapis.com
secretmanager.googleapis.com
servicenetworking.googleapis.com
serviceusage.googleapis.com
vpcaccess.googleapis.com
workflows.googleapis.com
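The APIs above can be enabled in one step with `gcloud services enable`, which accepts several service names at once. The sketch below builds that command for a placeholder project ID so you can review it before running it.

```python
import shlex
import subprocess

# The API services listed above.
REQUIRED_APIS = [
    "artifactregistry.googleapis.com",
    "cloudbuild.googleapis.com",
    "cloudfunctions.googleapis.com",
    "cloudresourcemanager.googleapis.com",
    "cloudscheduler.googleapis.com",
    "compute.googleapis.com",
    "dns.googleapis.com",
    "eventarc.googleapis.com",
    "iam.googleapis.com",
    "secretmanager.googleapis.com",
    "servicenetworking.googleapis.com",
    "serviceusage.googleapis.com",
    "vpcaccess.googleapis.com",
    "workflows.googleapis.com",
]


def enable_apis_command(project_id: str) -> list[str]:
    """Build a single gcloud command that enables every required API."""
    return ["gcloud", "services", "enable", *REQUIRED_APIS, f"--project={project_id}"]


def enable_apis(project_id: str) -> None:
    """Run the command; requires gcloud and permission to enable services."""
    subprocess.run(enable_apis_command(project_id), check=True)


if __name__ == "__main__":
    print(shlex.join(enable_apis_command("my-project")))  # placeholder project ID
```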
The user running the Terraform module requires the following roles to run terraform apply:
roles/cloudfunctions.admin
roles/cloudscheduler.admin
roles/compute.admin
roles/compute.networkAdmin
roles/compute.serviceAgent
roles/dns.admin
roles/iam.serviceAccountAdmin
roles/iam.serviceAccountUser
roles/pubsub.editor
roles/resourcemanager.projectIamAdmin
roles/secretmanager.admin
roles/servicenetworking.networksAdmin
roles/storage.admin
roles/vpcaccess.admin
roles/workflows.admin
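If you administer the project, you can grant the roles above with `gcloud projects add-iam-policy-binding`, one role per call. The sketch below builds those commands. The project ID and user are placeholders, and running them requires an account that can modify the project's IAM policy.

```python
import shlex
import subprocess

# The roles listed above.
REQUIRED_ROLES = [
    "roles/cloudfunctions.admin",
    "roles/cloudscheduler.admin",
    "roles/compute.admin",
    "roles/compute.networkAdmin",
    "roles/compute.serviceAgent",
    "roles/dns.admin",
    "roles/iam.serviceAccountAdmin",
    "roles/iam.serviceAccountUser",
    "roles/pubsub.editor",
    "roles/resourcemanager.projectIamAdmin",
    "roles/secretmanager.admin",
    "roles/servicenetworking.networksAdmin",
    "roles/storage.admin",
    "roles/vpcaccess.admin",
    "roles/workflows.admin",
]


def binding_commands(project_id: str, member: str) -> list[list[str]]:
    """One add-iam-policy-binding command per required role.

    `member` uses gcloud's principal syntax, for example "user:alice@example.com".
    """
    return [
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member={member}", f"--role={role}"]
        for role in REQUIRED_ROLES
    ]


def grant_roles(project_id: str, member: str) -> None:
    """Apply the bindings one at a time, stopping at the first failure."""
    for command in binding_commands(project_id, member):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for command in binding_commands("my-project", "user:alice@example.com"):
        print(shlex.join(command))
```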
Ensure that the GCP IAM user has the permissions outlined in Appendix A, which are necessary for managing GCP resources through Terraform in a WEKA deployment. The IAM user must be able to create, modify, and delete GCP resources as specified by the Terraform configuration files used in the WEKA deployment.
If the current IAM user lacks these permissions, either update the user's permissions or create a new IAM user with the required privileges.
While this user has full administrative access to enable Terraform to deploy WEKA, it is recommended to follow the principle of least privilege. Grant only the specific permissions outlined in Appendix A to follow security best practices.
Navigate to the GCP console and search for the Quotas & System Limits service.
In a web browser, navigate to .
The WEKA Terraform module for GCP is designed to deploy the GCP resources essential for a WEKA deployment, including Compute Engine instances, Cloud Functions, and VPCs.