Set up a Data Services container for background tasks

Set up at least one Data Services container to run resource-intensive tasks in the background, improving performance and reliability.

The Data Services container runs tasks in the background, particularly resource-intensive ones such as the Quota Coloring task and the S3 lifecycle management tasks.

Running these tasks in the background keeps the CLI accessible and responsive without consuming the cluster's compute resources. This improves performance, efficiency, and scalability when managing quotas and S3 lifecycle rules. If a task is interrupted, it resumes automatically.


To improve Data Services performance, you can set up multiple Data Services containers, one per WEKA server.

After you set up the Data Services container, you can manage it like any other container in the cluster. To adjust its resources, use the weka cluster container resources or weka local resources command. For more details, see Expand specific resources of a container.

Set up Data Services container

Before you begin

  1. Ensure the server where you’re adding this container has sufficient memory available:

    • 3.5 GB if no dedicated core is specified.

    • 5.5 GB if a dedicated core is specified.

  2. The Data Services containers require a persistent 22 GB filesystem for intermediate global configuration data. Do one of the following:

  3. Set the Data Services global configuration by running the following command:

weka dataservice global-config set --config-fs <configuration filesystem name>

Example:


By default, the Data Services container shares the core of the management process. If you have enough resources, you can assign it a dedicated core.
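The memory prerequisite in Before you begin depends on whether the container gets a dedicated core. This sketch checks it before you start by comparing the server's available memory (MemAvailable in /proc/meminfo, so Linux only) with the 3.5 GB and 5.5 GB figures listed there. It is a hypothetical helper, not a WEKA tool. The function names and the --dedicated-core option belong to this sketch only, and it assumes "GB" means decimal gigabytes and that MemAvailable is an adequate proxy for free memory.

```python
"""Hypothetical pre-check for the Data Services container memory prerequisite.

Not a WEKA tool. Compares MemAvailable from /proc/meminfo (Linux only) with the
requirement in this guide: 3.5 GB without a dedicated core, 5.5 GB with one.
"""
import os
import sys

GB = 10**9  # assumption: the guide's "GB" means decimal gigabytes

# Required free memory, keyed by "is a dedicated core specified?"
REQUIRED_BYTES = {False: int(3.5 * GB), True: int(5.5 * GB)}


def mem_available_bytes(meminfo_text: str) -> int:
    """Return the MemAvailable value from /proc/meminfo-formatted text, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:   12345678 kB" (the kernel's kB is KiB)
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found")


def has_enough_memory(meminfo_text: str, dedicated_core: bool) -> bool:
    """True if available memory meets the requirement for the chosen core setup."""
    return mem_available_bytes(meminfo_text) >= REQUIRED_BYTES[dedicated_core]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # "--dedicated-core" is an option of this sketch only, not a WEKA CLI flag
    dedicated = "--dedicated-core" in sys.argv[1:]
    with open("/proc/meminfo") as f:
        text = f.read()
    available = mem_available_bytes(text)
    needed = REQUIRED_BYTES[dedicated]
    status = "OK" if available >= needed else "INSUFFICIENT"
    print(f"{status}: {available / GB:.1f} GB available, {needed / GB:.1f} GB required")
```

Run it on the target server. Add --dedicated-core if you plan to assign the container its own core.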

Procedure

  1. Set up the Data Services container by running the following command:

Parameters (* indicates a mandatory parameter):

Parameter
Description

name*

The Data Services container name. To avoid confusion, set it to dataserv0.

only-dataserv-cores*

Creates a Data Services container. This parameter is mandatory.

base-port

The first port of the port range that the container uses. If you do not specify it, the container attempts to allocate an available port range on its own and may still initialize successfully. However, for reliable operation, specifying the base port explicitly is recommended.

join-ips*

The management IP address of one of the servers in the cluster to join.

management-ips

Optional. If not specified, the container uses the management IP address of the server.

memory

The amount of container memory to allocate for huge pages. The recommended value is 1.5 GB.

allow-mix-setting

Allows the container to use specified core IDs even when containers with AUTO core ID allocation run on the same server. Required if the core allocation is not explicitly specified.

Example:

  2. Verify that the Data Services container is up: Run weka local ps.

Example:

  3. Verify that the Data Services container is visible in the cluster: Run weka cluster container.

Example:

In the example output, dataserv0 appears in the last row (CONTAINER ID 15).

  4. Verify that the Data Services and management processes have joined the cluster: Run weka cluster process.

Example:

In the example output, the new processes appear as PROCESS IDs 300 and 301.
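The verification steps that run weka local ps and weka cluster container can also be scripted. This Python sketch runs both commands and reports whether the container name appears in each output. It assumes the container name shows up as its own whitespace-separated column, which matches the CONTAINER column in typical output but is not guaranteed. The helper names are hypothetical, and the sketch only reads command output. It does not interpret container status.

```python
import shutil
import subprocess

CONTAINER_NAME = "dataserv0"  # the name recommended in the parameters table


def lists_container(output: str, name: str) -> bool:
    """True if any line of the command output contains `name` as a whitespace-separated field."""
    return any(name in line.split() for line in output.splitlines())


def run_weka(*args: str) -> str:
    """Run a weka CLI command and return its standard output (raises on non-zero exit)."""
    result = subprocess.run(["weka", *args], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    if shutil.which("weka") is None:
        print("weka CLI not found on PATH; run this on a cluster server")
    else:
        for args in (("local", "ps"), ("cluster", "container")):
            found = lists_container(run_weka(*args), CONTAINER_NAME)
            print(f"weka {' '.join(args)}: {CONTAINER_NAME} {'found' if found else 'NOT found'}")
```

Matching whole fields rather than substrings avoids false positives from similarly named containers such as dataserv01.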

Set up S3 lifecycle task management

After setting up the Data Services container, you can enable and configure S3 lifecycle task management to automate object expiration in S3 buckets.

Enable S3 lifecycle task management

Run the following command to enable the S3 lifecycle task manager:

Configure S3 lifecycle task settings (optional)

You can customize the S3 lifecycle task manager behavior using the following command:

Parameters:

Parameter
Description

--max-tasks

Maximum number of S3 lifecycle tasks that can run concurrently.

Default: 4

--interval

Interval between lifecycle task manager runs.

Accepts time values such as 3s, 4m, 2h, 1d, and 1w, combinations such as 1d5h, and the keywords infinite or unlimited. Default: 60 seconds.
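To see how compound --interval values such as 1d5h add up, here is a hypothetical Python parser for the formats listed above. It assumes s, m, h, d, and w mean seconds, minutes, hours, days, and weeks, and that infinite and unlimited mean no finite interval. It illustrates the arithmetic only and is not the WEKA CLI's own parser.

```python
import math
import re

# Assumed unit meanings for the duration strings accepted by --interval.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
_TOKEN = re.compile(r"(\d+)([smhdw])")


def interval_seconds(value: str) -> float:
    """Convert a duration such as "1d5h" to seconds; infinite/unlimited become math.inf."""
    text = value.strip().lower()
    if text in ("infinite", "unlimited"):
        return math.inf
    pos = 0
    total = 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:  # reject gaps or stray characters between tokens
            raise ValueError(f"invalid duration: {value!r}")
        total += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # empty input or trailing garbage
        raise ValueError(f"invalid duration: {value!r}")
    return total
```

Under these assumptions, 1d5h is 104,400 seconds (86,400 + 18,000), the default is 60s, and a 5-minute interval is written 5m (300 seconds).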

Example: Set the maximum number of concurrent tasks to 6 and the interval to 5 minutes

View S3 lifecycle task configuration

To view the current S3 lifecycle task manager configuration, run:

Example output:

Disable S3 lifecycle task management

To disable the S3 lifecycle task manager, run:


Disabling the S3 lifecycle task manager prevents new lifecycle tasks from being scheduled. Tasks that are already running complete, but no new tasks start until you re-enable the S3 lifecycle task manager.
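The documented behavior of --max-tasks and of disabling the task manager can be pictured with a toy model. This is a hypothetical simulation written only to illustrate those two rules. The class, its fields, and its methods are invented, and it does not reflect how WEKA implements the task manager. Each call to tick stands for one task manager run, which --interval would space out. A run starts pending tasks only while the manager is enabled and fewer than max_tasks are running.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyLifecycleScheduler:
    """Toy model (not WEKA code): cap concurrent tasks; disabling blocks new starts only."""
    max_tasks: int = 4  # mirrors the documented --max-tasks default
    enabled: bool = True
    pending: deque = field(default_factory=deque)
    running: set = field(default_factory=set)

    def submit(self, task_id: str) -> None:
        """Queue a task for a future run."""
        self.pending.append(task_id)

    def tick(self) -> list:
        """One task manager run: start pending tasks while enabled and under the cap."""
        started = []
        while self.enabled and self.pending and len(self.running) < self.max_tasks:
            task_id = self.pending.popleft()
            self.running.add(task_id)
            started.append(task_id)
        return started

    def finish(self, task_id: str) -> None:
        """Mark a running task as complete; finishing is allowed even when disabled."""
        self.running.discard(task_id)
```

In this model, setting enabled to False leaves running tasks untouched, so they can still finish, while later ticks start nothing until the manager is enabled again.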
