Drives sharing
Share access to physical NVMe devices among multiple Drive IO processes through a centralized SSD Proxy process by dividing a single physical drive into multiple virtual drives.
Overview
Drives sharing enables multiple Drive IO processes on a single host to access the same physical NVMe devices using a centralized SSD Proxy container. This improves resource efficiency by dividing a single physical drive into multiple virtual drives.
Benefits
Improved data reduction performance: Multiple Drive I/O processes can access the same device, increasing CPU parallelism for compression and decompression and improving effective IOPS utilization.
Better Gen5 bandwidth utilization: Gen5 NVMe drives can exceed 13 GB/s, while per-node network bandwidth is typically lower. Drives sharing enables multiple cores to drive a single device, helping saturate available bandwidth.
Multi-tenancy and small-cluster scalability: Sharing drives across containers or clusters enables smaller, granular capacity allocations, supporting layouts such as 16+2 even in smaller environments.
Flexible capacity management: Virtual drives can be created at arbitrary sizes, with support for over-provisioning. The total allocated virtual capacity can exceed the physical device capacity, enabling future growth planning.
Key concepts
Physical drive: A physical NVMe device.
Virtual drive: A logical drive carved from a physical drive and presented to the cluster as an independent disk. Each physical drive supports up to 64 virtual drives.
Physical UUID: The unique identifier written to the physical drive.
Virtual UUID (VID/vUUID): The unique identifier assigned to a virtual drive.
SSD Proxy: A dedicated container that owns physical NVMe devices and manages I/O routing and hardware queue allocation for virtual drives.
Hardware queues: NVMe I/O queue pairs allocated per virtual drive. By default, each VID uses four queues, with automatic fallback to one queue when hardware limits are reached.
weka-sign-drive: The command-line utility used to sign physical drives and create or manage virtual drives.
Configure drives sharing
Configure the SSD Proxy and virtual drives for shared NVMe access.
Before you begin
Identify the device paths (for example, /dev/nvme0n1) for the NVMe drives.
Obtain the cluster GUID. To get the cluster GUID, run:
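A minimal sketch follows; weka status prints cluster-level details that include the cluster GUID, though the exact output format may vary by release.

```bash
# Print cluster status; the cluster GUID appears in the output.
weka status
```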
Procedure
Sign the physical drive: For each physical NVMe drive, write the WEKA header and register the device to the SSD Proxy (after this step, the drive is proxy-managed only). Replace /dev/nvmeXnY with the actual device path.
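The exact weka-sign-drive syntax can differ between releases; the following is a sketch that assumes a sign subcommand and reuses the --owner-cluster-guid flag described later for VID creation. Verify the syntax with the tool's help output.

```bash
# Sketch only: subcommand and flag names are assumptions; check
# weka-sign-drive --help for the exact syntax in your release.
weka-sign-drive sign /dev/nvmeXnY --owner-cluster-guid <cluster-guid>
```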
This is the only step that uses the device path (/dev/nvmeXnY). All subsequent operations reference drives using the UUIDs obtained from the weka-sign-drive list command.
Identify the physical UUID: List drives and copy the physical UUID.
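A minimal sketch of the listing step, using the weka-sign-drive list command referenced in the note above.

```bash
# List signed drives and copy the physical UUID from the output.
weka-sign-drive list
```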
Example output:
Create virtual drives (VIDs): Create one or more virtual drives on the signed physical device. This operation writes VID metadata to the drive header and does not require the SSD Proxy to be running.
Parameters
<physical-uuid>
Physical UUID identified in the previous step.
--size
Virtual drive size, in GB. Over-provisioning is supported.
--owner-cluster-guid
Cluster GUID used when signing the physical drive. If not specified, the virtual drive is available to all clusters.
--virtual-uuid
Optional. Explicitly assigns a virtual UUID. If omitted, WEKA generates one automatically.
Example: Create three 1 TB virtual drives on the same physical drive.
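A sketch of this example, assuming a create subcommand (the subcommand name is an assumption); per the --size parameter above, sizes are given in GB, so 1 TB is specified as 1000. Because --virtual-uuid is omitted, each invocation generates a new virtual UUID automatically.

```bash
# Sketch: "create" subcommand name is an assumption. Run three times to
# create three 1 TB (1000 GB) virtual drives with auto-generated virtual UUIDs.
weka-sign-drive create <physical-uuid> --size 1000 --owner-cluster-guid <cluster-guid>
weka-sign-drive create <physical-uuid> --size 1000 --owner-cluster-guid <cluster-guid>
weka-sign-drive create <physical-uuid> --size 1000 --owner-cluster-guid <cluster-guid>
```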
If the device is accessible through the kernel, the command writes directly to the device. If the device is managed by the SSD Proxy, the operation is transparently routed through the proxy using JRPC.
Capacity is not enforced. The total size of virtual drives can exceed the physical drive capacity. Capacity planning and enforcement are the operator’s responsibility.
Verify the virtual drive configuration: List all signed drives and VID counts.
Example output
To see detailed information about each VID, use the show command:
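A sketch of both verification commands; the argument passed to show is assumed to be the physical UUID, and output columns depend on the tool version.

```bash
# List all signed drives with their VID counts.
weka-sign-drive list

# Show detailed per-VID information for a specific physical drive.
weka-sign-drive show <physical-uuid>
```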
Configure the SSD Proxy container: Configure the SSD Proxy container with resources that match the expected storage footprint.
Parameters
--max-drives
Maximum number of physical drives managed by the proxy (up to 40).
--expected-max-drive-size-gb
Largest expected drive size in GB. Used to calculate memory allocation for ChunkDB metadata.
--memory
If you do not use --max-drives and --expected-max-drive-size-gb, specify the exact container memory.
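A sketch of the configuration step for the 10 × 30 TB sizing example that follows; the base command and container name (weka local setup container, ssdproxy) are illustrative assumptions, while the flags are those described above.

```bash
# Sketch: base command and container name are illustrative assumptions.
weka local setup container --name ssdproxy \
  --max-drives 10 \
  --expected-max-drive-size-gb 30000
```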
Memory sizing guidance
The SSD Proxy sizes its ChunkDB metadata memory from the specified max-drives and expected-max-drive-size-gb parameters, allocating approximately 4 MB per TB of total expected capacity (max-drives × expected-max-drive-size-gb).
Example
For 10 drives × 30 TB each (300 TB total), the ChunkDB allocation is approximately 300 TB × 4 MB/TB ≈ 1.2 GB.
This ensures sufficient memory is reserved for managing virtual drives efficiently.
Start the SSD Proxy container: Start the container and confirm that the ssdproxy container reports Running (otherwise, the following step will fail).
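A sketch assuming the container is named ssdproxy; weka local ps should show it as Running before you proceed.

```bash
# Start the SSD Proxy container (container name assumed to be "ssdproxy").
weka local start ssdproxy

# Confirm the ssdproxy container reports Running.
weka local ps
```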
Add virtual drives to the cluster: Add each VID by using the standard drive add command.
Parameters
<container-id>
ID of the container to attach the virtual drive.
<virtual-uuid>
Virtual UUID created in the previous step.
--pool
Optional. Target storage pool (for example, iubig for large indirection unit pools).
Examples
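A sketch of the add step using the parameters above; the pool name is only an example.

```bash
# Add a virtual drive to the cluster by its virtual UUID.
weka cluster drive add <container-id> <virtual-uuid>

# Add a virtual drive into a specific pool (pool name is an example).
weka cluster drive add <container-id> <virtual-uuid> --pool iubig
```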
All standard weka cluster drive add flags, including protection scheme options, apply to virtual drives.
System limits and planning guidelines
The following guidelines help you plan SSD Proxy resources and keep virtual drive performance optimal.
Physical drives per proxy
Maximum: 40 drives
Virtual drives per physical drive
Maximum: 64 VIDs per physical drive
Capacity
Total VID size can exceed the physical drive capacity (over-provisioning is allowed); capacity management is the operator's responsibility.
Minimum VID size is 1 GB. However, the capacity ratio between the smallest and largest SSDs in the cluster must be 8:1 or less.
Capacity planning
Example configuration:
10 physical drives × 30 TB each → 300 TB raw capacity
5 VIDs per drive × 5 TB each → 250 TB allocated
Remaining: 50 TB reserved for future growth
Memory requirements:
ChunkDB memory: ~4 MB per TB
300 TB → 1.2 GB
DPDK memory: Based on the number of VIDs
50 VIDs × 4 qpairs = 200 qpairs
Memory ≈ (200 × 540 KB) / 1,024 ≈ 105 MB
Default allocation of 1.32 GB is sufficient in this scenario
Base overhead: ~100 MB
Total estimated memory: ~1.2 GB (ChunkDB) + 1.32 GB (default DPDK allocation) + ~100 MB (base overhead) ≈ 2.6 GB.
Manage drives sharing
Identify drive information
Display detailed or filtered information about physical and virtual drives.
Before you begin
Ensure the weka-sign-drive tool is available on the server.
Procedure
View detailed drive properties: Display details for a device.
Example output
List all drives: Display a summary of all proxy-managed drives.
Filter by proxy-managed drives: Display only drives accessible through the SSD Proxy.
Filter by local drives: Display only drives visible to the local kernel.
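A sketch of the commands behind the steps above; filter options for proxy-managed versus local drives vary by release, so confirm the exact flags in the tool's help output.

```bash
# Detailed properties for one drive (argument assumed to be the physical UUID).
weka-sign-drive show <physical-uuid>

# Summary of all signed drives.
weka-sign-drive list

# To show only proxy-managed or only locally visible drives, check
# weka-sign-drive list --help for the exact filter flags in your release.
```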
Delete a virtual drive
Delete a virtual drive (VID) from a physical NVMe device.
Before you begin
Verify that the virtual drive is no longer in use by the cluster. If the VID is currently attached, deactivate and remove it at the cluster level before deleting the VID from the hardware.
Procedure
Deactivate the drive: Prevent new I/O operations.
Verify removal readiness: Wait for a removable state.
Remove the drive from the cluster: Detach the drive from the cluster configuration.
Delete the virtual drive: Remove the VID from the physical drive header.
Example
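A sketch of the full removal sequence; the cluster-level steps use the standard weka cluster drive commands, and the final weka-sign-drive subcommand name is an assumption.

```bash
# 1. Deactivate the drive to stop new I/O.
weka cluster drive deactivate <virtual-uuid>

# 2. List drives and wait until the drive shows a removable state.
weka cluster drive

# 3. Remove the drive from the cluster configuration.
weka cluster drive remove <virtual-uuid>

# 4. Delete the VID from the physical drive header ("delete" subcommand assumed).
weka-sign-drive delete <virtual-uuid>
```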
Advanced operations
Using JRPC for advanced operations
Advanced users can interact programmatically with the SSD Proxy using JRPC over the Unix domain socket. The wapi CLI tool provides direct access to these proxy functions.
<function-name> corresponds to a JRPC function in the SSD Proxy API. Flags and parameters depend on the selected function.
Example 1: List all drives
Example 2: Add a virtual drive via the SSD Proxy
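The function and parameter names below are hypothetical placeholders for illustration; the actual names and argument style are defined by the SSD Proxy JRPC API.

```bash
# Example 1: list all drives managed by the proxy (function name is hypothetical).
wapi list_drives | jq .

# Example 2: add a virtual drive via the proxy (function and parameter names are hypothetical).
wapi add_virtual_drive --physical-uuid <physical-uuid> --size-gb 1000 | jq .
```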
Commands return structured JSON, which can be parsed using tools like jq.
This interface is intended for programmatic management, automation, or troubleshooting beyond standard CLI workflows.
Troubleshooting commands
Use the following commands to verify SSD Proxy and virtual drive health, resource allocation, and connectivity.
Check SSD Proxy resource allocation
View detailed drive and hardware queue information
Check SSD Proxy logs
Verify DPDK hugepage allocation
Test JRPC connectivity to the proxy
Check hardware queue limits for NVMe devices
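A sketch that maps commands to the checks above; the container name, the JRPC function name, and the NVMe feature query are assumptions to adapt to your environment.

```bash
# SSD Proxy container state and local resource allocation.
weka local ps
weka local resources

# Detailed drive and hardware queue information.
weka-sign-drive show <physical-uuid>

# SSD Proxy logs: check the WEKA log directory (location varies by installation).

# DPDK hugepage allocation on the host.
grep -i huge /proc/meminfo

# JRPC connectivity to the proxy (function name is hypothetical).
wapi list_drives

# Hardware queue limits for an NVMe device still visible to the kernel.
nvme get-feature /dev/nvmeX -f 0x07 -H
```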
This set of commands helps identify configuration, performance, and connectivity issues related to SSD Proxy and virtual drives.
Troubleshoot drives sharing common issues
SSD Proxy reports insufficient ChunkDB memory
Cause
The SSD Proxy allocates approximately 4 MB of RAM per TB of total managed storage. Configuring more or larger drives than planned can exhaust ChunkDB memory.
Resolution
Reconfigure the SSD Proxy with the correct drive count and expected sizes. The following command automatically calculates and allocates sufficient memory based on the total expected storage.
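A sketch assuming the same setup command as in the configuration step (base command and container name are illustrative assumptions); supplying the corrected drive count and expected size lets the proxy derive sufficient ChunkDB memory.

```bash
# Sketch: base command and container name are illustrative assumptions.
weka local setup container --name ssdproxy \
  --max-drives <physical-drive-count> \
  --expected-max-drive-size-gb <largest-drive-size-gb>
```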
Cannot allocate additional virtual drives due to queue limits
Cause
NVMe hardware queue limits have been reached.
Each virtual drive (VID) is allocated 4 hardware queues by default. NVMe devices have a limited number of hardware queues (typically 128–256). If the total required queues exceed the device limit, new VIDs cannot be fully allocated.
Resolution
Check current queue allocation. The following command displays the Num Queues for each VID.
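A sketch of the check, assuming the show subcommand takes the physical UUID and reports a Num Queues field for each VID.

```bash
# Display per-VID details, including the Num Queues allocation.
weka-sign-drive show <physical-uuid>
```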
Verify queue availability. Ensure that the total number of queues required by all VIDs does not exceed the device's hardware queue limit.
Understand device capacity. Example: A device with 128 hardware queues can fully support up to 32 VIDs (32 × 4 = 128).
Manage additional VIDs. If you create more VIDs than the device can fully support, the proxy allocates fewer queues per VID (down to 1 queue minimum). Do one of the following:
Reduce the number of VIDs per physical drive.
Allow fallback allocation with fewer queues per VID.
VID allocation fails due to insufficient DPDK memory
Cause
The SSD Proxy relies on DPDK/SPDK for high-performance I/O. Each queue pair (qpair) requires approximately 540 KB of memory.
Default allocation
~1.32 GB of DPDK memory supports up to 640 VIDs (2,560 qpairs at 4 queues per VID).
Allocating more VIDs requires increasing DPDK memory.
Symptoms
VID creation fails with memory errors.
Proxy logs show "DPDK allocation failed" or hugepage exhaustion.
weka local ps indicates the proxy is in a degraded state.
New qpairs cannot be allocated, even if hardware queues are available.
Resolution
Increase the DPDK memory allocation for the SSD Proxy:
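A sketch assuming the --memory flag from the configuration step is used to set the container memory explicitly; the base command, container name, and value are illustrative.

```bash
# Sketch: set the SSD Proxy container memory explicitly (value is an example;
# size it using the estimation formula below).
weka local setup container --name ssdproxy --memory 4GB
```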
Estimating required memory
Memory per VID = 4 queues × 540 KB ≈ 2.1 MB
Total memory (MB) ≈ (Number of VIDs × 4 × 540) / 1024
Examples
~640 VIDs: Default 1.32 GB sufficient
~1,000 VIDs: Required memory ≈ 2,109 MB
Maximum VIDs (40 drives × 64 VIDs = 2,560 VIDs): Required memory ≈ 5,400 MB
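A quick way to reproduce the estimates above with shell arithmetic, using the formula Number of VIDs × 4 × 540 / 1024:

```bash
# Estimate required DPDK memory (MB) for a given number of VIDs.
VIDS=1000
echo "$(( VIDS * 4 * 540 / 1024 )) MB"   # prints "2109 MB"
```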
This ensures sufficient memory for queue pair allocation and prevents DPDK-related errors during virtual drive operations.