S3

This page describes the Weka implementation of the S3 protocol.

Overview

The S3 protocol is widely used by cloud-native and cloud-ready applications.

With Weka, you can:

  • Ingest data with S3 and then access it with S3 or other protocols.

  • Expose existing data to S3, and migrate your application within the same data platform.

  • Burst to the cloud and use new applications without the need to migrate your data.

In general, you can gradually move applications to S3 while accessing the same data via multiple protocols (POSIX, S3, SMB, NFS, GPUDirect Storage), all while benefiting from Weka's scale, performance, and resiliency.

Architecture

The Weka S3 service is a scalable, resilient service that provides multi-protocol access to data.

The S3 service is implemented by specifying the set of storage hosts that run the S3 protocol and then creating a logical S3 cluster to expose the service. As you add hosts that serve the S3 protocol, the S3 cluster scales to higher performance.

When you integrate round-robin DNS or a load balancer, different S3 clients access different hosts, which allows the Weka system to scale and serve thousands of clients.
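As a rough illustration of how rotation spreads clients across S3 hosts, the sketch below assigns each client to the next host in turn. In a real deployment the DNS server or load balancer does this. The host and client names are hypothetical and not part of Weka.

```python
from itertools import cycle


def assign_clients(hosts, clients):
    """Assign each client to the next host in rotation (round-robin)."""
    rotation = cycle(hosts)
    return {client: next(rotation) for client in clients}


# Hypothetical host and client names, for illustration only.
hosts = ["s3-host-0", "s3-host-1", "s3-host-2"]
clients = [f"client-{i}" for i in range(6)]

assignment = assign_clients(hosts, clients)
for client, host in assignment.items():
    print(client, "->", host)
```

With three hosts and six clients, each host ends up serving two clients. Adding a host to the list spreads the same clients more thinly, which is the scaling effect described above.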

The Weka S3 service works on top of the WekaFS file service. Buckets are mapped to (top-level) directories, and objects are mapped to files. The same data can then be exposed through any of the Weka-supported protocols.
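The bucket-to-directory and object-to-file mapping can be sketched as a small path helper. This is only an illustration. The mount point is hypothetical, and the treatment of "/" in object keys as nested directories is an assumption, not something this page states.

```python
from pathlib import PurePosixPath


def object_to_path(fs_root, bucket, key):
    """Return the file path an object would have under the assumed mapping.

    fs_root is a hypothetical mount point of the filesystem; the bucket is
    a top-level directory under it, and the object key is the file path
    inside that directory (assumption: "/" in the key nests directories).
    """
    return PurePosixPath(fs_root) / bucket / key


path = object_to_path("/mnt/weka/default", "logs", "2024/01/app.log")
print(path)
```

Under these assumptions, an object written through S3 could be read by a POSIX client at the printed path, and a file created there by a POSIX client would appear as an object in the bucket.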

S3 Access, Security and Auditing

S3 Access

Access to S3 APIs can be either authenticated or anonymous.

User Authentication

Gaining authenticated S3 access requires the following:

  1. Create and attach an IAM policy for that S3 user to set the user's permissions for S3 operations and resources.
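For orientation, an IAM policy is a JSON document listing allowed (or denied) actions on resources. The sketch below builds a read-only policy for one bucket in the AWS IAM policy format. It assumes Weka accepts this AWS-style structure. The bucket name is hypothetical, and the set of actions Weka supports should be checked against the Weka documentation.

```python
import json


def read_only_policy(bucket):
    """Build an AWS-style IAM policy allowing list and read on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # ListBucket applies to the bucket, GetObject to its objects.
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


# "mybucket" is a hypothetical bucket name.
policy = read_only_policy("mybucket")
print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and used as the policy document when creating the policy.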

Anonymous Access

Anonymous access to buckets/objects can be obtained by either:

Security

Encryption at Rest

Data written via the S3 protocol can be encrypted at rest by configuring an encrypted filesystem.

TLS

Client access to the service over HTTPS uses the same certificates that Weka uses for other API access, as defined in the TLS section.

Audit

The S3 API calls can be audited using an HTTP webhook service and connecting to an application such as Splunk.

To set an audit target, use the weka s3 cluster audit-webhook enable CLI command.
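To experiment with an audit target before connecting a tool such as Splunk, a minimal HTTP receiver can help. The sketch below accepts POST requests and stores each body. It assumes the audit events arrive as HTTP POST requests and makes no assumption about their fields: a body is parsed as JSON when possible and kept as raw text otherwise. The port and all names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # audit events collected in memory, for inspection only


def decode_body(raw):
    """Decode a request body as JSON if possible, else return it as text."""
    text = raw.decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


class AuditHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the sender declared.
        length = int(self.headers.get("Content-Length", 0))
        received.append(decode_body(self.rfile.read(length)))
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def serve(port=8080):
    """Run the receiver until interrupted (port is a hypothetical choice)."""
    HTTPServer(("0.0.0.0", port), AuditHandler).serve_forever()
```

Calling serve() starts the receiver. Its address can then be used as the webhook endpoint when enabling the audit target. This is a test aid only, with no authentication or TLS.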

For more information, refer to the Audit S3 APIs page.