Breaking the Speed Limit with WEKA: The World's Fastest File System on top of Azure Hot Blob

Abstract

Azure Blob Storage is engineered to manage immense volumes of unstructured data efficiently. While using Blob Storage for High-Performance Computing (HPC) tasks offers numerous benefits, including scalability and cost-effectiveness, it also introduces specific challenges. Chief among them are data access latency and degraded workload performance, which are most noticeable in compute-intensive or real-time applications that read data stored in Blob. In this article, we examine how WEKA’s patented filesystem, WekaFS™, and its parallel processing algorithms accelerate Blob storage performance.

About WEKA

The WEKA® Data Platform was purpose-built to seamlessly and sustainably deliver speed, simplicity, and scale that meets the needs of modern enterprises and research organizations without compromise. Its advanced, software-defined architecture supports next-generation workloads in virtually any location with cloud simplicity and on-premises performance.

At the heart of the WEKA® Data Platform is a modern, fully distributed parallel filesystem, WekaFS™, which can span thousands of NVMe SSDs across multiple hosts and seamlessly extend itself over compatible object storage.

WEKA in Azure

Many organizations are leveraging Microsoft Azure to run their High-Performance Computing (HPC) applications at scale. As cloud infrastructure becomes integral, users expect the same performance as on-premises deployments. WEKA delivers unbeatable performance for your most demanding applications running in Microsoft Azure, supporting high I/O, low latency, small files, and mixed workloads with zero tuning and automatic storage rebalancing.

WEKA software is deployed on a cluster of Microsoft Azure LSv3 VMs with local NVMe SSDs to create a high-performance storage layer. WEKA can also take advantage of Azure Blob Storage to scale your namespace at the lowest cost. You can automate your WEKA deployment with HashiCorp Terraform templates for fast, easy installation. Data stored in your WEKA environment is accessible to your applications through multiple protocols, including POSIX, NFS, SMB, and S3.

Kent has written an excellent article on WEKA’s SMB performance for HPC Windows Grid Integration. For more, please see:

https://techcommunity.microsoft.com/t5/azure-high-performance-computing/scaling-up-in-the-cloud-the-Weka-data-platform-and-azure-hpc/ba-p/3997491

WEKA Architecture

WEKA is a fully distributed, parallel file system written entirely from the ground up to deliver the highest-performance file services on NVMe SSDs. Unlike traditional parallel file systems, which require extensive file system knowledge to deploy and manage, WEKA’s zero-tuning approach to storage allows for easy management from tens of terabytes to hundreds of petabytes.

WEKA’s unique architecture in Microsoft Azure, shown in Figure 1, provides parallel file access via POSIX, NFS, and SMB, as well as to Azure Kubernetes Service (AKS) workloads. It provides a rich enterprise feature set, including but not limited to local and remote snapshots, snap clones, automatic data tiering, dynamic cluster rebalancing, backup, encryption, and quotas (advisory, soft, and hard).

Figure 1 – WekaFS combines NVMe flash with cloud object storage in a single global namespace

Key components of the WEKA Data Platform in Azure include:

The infrastructure is deployed directly into the customer’s subscription of choice.
WEKA software is deployed across 6 or more Azure LSv3 VMs, which are clustered to act as a single device.
The WekaFS™ namespace is extended onto Azure Hot Blob.
WekaFS scale-up and scale-down functions are driven by Azure Logic Apps and Function Apps.
All client secrets are kept in Azure Key Vault.
Deployment is fully automated using WEKA Terraform templates.

WEKA and Data Tiering

WEKA’s tiering capabilities in Azure integrate seamlessly with Azure Blob Storage. This integration leverages WEKA’s distributed parallel file system, WekaFS™, to extend from local NVMe SSDs on LSv3 VMs (performance tier) to lower-cost Azure Blob Storage (capacity tier). WEKA writes incoming data in 4K blocks (commonly referred to as chunks) aligned to the NVMe SSD block size, packages them into 1MB extents, and distributes the writes across multiple storage nodes in the cluster (in Azure, each storage node is an LSv3 VM). WEKA then packages the 1MB extents into 64MB objects, and each object can contain data blocks from multiple files. Files smaller than 1 MB are consolidated together into a shared 64 MB object, while larger files have their parts distributed across multiple objects.

Figure 2 – WekaFS Tiering to HOT BLOB
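The chunk, extent, and object sizes above lend themselves to a quick back-of-the-envelope calculation. The sketch below assumes the article's "4K", "1MB" and "64MB" are binary units (KiB/MiB), and its object count is only a lower bound because it assumes a file's extents are packed densely; tier_layout is a hypothetical helper, not a WEKA command.

```sh
#!/bin/sh
# Back-of-the-envelope tiering layout: 4 KiB chunks -> 1 MiB extents -> 64 MiB objects.
# Assumption: the article's "4K", "1MB" and "64MB" are binary units.
# tier_layout is a hypothetical helper, not a WEKA command.
set -eu

CHUNK_KIB=4
EXTENT_KIB=1024
OBJECT_KIB=$((64 * 1024))

CHUNKS_PER_EXTENT=$((EXTENT_KIB / CHUNK_KIB))    # 256 chunks per extent
EXTENTS_PER_OBJECT=$((OBJECT_KIB / EXTENT_KIB))  # 64 extents per object

# Print chunk, extent and minimum object counts for a file of $1 MiB.
# The object count is a lower bound: it assumes the file's extents are packed
# densely, whereas they may be spread over more objects shared with other files.
tier_layout() {
  tl_extents=$1                                  # one 1 MiB extent per MiB of data
  tl_chunks=$((tl_extents * CHUNKS_PER_EXTENT))
  tl_objects=$(( (tl_extents + EXTENTS_PER_OBJECT - 1) / EXTENTS_PER_OBJECT ))  # ceiling division
  printf 'size=%sMiB chunks=%s extents=%s objects>=%s\n' \
    "$1" "$tl_chunks" "$tl_extents" "$tl_objects"
}

tier_layout 500   # prints: size=500MiB chunks=128000 extents=500 objects>=8
```

The 500 MiB input matches the per-job file size used in the fio test later in this article: each such file is 128,000 chunks in 500 extents, spread over at least 8 objects.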

How do you retrieve data that is cold? What are the options?

Tiered data is always accessible and is treated as if it were part of the primary file system. Moreover, while data may be tiered, the metadata is always kept on the SSDs, which allows files and directories to be traversed without impacting performance.

Consider a scenario where an HPC job has run and its outputs are written to WekaFS. Over time, the output files’ data is tiered to Azure Blob (capacity tier) to free up the WekaFS performance tier for new jobs. At some later date, the data is required again for processing. What are the options?

1 – Cache Tier: When file data is tiered to Blob, the file metadata always remains locally on the flash tier, so all files remain visible to applications. WEKA maintains a cache tier (stored on NVMe SSD) within its distributed file system architecture. When file data is rehydrated from Azure Blob Storage, WEKA stores it in a “read cache” to improve subsequent read performance.

2 – Pre-Fetch: WEKA provides a pre-fetch API that instructs the WEKA system to fetch all of the data back from Blob (capacity tier) to NVMe (performance tier). For further details, please refer to: https://docs.Weka.io/fs/tiering/pre-fetching-from-object-store

3 – Direct Read: Cold-read the data directly from Blob. The client still accesses the data through the WEKA mount, but the data is not cached by WekaFS; it is sent directly to the client.
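The pre-fetch option can be scripted over a whole directory. This is a hedged sketch that assumes the WEKA CLI provides a "weka fs tier fetch <path>..." command as described in WEKA's pre-fetching documentation; check the linked page for the exact syntax in your release. FETCH_CMD, BATCH, and PARALLEL are hypothetical knobs, and setting FETCH_CMD=echo turns the script into a dry run.

```sh
#!/bin/sh
# Sketch: pre-fetch every file under a directory from Blob back to the NVMe tier.
# Assumption: the WEKA CLI provides "weka fs tier fetch <path>..."; verify the
# syntax against WEKA's pre-fetching documentation for your version.
set -eu

FETCH_CMD="${FETCH_CMD:-weka fs tier fetch}"  # set FETCH_CMD=echo for a dry run
BATCH="${BATCH:-512}"                         # hypothetical: paths per invocation
PARALLEL="${PARALLEL:-64}"                    # hypothetical: concurrent invocations

prefetch_dir() {
  # NUL-separated paths survive spaces and newlines in file names (GNU find/xargs);
  # -r skips running the command when no files are found.
  # $FETCH_CMD is deliberately unquoted so it splits into command + arguments.
  find -L "$1" -type f -print0 |
    xargs -0 -r -n "$BATCH" -P "$PARALLEL" $FETCH_CMD
}

# Run only when given exactly one directory argument.
if [ "$#" -eq 1 ]; then
  prefetch_dir "$1"
fi
```

For example, sh prefetch.sh /mnt/archive/results on a client with the filesystem mounted, where the script name and directory are placeholders. Batching many paths per invocation and running several invocations in parallel keeps the per-file overhead low when a directory holds many files.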

It is option 3 that intrigued me. WEKA claims to parallelize reads, so would it be possible to read directly from Blob at a “WEKA-accelerated rate”?

Testing Methodology:

The test design.

The testing infrastructure consisted of:
6 x Standard_D64_v5 Azure VMs used as clients
20 x L8s_v3 VM instances used for the WEKA NVMe layer
Azure Blob Storage in the Hot tier with zone-redundant storage (ZRS)

For the test, the file system was given 2 TB of capacity on the NVMe layer (for metadata) and 20 TB on the Hot Blob layer.

Figure 3 – WekaFS testing Design.

A 20 TB filesystem was created on WEKA:

Figure 4 – Sizing the WekaFS

We chose an object-store direct mount (the obs_direct mount option), so that reads are not cached by WekaFS and are served directly from Blob.

pdsh mount -t wekafs -o net=eth1,obs_direct [weka backend IP]/archive /mnt/archive
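The mount step can be made repeatable on each client. The sketch below assumes a Linux client with the WEKA client installed; BACKEND, FS_NAME, MOUNT_POINT, NET, and MOUNT_CMD are hypothetical variables whose defaults mirror the pdsh mount command above, and MOUNT_CMD=echo prints the mount command instead of running it.

```sh
#!/bin/sh
# Sketch: idempotently mount a WEKA filesystem in object-store direct mode.
# All variables are hypothetical knobs whose defaults mirror the pdsh mount
# command in the article; BACKEND is a placeholder for a WEKA backend IP.
set -eu

BACKEND="${BACKEND:-weka-backend-ip}"
FS_NAME="${FS_NAME:-archive}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/archive}"
NET="${NET:-eth1}"
MOUNT_CMD="${MOUNT_CMD:-mount}"   # set MOUNT_CMD=echo for a dry run

# Succeeds if $1 appears as a mount point (field 2) in /proc/mounts.
is_mounted() {
  awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' /proc/mounts
}

mount_obs_direct() {
  mkdir -p "$MOUNT_POINT"
  if is_mounted "$MOUNT_POINT"; then
    echo "already mounted: $MOUNT_POINT"
    return 0
  fi
  $MOUNT_CMD -t wekafs -o "net=${NET},obs_direct" "${BACKEND}/${FS_NAME}" "$MOUNT_POINT"
}

# Mount only when invoked as: sh mount_archive.sh --run
if [ "${1:-}" = "--run" ]; then
  mount_obs_direct
fi
```

It can be pushed to every client with something like pdsh -w 'client[01-06]' 'BACKEND=10.0.0.4 sh /shared/mount_archive.sh --run', where the host list, IP, and script path are placeholders. Because it checks /proc/mounts first, re-running it on an already mounted client is harmless.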

To simulate load, we used fio to write random data to the object store with a 1M block size.

pdsh 'fio --name=$HOSTNAME-fio --directory=/mnt/archive --numjobs=200 --size=500M --direct=1 --verify=0 --iodepth=1 --rw=write --bs=1M'

Once the write workload completes, notice that only 2.46 GB of data resides on the SSD tier (this is all metadata), and 631.6 GB resides on BLOB storage.

Figure 5 – SSD Tier used for Metadata only

Double-checking the file system with the weka fs command confirms that used SSD capacity remains at 2.46 GB, the size of our metadata.

Figure 6 – SSD Tier used for Metadata only.
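The 631.6 GB on Blob can be sanity-checked against the fio parameters: 6 clients x 200 jobs x 500M per job. This sketch assumes fio's default binary interpretation of size=500M (500 MiB) and reports the total in both decimal and binary units, since we have not confirmed which unit the WEKA GUI displays; expected_volume is a hypothetical helper.

```sh
#!/bin/sh
# Sanity check: expected logical size of the fio write phase.
# Assumption: fio interprets size=500M as 500 MiB (its default kb_base=1024).
set -eu

# expected_volume CLIENTS JOBS_PER_CLIENT FILE_MIB
expected_volume() {
  ev_mib=$(($1 * $2 * $3))
  awk -v mib="$ev_mib" 'BEGIN {
    bytes = mib * 1024 * 1024
    printf "%d MiB = %.1f GB (decimal) = %.1f GiB\n", mib, bytes / 1e9, bytes / (1024 ^ 3)
  }'
}

expected_volume 6 200 500   # prints: 600000 MiB = 629.1 GB (decimal) = 585.9 GiB
```

If the GUI reports decimal gigabytes, the computed 629.1 GB of file data is within about 0.4% of the 631.6 GB shown on Blob.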

Now that all the data resides on Blob, let’s measure how quickly it can be accessed.

We’ll benchmark read performance with fio, running the load test across all six clients, with each client reading in 1MB blocks.

pdsh 'fio --name=$HOSTNAME-fio --directory=/mnt/archive --numjobs=200 --size=500M --direct=1 --verify=0 --iodepth=1 --rw=read --bs=1M --time_based --runtime=90'

The command is configured to run for 90 seconds so we can capture the sustained bandwidth from the hot blob tier of the WEKA data platform.

From the screenshot below (Figure 7), observe that we are reading data from Azure Blob at speeds up to 20 GB/s.

Figure 7 – 19.63 GB/s 100% reads coming directly from BLOB
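The 19.63 GB/s aggregate is easier to reason about per client and per fio job. This sketch treats GB as decimal and, for the 90-second total, assumes the Figure 7 reading was sustained for the whole run, which the screenshot alone does not show; read_rates is a hypothetical helper.

```sh
#!/bin/sh
# Break the aggregate read bandwidth from Figure 7 down per client and per fio job.
# Assumptions: GB is decimal (1 GB = 1000 MB); the rate is sustained for the runtime.
set -eu

# read_rates AGG_GB_PER_S CLIENTS JOBS_PER_CLIENT RUNTIME_S
read_rates() {
  awk -v bw="$1" -v c="$2" -v j="$3" -v t="$4" 'BEGIN {
    printf "per client: %.2f GB/s\n", bw / c
    printf "per fio job: %.1f MB/s\n", bw * 1000 / (c * j)
    printf "moved in %d s if sustained: %.1f GB\n", t, bw * t
  }'
}

read_rates 19.63 6 200 90
```

This prints 3.27 GB/s per client and 16.4 MB/s per job. Each of the 1,200 jobs therefore needs only modest throughput; at the full rate, 90 seconds would move roughly 1.77 TB, about 2.8 times the 631.6 GB written earlier, which is consistent with fio's time_based mode looping over the same files until the runtime expires.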

How does WEKA do it?

The short answer is even load distribution across all nodes in the cluster. Each WEKA compute process establishes 64 threads to run GET operations against the Blob container. Each WEKA backend is responsible for an equal portion of the namespace, and each performs the corresponding API operations against Azure Blob.

Multiple nodes working together, each running 64 GET threads, add up to what I will call the “WEKA Accelerated Hot Blob Tier”.
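Multiplying out the concurrency shows why spreading GETs across many nodes matters. This is a simplified model: it treats every GET thread as busy and the bandwidth as evenly shared, and because the article does not state how many compute processes run on each backend, that count is a hypothetical input; get_concurrency is likewise a hypothetical helper.

```sh
#!/bin/sh
# Rough concurrency model: outstanding GETs = backends x compute processes x GET threads.
# The compute-process count per backend is NOT given in the article; it is an input here.
set -eu

# get_concurrency BACKENDS PROCS_PER_BACKEND THREADS_PER_PROC AGG_GB_PER_S
get_concurrency() {
  gc_streams=$(($1 * $2 * $3))
  awk -v s="$gc_streams" -v bw="$4" 'BEGIN {
    printf "concurrent GETs: %d, average per GET stream: %.1f MB/s\n", s, bw * 1000 / s
  }'
}

# 20 L8s_v3 backends, a hypothetical 1 compute process each, 64 threads, 19.63 GB/s.
get_concurrency 20 1 64 19.63   # prints: concurrent GETs: 1280, average per GET stream: 15.3 MB/s
```

Even with only one compute process per backend, each GET stream would need to average only about 15 MB/s to reach the observed aggregate; more processes or more backends lower the per-stream requirement further.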

Looking at the stats on the command line while the test was running (Figure 8), you can see that servicing of the tiered data is fully balanced across all the WEKA nodes in the cluster. This balance helps WEKA achieve its optimal performance from Azure Blob.

Figure 8 – Balanced backend nodes with 64 threads each for GET operations from BLOB
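The balance in Figure 8 can be quantified with a max-to-mean ratio across nodes. This is a generic sketch rather than a WEKA tool: the inputs are whatever per-node figures you copy from the stats output, and the sample values on the last line are hypothetical, not measured data.

```sh
#!/bin/sh
# Summarize how evenly load is spread across backend nodes.
# Input: one throughput figure per node, in any consistent unit.
# A max/mean ratio close to 1.00 indicates a balanced cluster.
set -eu

balance() {
  if [ "$#" -eq 0 ]; then
    echo "usage: balance VALUE..." >&2
    return 1
  fi
  printf '%s\n' "$@" | awk '
    NR == 1 { min = $1; max = $1 }
    { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
    END {
      mean = sum / NR
      printf "nodes=%d min=%.2f max=%.2f mean=%.2f max/mean=%.2f\n", NR, min, max, mean, max / mean
    }'
}

# Hypothetical per-node values for illustration only.
balance 310 305 298 312
```

A single ratio makes it easy to compare runs: a hot node shows up as a max/mean well above 1.00 even when the aggregate bandwidth still looks healthy.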

What real-world problems can we solve with this feature?
1 – Ingesting large volumes of data into the WEKA Azure platform at once. If the end user does not know which files will be “hot”, they can keep all of it directly on Blob storage so that it doesn’t force any currently active data out of the flash tier.

2 – Running workloads that need to read large volumes of data sequentially but infrequently, for example an HPC job whose data is used only once a month or once a quarter. If each compute node reads a different subset of the data, there is no value in rehydrating the data into the flash tier and displacing data that is used repeatedly.

3 – Running read-intensive workloads for which WEKA-accelerated cold-read performance from Blob is sufficient. Clients can mount the file system in obs_direct mode.

Conclusion

WEKA in Azure delivers exceptional performance for data-intensive workloads by leveraging parallelism, scalability, flash optimization, data tiering, and caching features. This enables organizations to achieve high throughput, low latency, and optimal resource utilization for their most demanding applications and use cases.

You can now add low-latency, high-throughput reads directly from Hot Blob storage to that list of use cases. To quote Kent one last time:

…As the digital landscape continues to evolve, embracing the WEKA Data Platform is not just a smart choice; it’s a strategic advantage that empowers you to harness the full potential of your HPC Grid.

