How JuiceFS Achieves Consistency and Low-Latency Data Distribution in Multi-Cloud Architectures

With the growing popularity of large language models (LLMs), GPU computing power has become a scarce resource. The GPU resources of a single data center or cloud region often cannot meet users’ diverse needs. Moreover, cross-regional collaboration demands have driven enterprises to distribute data and computing tasks across different cloud platforms. Multi-cloud architecture is becoming a trend, but data distribution in such architectures faces numerous challenges.

This article describes how JuiceFS Enterprise Edition tackles the challenges of data distribution and consistency in multi-cloud architectures, particularly for AI workloads. It explores solutions for cross-cloud data synchronization, low-latency access, and cost optimization.

Storage challenges in multi-cloud architectures

In practical multi-cloud architectures, many enterprises deploy compute Pods on various cloud platforms to handle specific computational tasks and distribute data among them. However, keeping this distribution process running continuously and merging training results in a timely manner is a pressing challenge. This is especially true when data must travel across regions, where performance bottlenecks and data consistency issues become prominent.

The diagram shows a training cluster on Cloud A (left) and an inference cluster on Cloud B (right). How can the model data generated by the training cluster be efficiently distributed to the inference cluster?

[Figure: a training cluster on Cloud A distributing model data to an inference cluster on Cloud B]

Challenges include:

  • The primary challenge lies in the complexity of data distribution and remote computing. Users must manually copy data from the file system to the remote location and define policies for when, and how much, data to distribute.
  • When data volumes are massive, full synchronization consumes substantial resources. Yet hot data usually constitutes only a small portion, and enterprises typically cannot predict which data is hot until after it has been read. Copying data on demand rather than in full, while building a local cache at the remote site, therefore improves performance while balancing costs.
  • Limited network bandwidth and error-retry behavior can lead to data inconsistency. Cloud providers also tend to build closed ecosystems and are reluctant to offer cross-cloud functionality; for example, Cloud A provides no tools to help users copy data to other clouds. A cloud-neutral, third-party file system is therefore crucial for breaking free from the constraints of individual cloud vendors and meeting customers' cross-cloud and multi-cloud requirements.

JuiceFS cross-cloud and multi-cloud solutions

JuiceFS Enterprise Edition is a distributed file system based on object storage, offering a more robust metadata engine and cache management capabilities compared to the Community Edition. To address different data access performance requirements in multi-cloud architectures, JuiceFS provides various solutions for cross-region and cross-geography scenarios.

[Figure: overview of JuiceFS cross-cloud and multi-cloud solutions]

Solution 1: Cross-cloud data distribution within the same region

This solution refers to data distribution between different clouds within the same region, commonly applied in active-active data setups and disaster recovery scenarios. By establishing an asynchronous data synchronization relationship between the source region (top left in the figure) and the target region, the system automatically replicates data between regions while ensuring data consistency.

[Figure: cross-cloud data distribution within the same region]

This approach uses a shared metadata service, allowing clients in different regions to write data locally when mounting the file system. This optimizes data access efficiency. Asynchronous replication and metadata consistency ensure data stability and integrity across regions.

Regarding data consistency, JuiceFS guarantees strong consistency through metadata. When files are modified, new data blocks are appended to the object storage, and metadata is updated to point to these blocks. As long as metadata remains consistent, the entire file's consistency is ensured. Therefore, there is no data inconsistency when the target client accesses the same metadata service. If data is synchronized to the target region's storage bucket, it’s directly read from there; otherwise, it’s read from the source storage bucket to maintain completeness and consistency.
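
To make the fallback concrete, here is a minimal Go sketch of the read path described above. It is an illustration only, not JuiceFS's actual implementation; the ObjectStore type and readBlock function are hypothetical names.

```go
// Minimal sketch of the Solution 1 read path: prefer the target-region
// bucket, fall back to the source bucket when replication lags behind.
// All names here are illustrative, not JuiceFS's real API.
package main

import (
	"context"
	"errors"
	"fmt"
)

// ObjectStore models one region's object storage bucket.
type ObjectStore map[string][]byte

var errNotFound = errors.New("object not found")

func (s ObjectStore) Get(_ context.Context, key string) ([]byte, error) {
	if b, ok := s[key]; ok {
		return b, nil
	}
	return nil, errNotFound
}

// readBlock prefers the nearby target bucket and falls back to the source
// bucket if asynchronous replication has not caught up yet. Because the
// block key comes from the shared metadata service and data blocks are
// immutable, either path returns the same bytes.
func readBlock(ctx context.Context, key string, target, source ObjectStore) ([]byte, error) {
	if b, err := target.Get(ctx, key); err == nil {
		return b, nil // already replicated: low-latency local read
	}
	return source.Get(ctx, key) // not yet replicated: read from source
}

func main() {
	ctx := context.Background()
	source := ObjectStore{"chunk-0001": []byte("model weights")}
	target := ObjectStore{} // replication still in flight
	b, _ := readBlock(ctx, "chunk-0001", target, source)
	fmt.Printf("read %d bytes via source fallback\n", len(b))
}
```

The key property is that the choice of bucket never affects correctness, only latency: metadata decides what to read, and replication state only decides where to read it from.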

This solution is widely applicable to active-active and disaster recovery scenarios. For active-active data applications, enterprises can share data across multiple regions through cross-region replication, achieving high availability and load balancing.

In disaster recovery, this approach asynchronously backs up data to the target region, preventing data unavailability due to issues like account suspension or access restrictions on the source cloud platform. Even if the source region encounters failures, clients only need to remount and switch to the target region to resume normal operations seamlessly.

Solution 2: Cross-region data access for large-scale AI training scenarios

To address performance challenges in cross-region data access, multiple solutions are available for different scenarios.

Metadata and data synchronization

When metadata services are shared between two regions, data access usually does not suffer significant latency. However, when data transmission spans continents, such as accessing nodes in Singapore or London, performance issues can become prominent, particularly with numerous small files. Remote data access without cache hits may require reading from the source region, significantly impacting performance.

To solve this, we designed a mirror file system feature. By synchronizing data and metadata between the source region and the target (mirror) region, the system ensures data consistency across regions, enabling low-latency access. While real-time synchronization is ideal, geographical network constraints make this practically unattainable.

[Figure: mirror file system architecture]

In the mirror file system, operations proceed as follows: as clients in the source region read and write against the source storage bucket, newly written data is asynchronously replicated to the mirror region’s bucket. When clients in the mirror region run training or inference, the system reads from the nearby mirror bucket to reduce access latency and improve performance; if the data has not yet been synchronized, it reads from the source region instead.

Notably, prior to JuiceFS version 5.0, mirror regions supported only read operations. Version 5.0 and later introduced write support. When writing data, the system first writes it to the mirror region's bucket, updates metadata in the source region's metadata service (metadata is never written directly to the mirror), and then synchronizes the metadata to the mirror region.

The mirror write process seems a bit complex, but to ensure that synchronization doesn't encounter errors under various network fluctuations, we adopted a one-way synchronization design. Although there is some delay during the write process, this is a necessary compromise made for consistency.
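
The ordering matters, so a minimal Go sketch may help. It only illustrates the sequence described above, with hypothetical MetaService and Bucket types; the real implementation is more involved.

```go
// Sketch of the mirror write ordering: data lands in the mirror bucket,
// metadata is committed at the SOURCE metadata service, and the change
// is then replayed one-way to the mirror. Names are illustrative.
package main

import (
	"context"
	"fmt"
)

// MetaService stands in for one region's metadata service.
type MetaService struct{ region string }

// Commit records that a file now points at the given block key.
func (m *MetaService) Commit(_ context.Context, path, blockKey string) {
	fmt.Printf("[%s meta] %s -> %s\n", m.region, path, blockKey)
}

// Bucket stands in for one region's object storage.
type Bucket map[string][]byte

func mirrorWrite(ctx context.Context, path string, data []byte,
	mirrorBucket Bucket, sourceMeta, mirrorMeta *MetaService) {
	blockKey := "chunk:" + path
	// 1. Write the data block near the writer, into the mirror bucket.
	mirrorBucket[blockKey] = data
	// 2. Commit metadata at the source: there is only ever one metadata
	//    authority, which is what keeps the design one-way.
	sourceMeta.Commit(ctx, path, blockKey)
	// 3. The source replays the change to the mirror's metadata copy.
	//    In reality this step is asynchronous; it is shown inline here.
	mirrorMeta.Commit(ctx, path, blockKey)
}

func main() {
	mirrorWrite(context.Background(), "/models/ckpt-42", []byte("..."),
		Bucket{}, &MetaService{region: "source"}, &MetaService{region: "mirror"})
}
```

Because metadata flows in only one direction, source to mirror, there is never a conflict to reconcile when the network flaps; the cost is the write delay noted above.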

Metadata-only synchronization

To address the cost and time of synchronizing large volumes of data, we provide an on-demand synchronization option: you can choose to synchronize only metadata. The key difference from the previous method is that it avoids full bucket replication. Although this gives up some data locality, performance can still be guaranteed if the distributed cache has a high hit rate, and the reductions in time, replication traffic, and storage costs are significant. When data is written in the mirror region, the system writes it back to the source bucket and the source metadata service, then synchronizes the metadata to the mirror region. This approach enables secure and efficient cross-region data synchronization and access.

[Figure: metadata-only synchronization across regions]
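
As a rough illustration of this mode's read path, the Go sketch below assumes a hypothetical Cache type standing in for the distributed cache; it is not JuiceFS code.

```go
// Sketch of metadata-only mode: metadata is already local, so only the
// data path differs. A cache hit stays in the mirror region; a miss pays
// one cross-region read and then warms the cache. Names are illustrative.
package main

import (
	"context"
	"fmt"
)

// Bucket models the source region's object storage.
type Bucket map[string][]byte

// Cache models the mirror region's distributed cache layer.
type Cache struct{ m map[string][]byte }

func (c *Cache) Get(key string) ([]byte, bool) { b, ok := c.m[key]; return b, ok }
func (c *Cache) Put(key string, b []byte)      { c.m[key] = b }

func readOnDemand(_ context.Context, key string, cache *Cache, source Bucket) []byte {
	if b, ok := cache.Get(key); ok {
		return b // hot data: served locally from the distributed cache
	}
	b := source[key] // cold data: one cross-region read from the source
	cache.Put(key, b)
	return b
}

func main() {
	ctx := context.Background()
	source := Bucket{"chunk-0001": []byte("training sample")}
	cache := &Cache{m: map[string][]byte{}}
	readOnDemand(ctx, "chunk-0001", cache, source) // miss: cross-region read
	readOnDemand(ctx, "chunk-0001", cache, source) // hit: stays local
	fmt.Println("second read served from cache")
}
```

This is why the cache hit rate, not raw cross-region bandwidth, dominates performance in this mode: only the first read of each object leaves the mirror region.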

Case study: Cross-cloud metadata synchronization for an LLM enterprise

This enterprise had a large amount of idle GPU resources on Cloud B and wanted to use them to support training tasks running on Cloud A, so it needed to distribute data from Cloud A to Cloud B. However, frequent network fluctuations between the two clouds hindered data distribution and hurt training efficiency, causing noticeable stalls, particularly when handling large numbers of small files.

To resolve this issue, the enterprise chose to use JuiceFS for cross-region data distribution. By synchronizing only the metadata, JuiceFS effectively reduced the impact of network fluctuations. At the same time, to control costs, the enterprise did not synchronize all the data to Cloud B, but instead used a distributed cache to warm up and synchronize only the necessary data. This approach ensured performance while optimizing costs.

Currently, the mirror file system has a total size of approximately 256 TB, containing 13 million files, with an average file size of 18.6 MB. In the mirror region on Cloud B, there are 540 clients, with numerous training containers supporting tasks from Cloud A. Metadata queries per second (QPS) on Cloud B have reached 58,100. Despite having only 138 distributed cache nodes with a total cache capacity of 8.8 TB, the system achieves significant aggregate bandwidth because there are enough nodes and network interfaces. The maximum cache read throughput is 37 GB/s, and write throughput is 4 GB/s, which meets the performance requirements of this scenario.

Summary

To address diverse user requirements for data access performance in multi-cloud architectures, JuiceFS offers a comprehensive, third-party, cloud-neutral solution that avoids being tied to any specific cloud platform.

The table below compares two JuiceFS solutions for multi-cloud data distribution:

[Table: comparison of the two solutions]

Solution 1: Same-region, cross-cloud data distribution

This solution is suited for scenarios where two locations are close and data communication is stable. It’s also applicable for cross-cloud bucket disaster recovery.

Solution 2: Cross-region data access

  • Data and metadata synchronization: This maintains multiple copies of the data and suits scenarios where regions are geographically distant. Its main advantage is the best read performance, as both metadata and data are accessed locally, but it is also the most expensive option.

  • Metadata-only synchronization: This balances cost and performance. Metadata is accessed locally, while data is retrieved on demand from the distributed cache; on a cache miss, the system falls back to the source region. Write performance is moderate, as both metadata and data writes must go back to the source. This solution is particularly suitable for cost-sensitive scenarios where most data in the mirror region is read-only.

If you have any questions about this article, feel free to join JuiceFS discussions on GitHub and the community on Slack.
