DEV Community

DASWU


JuiceFS 1.1: Easier Cloud Storage for Billions of Files

It has been 13 months since we released JuiceFS 1.0 in August 2022. Today, we are excited to announce the release of JuiceFS 1.1, our second long-term support (LTS) version after 1.0 and fully compatible with it. This release significantly improves stability, usability, security, features, and performance, making massive data volumes easier to manage.

Currently, JuiceFS supports 10+ metadata engines and 30+ data storage engines, giving users a wide range of options for diverse enterprise environments and data storage needs. JuiceFS is also compatible with the POSIX, HDFS, S3, and WebDAV access protocols and can be used as a persistent volume (PV) in Kubernetes, so data can flow seamlessly between different applications.

New features at a glance: Easy management of massive data

Since its open-source release in 2021, JuiceFS has been widely adopted in scenarios that require shared file storage, such as big data and machine learning. As users' data keeps growing, managing massive numbers of files becomes a challenge, especially in clusters holding more than 10 billion files.
To make large amounts of data easier to manage, JuiceFS 1.1 introduces directory-based space usage statistics and the following new features:

  • Directory quotas: Set size and file count limits for directories to prevent individual users from consuming excessive resources, ensuring system stability.
  • Directory cloning: Clone a directory and its contents quickly by copying only the metadata, which saves time and space when numerous files need to be copied.
  • Quick usage statistics overview: View storage space and file count statistics across a directory tree.
  • New metadata engine support: FoundationDB, a distributed database open-sourced by Apple, known for its high performance, scalability, and fault tolerance.
  • New data storage option: GlusterFS, simplifying self-built object storage scaling and maintenance. For details, see JuiceFS 1.1 Beta 2: Simplifying Large-Scale Cluster Management with Gluster.
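
To make the metadata-only cloning idea concrete, here is a toy in-memory model. It assumes, for illustration only, that file contents live in immutable chunks identified by ID and that each chunk carries a reference count; the class `ToyFS` and its methods are hypothetical and do not reflect JuiceFS's actual metadata schema or API. Cloning copies only the path-to-chunk mappings and bumps reference counts, so no chunk data is duplicated, and a chunk is reclaimed only when its last reference disappears.

```python
import itertools
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyFS:
    """Hypothetical toy model: a file is just a list of chunk IDs (its metadata)."""

    files: dict = field(default_factory=dict)       # path -> [chunk IDs]
    refs: Counter = field(default_factory=Counter)  # chunk ID -> reference count
    objects: dict = field(default_factory=dict)     # chunk ID -> bytes (stands in for object storage)
    _ids: itertools.count = field(default_factory=itertools.count)

    def write(self, path: str, data: bytes, chunk_size: int = 4) -> None:
        if path in self.files:
            self.delete(path)  # drop the references held by the old version
        chunk_ids = []
        for i in range(0, len(data), chunk_size):
            cid = next(self._ids)
            self.objects[cid] = data[i:i + chunk_size]
            self.refs[cid] += 1
            chunk_ids.append(cid)
        self.files[path] = chunk_ids

    def clone_dir(self, src: str, dst: str) -> None:
        """Clone by copying metadata only; assumes dst does not exist yet."""
        prefix = src.rstrip("/") + "/"
        for path, chunk_ids in list(self.files.items()):  # snapshot before adding paths
            if path.startswith(prefix):
                new_path = dst.rstrip("/") + "/" + path[len(prefix):]
                self.files[new_path] = list(chunk_ids)
                for cid in chunk_ids:
                    self.refs[cid] += 1  # share the chunk; no bytes are copied

    def delete(self, path: str) -> None:
        for cid in self.files.pop(path):
            self.refs[cid] -= 1
            if self.refs[cid] == 0:  # last reference gone: reclaim the data
                del self.refs[cid]
                del self.objects[cid]

    def read(self, path: str) -> bytes:
        return b"".join(self.objects[cid] for cid in self.files[path])
```

For example, after writing `/src/a.txt` and cloning `/src` to `/dst`, the number of stored chunks stays the same, and deleting the source file leaves the clone readable because its chunks are still referenced.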

Enhanced security:

  • Mitigating permission security risks and preventing accidental operations: Mapped the root user to a non-privileged user during mount using the --root-squash option.
  • Controlling file behavior with special flags: Enabled partial support for ioctl during mount using the --enable-ioctl option, including features like append-only (a) and immutable (i).
  • Preventing cache data errors due to hardware anomalies: Added integrity checks to local cache files.
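
The effect of root squashing on permission checks can be sketched as follows. This sketch assumes `--root-squash` takes a `UID:GID` value (check `juicefs mount --help` for the exact format); the helper names are hypothetical, and the permission check is a simplified POSIX owner/group/other test that ignores supplementary groups and ACLs.

```python
from typing import Optional, Tuple

Identity = Tuple[int, int]  # (uid, gid)


def parse_root_squash(value: str) -> Identity:
    """Parse an assumed 'UID:GID' value such as '65534:65534'."""
    uid_str, sep, gid_str = value.partition(":")
    if not sep:
        raise ValueError(f"expected UID:GID, got {value!r}")
    return int(uid_str), int(gid_str)


def effective_identity(uid: int, gid: int, squash: Optional[Identity]) -> Identity:
    """Map root (UID 0) to the squash identity; leave other users unchanged."""
    if squash is not None and uid == 0:
        return squash
    return uid, gid


def may_write(uid: int, gid: int, owner: Identity, mode: int,
              squash: Optional[Identity] = None) -> bool:
    """Simplified POSIX write check (no supplementary groups, no ACLs)."""
    uid, gid = effective_identity(uid, gid, squash)
    if uid == 0:
        return True  # unsquashed root bypasses the permission bits
    owner_uid, owner_gid = owner
    if uid == owner_uid:
        return bool(mode & 0o200)  # owner write bit
    if gid == owner_gid:
        return bool(mode & 0o020)  # group write bit
    return bool(mode & 0o002)      # other write bit
```

With squashing enabled, root writing to another user's `0o644` file is treated like any unprivileged "other" user and is refused, which is how a mistaken command run as root is kept from modifying data it does not own.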

Improved stability:

  • Addressing compatibility issues with integration components:
    • Added a dedicated garbage collection (GC) thread for TiKV, so GC now runs automatically even when TiDB components are not deployed.
    • Improved JuiceFS' compatibility when it is used as a gateway and within the Hadoop ecosystem.
  • Adjusting usage strategies for specific scenarios:
    • Fixed high CPU utilization issues on high-end machines with FUSE.
    • Refactored the control of data object deletion, making it easier to adjust how fast pending objects are cleaned up.
    • Introduced the --cache-scan-interval option for customizing the local cache scanning interval; scanning can also be limited to startup or disabled entirely.
    • Added the --cache-eviction option to adjust the local cache cleanup strategy.
    • Introduced the --skip-dir-nlink option to reduce metadata transaction conflicts caused by concurrently creating directories in the same parent directory.
  • Fixing bugs that could lead to client crashes:
    • Fixed an issue where abnormal values in the metadata engine could trigger a client panic.
    • Resolved potential deadlocks when clients perform concurrent truncate and release operations.
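
To illustrate what a local cache cleanup strategy decides, the sketch below implements one generic, low-overhead policy, two-random sampling: when space runs out, pick two cached entries at random and evict the one accessed less recently. This is a standalone illustration; which policies `--cache-eviction` accepts, and how JuiceFS implements them, should be checked in the JuiceFS documentation. The class `TwoRandomCache` and its methods are hypothetical.

```python
import random
from typing import Dict, Optional, Tuple


class TwoRandomCache:
    """Hypothetical size-bounded cache that evicts by two-random sampling."""

    def __init__(self, capacity_bytes: int, rng: Optional[random.Random] = None):
        self.capacity = capacity_bytes
        self.used = 0
        self.clock = 0  # logical time, bumped on every access
        self.entries: Dict[str, Tuple[int, int]] = {}  # key -> (size, last_access)
        self.rng = rng if rng is not None else random.Random()

    def _touch(self, key: str, size: int) -> None:
        self.clock += 1
        self.entries[key] = (size, self.clock)

    def get(self, key: str) -> bool:
        """Return True on a hit and record the access time."""
        if key not in self.entries:
            return False
        self._touch(key, self.entries[key][0])
        return True

    def put(self, key: str, size: int) -> None:
        if size > self.capacity:
            return  # larger than the whole cache: do not cache it
        if key in self.entries:
            self.used -= self.entries.pop(key)[0]  # replace the old copy
        while self.used + size > self.capacity:
            self._evict_one()
        self._touch(key, size)
        self.used += size

    def _evict_one(self) -> None:
        keys = list(self.entries)  # O(n) copy keeps the sketch short
        if len(keys) == 1:
            victim = keys[0]
        else:
            a, b = self.rng.sample(keys, 2)
            # Evict whichever sampled entry was accessed less recently.
            victim = a if self.entries[a][1] < self.entries[b][1] else b
        self.used -= self.entries.pop(victim)[0]
```

Sampling two entries avoids maintaining a full LRU list on every access while still tending to evict colder data. The per-eviction `list(self.entries)` copy is O(n), which is acceptable for a sketch but not for a large cache.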

Enhanced usability:

  • One-click diagnostic information collection: Added a feature for generating diagnostic reports to simplify issue troubleshooting and feedback.
  • One-click recovery of deleted files: Easily recover all deleted files from a specific time period without the need for individual recovery operations.
  • Data synchronization without mounting: The sync command now supports the jfs:// prefix, so it can access data in JuiceFS without mounting the file system.
  • Automatic mounting at startup: Adding the --update-fstab option during mount automatically adds an entry with the same mount parameters to /etc/fstab, so the file system is mounted at system startup.
  • Improved the performance of the info command, which shows a file's internal structure, and made it report more useful information.
  • Enhanced the fsck command to repair damaged directory information under certain conditions.
  • Improved the gc command for garbage collection, useful for manual cleanup when there is an accumulation of pending deletions.
  • Further improved the performance of the sync command and added multiple strategy parameters to accommodate different requirements.
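
Recovering everything deleted within a time period amounts to selecting the trash buckets that fall inside the window and restoring their contents in one pass, rather than file by file. The sketch below assumes, for illustration only, that deleted files are grouped into hourly buckets named `YYYY-MM-DD-HH`; the bucket naming, the `trash` dictionary layout, and the function names are hypothetical, not JuiceFS's documented trash format.

```python
from datetime import datetime
from typing import Dict, Iterable, List

BUCKET_FORMAT = "%Y-%m-%d-%H"  # assumed hourly bucket naming, e.g. "2023-09-01-08"


def buckets_in_window(bucket_names: Iterable[str], start: datetime, end: datetime) -> List[str]:
    """Return the hourly buckets whose hour lies in [start, end], oldest first."""
    selected = []
    for name in bucket_names:
        try:
            hour = datetime.strptime(name, BUCKET_FORMAT)
        except ValueError:
            continue  # not an hourly bucket; ignore it
        if start <= hour <= end:
            selected.append(name)
    return sorted(selected)  # this zero-padded format sorts chronologically as text


def files_to_restore(trash: Dict[str, List[str]], start: datetime, end: datetime) -> List[str]:
    """Collect every deleted file in the window so all can be restored in one pass."""
    return [f for name in buckets_in_window(trash, start, end) for f in trash[name]]
```

For instance, a window from 08:00 to 10:00 on 2023-09-01 would pick up the `2023-09-01-08` and `2023-09-01-09` buckets and return all of their files together.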

For the full list of enhancements and fixes, see JuiceFS 1.1 Release Notes.

Battle-tested in more production environments: Growing community adoption

JuiceFS was officially open-sourced on GitHub in January 2021. It has since gained widespread attention and adoption globally, becoming one of the fastest-growing projects in the file storage field, with 8.5k stars on GitHub.
Compared with the time of the JuiceFS 1.0 release last year, anonymous usage metrics have grown substantially.


JuiceFS was initially designed for big data platforms deployed in the cloud. With the ongoing development of AI technology, JuiceFS has found increasing use in AI scenarios, including autonomous driving, artificial intelligence-generated content (AIGC), and large language models. Currently, JuiceFS is used in production environments by companies or organizations such as PIESAT, Xiaomi, vivo, Baidu, Trip.com, DJI, Li Auto, SmartMore, SAIC Motor, Horizon Robotics, Unisound, DP Technology, Douban, Zhejiang Lab, SenseTime, Shopee, Zhihu, NetEase Games, and Yimian.

Active community collaboration: Building the cloud-native ecosystem

Over the past 13 months, JuiceFS Community Edition has remained highly active, with 410 new issues, 920 merged pull requests, and a total of 102 contributors, a 100% increase from the previous year.

JuiceFS Community Edition is licensed under Apache 2.0, allowing users to confidently apply JuiceFS in various commercial environments. This not only allows users to make necessary enhancements but also facilitates seamless integration with upstream and downstream applications, thereby enriching the cloud-native ecosystem. In this version, JuiceFS and Fluid have undergone significant optimizations related to data migration, directory quotas, and JuiceFSRuntime.

In upcoming versions, we’re excited to introduce the following highly anticipated features, and we invite you to join us in their development:

  • Distributed data caching
  • Support for Kerberos and Ranger
  • Smooth mount point upgrades
  • POSIX ACLs
  • User and group quotas

Give it a try!

If you’re interested, download JuiceFS 1.1 and try it out. For upgrade instructions, check out this document. If you have any questions or would like to share feedback, feel free to join the discussions on GitHub and our community on Slack.

In the two years since JuiceFS went open source, it has been embraced by numerous enterprises worldwide. We extend our heartfelt gratitude to each community member for their valuable contributions, issue reports, responses to queries, code contributions, and shared practical experiences. All of these efforts have collectively strengthened JuiceFS and made it even more user-friendly. Thank you!
