Dan Benitah

Git clone - that repo is too big : HELP!

Working with large repositories can be challenging, especially when you only need a specific directory. Instead of wasting time and storage on a full clone, here's a guide to efficiently cloning sub-directories using git sparse-checkout.

1. Clone the Repository with Sparse Checkout

Use the --depth 1 flag to clone only the latest commit, --filter=blob:none to avoid downloading file contents until they are needed, and --sparse to check out only the files in the repository root at first:

git clone --depth 1 --filter=blob:none https://github.com/danuw/azure-docs.git --sparse

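This resulted in the following output. Note how little was downloaded for a repository that weighs gigabytes: about 2.56 MiB in the first transfer (commit and tree objects), then 307 KiB in the second (contents of the top-level files):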

Cloning into 'azure-docs'...
remote: Enumerating objects: 8438, done.
remote: Counting objects: 100% (8438/8438), done.
remote: Compressing objects: 100% (7673/7673), done.
remote: Total 8438 (delta 51), reused 4682 (delta 25), pack-reused 0 (from 0)
Receiving objects: 100% (8438/8438), 2.56 MiB | 16.60 MiB/s, done.
Resolving deltas: 100% (51/51), done.
remote: Enumerating objects: 85, done.
remote: Counting objects: 100% (85/85), done.
remote: Compressing objects: 100% (81/81), done.
remote: Total 85 (delta 52), reused 16 (delta 4), pack-reused 0 (from 0)
Receiving objects: 100% (85/85), 307.16 KiB | 5.04 MiB/s, done.
Resolving deltas: 100% (52/52), done.
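To confirm that a clone really is shallow and partial, you can inspect it with a few standard Git commands. The sketch below wraps them in a helper; the name describe_clone is made up for this example, and it assumes the remote is called origin (the default for git clone).

```shell
# Hypothetical helper (not part of Git): summarize how a clone was made.
# Assumes the remote is named "origin", which is the git clone default.
describe_clone() {
  repo_dir="${1:-.}"
  # Prints "true" for a --depth clone, "false" otherwise.
  echo "shallow: $(git -C "$repo_dir" rev-parse --is-shallow-repository)"
  # Both settings are written by a --filter clone; absent on a full clone.
  echo "promisor remote: $(git -C "$repo_dir" config --get remote.origin.promisor || echo 'not set')"
  echo "partial clone filter: $(git -C "$repo_dir" config --get remote.origin.partialclonefilter || echo 'not set')"
  # Object counts and on-disk size of the local object store.
  git -C "$repo_dir" count-objects -vH
}
```

For example, describe_clone azure-docs run from the parent directory of the clone should report a shallow repository with a blob:none filter if the flags above took effect.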

2. Navigate to the Cloned Repository

Change your directory to the cloned repository:

cd azure-docs

3. Initialize Sparse Checkout Mode

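Enable sparse checkout in cone mode, which restricts the selection to whole directories and so simplifies the process of choosing them. The --sparse flag used at clone time already turned sparse checkout on; this command makes sure cone mode is used. Recent Git releases mark init as deprecated in favor of set, but it still works: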

git sparse-checkout init --cone

4. Set the Directory to be Checked Out

Specify the directory you want to check out from the repository. In this example, we are checking out the articles/iot-operations directory:

git sparse-checkout set articles/iot-operations

Note: If you encounter an error, check that your Git version supports sparse checkout (the git sparse-checkout command was added in Git 2.25), that the specified directory exists in the repository, and that the path syntax suits your local system (Windows, macOS, Linux).
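A quick way to rule out the version problem is to compare the installed Git against 2.25. This is a minimal sketch: version_at_least is a hypothetical helper written for this example, and it assumes your sort supports the -V (version sort) option, as GNU coreutils and recent macOS releases do.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Assumes a sort that supports -V (version sort).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v git >/dev/null 2>&1; then
  # "git version 2.39.2" -> "2.39.2"
  installed="$(git --version | awk '{print $3}')"
  if version_at_least "$installed" "2.25.0"; then
    echo "Git $installed: sparse-checkout is available"
  else
    echo "Git $installed: too old for git sparse-checkout, upgrade to 2.25 or later"
  fi
else
  echo "git was not found on PATH"
fi
```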

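After these steps, your working tree contains the articles/iot-operations directory, plus the files that sit directly in the repository root and in its parent directory articles (cone mode always includes those), minimizing the data you download and store. Git fetches the missing file contents on demand: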

remote: Enumerating objects: 189, done.
remote: Counting objects: 100% (189/189), done.
remote: Compressing objects: 100% (182/182), done.
remote: Total 189 (delta 10), reused 106 (delta 7), pack-reused 0 (from 0)
Receiving objects: 100% (189/189), 10.96 MiB | 21.30 MiB/s, done.
Resolving deltas: 100% (10/10), done.
Updating files: 100% (190/190), done.
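If you want to see what cone mode keeps without downloading anything, you can try the same commands against a throwaway local repository. The sketch below is only an illustration: the file names, the articles/other directory and the commit identity are invented for the demo, and it leaves out --depth and --filter because they only matter over a network.

```shell
set -eu
work="$(mktemp -d)"

# Build a tiny stand-in for a large repository (all contents are made up).
git init -q "$work/origin"
mkdir -p "$work/origin/articles/iot-operations" "$work/origin/articles/other"
echo "root file" > "$work/origin/README.md"
echo "wanted" > "$work/origin/articles/iot-operations/overview.md"
echo "unwanted" > "$work/origin/articles/other/page.md"
git -C "$work/origin" add .
git -C "$work/origin" -c user.name="Example" -c user.email="example@example.com" \
  commit -q -m "Initial commit"

# Same sparse-checkout steps as in the guide, against the local stand-in.
git clone -q --sparse "$work/origin" "$work/clone"
git -C "$work/clone" sparse-checkout init --cone
git -C "$work/clone" sparse-checkout set articles/iot-operations

# Show the selected directories, then the files actually on disk.
git -C "$work/clone" sparse-checkout list
find "$work/clone" -path "$work/clone/.git" -prune -o -type f -print
```

The file listing should include README.md and articles/iot-operations/overview.md but nothing under articles/other, which is the cone-mode behavior: root files and the chosen directory are kept, sibling directories are skipped.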

Wrap up

This method is particularly useful when storage space is limited or when working with a slow connection. Whether you're cloning a specific lab version or just a part of a broader documentation set (e.g., for LLMs or RAG purposes), this guide offers a simple and effective solution.
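If you repeat this often, the steps can be folded into a small shell function. This is a sketch rather than a polished tool: the name sparse_clone and its argument order are choices made for this example, and it simply chains the same commands used in the guide.

```shell
# Hypothetical helper combining the steps from this guide.
# Usage: sparse_clone <repo-url> <target-dir> <directory> [<directory>...]
sparse_clone() {
  if [ "$#" -lt 3 ]; then
    echo "usage: sparse_clone <repo-url> <target-dir> <directory>..." >&2
    return 2
  fi
  repo_url="$1"
  target_dir="$2"
  shift 2
  # Latest commit only, no file contents up front, root files only at first.
  git clone --depth 1 --filter=blob:none --sparse "$repo_url" "$target_dir" || return 1
  # Switch to cone mode, then select the requested directories.
  git -C "$target_dir" sparse-checkout init --cone || return 1
  git -C "$target_dir" sparse-checkout set "$@"
}
```

With the function loaded in your shell, the example from this post becomes sparse_clone https://github.com/danuw/azure-docs.git azure-docs articles/iot-operations, and you can list more than one directory at the end.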
