DEV Community

Scrapfly for Scrapfly

Posted on • Originally published at scrapfly.io on

How to Use cURL to Download Files?


Curl, short for "Client URL," is a versatile command-line tool used for transferring data with URLs. It's widely favored by developers and system administrators for its ability to interact with a multitude of protocols such as HTTP, HTTPS, FTP, and more.

Using curl to download files simplifies the process by enabling direct command-line interaction with web resources. Curl is not only efficient and lightweight — operating without the need for a graphical interface — but also cross-platform, working seamlessly on Linux, macOS, and Windows systems.

In this article, we'll explore how to use curl to download a file from the web, covering various use cases and demonstrating the tool's versatility.

Why Use Curl to Download Files?

Curl stands out as an exceptional file downloading tool, offering a robust set of features that make it indispensable for developers. Here's what makes curl particularly powerful for downloading files:

Multi-Protocol Support

  • Handles various protocols like HTTP, HTTPS, FTP, and SFTP.
  • Eliminates the need for multiple tools when working with different protocols.

Resume Interrupted Downloads

  • Use the -C - option to continue downloads from where they left off.
  • Saves time and bandwidth by avoiding the need to restart downloads.

Bandwidth Management

  • Limit download speeds using --limit-rate to manage bandwidth usage.
  • Prevents downloads from consuming all available network resources.

Proxy Support

  • Easily configure proxies using options like -x or --proxy.
  • Supports various proxy types, including HTTP, HTTPS, SOCKS4, and SOCKS5.

Authentication Handling

  • Supports a range of authentication methods, including Basic, Digest, NTLM, and OAuth.
  • Access protected resources seamlessly.

Secure Transfers

  • Supports SSL/TLS protocols for secure file transfers.
  • Verify SSL certificates and use secure authentication methods.

Cross-Platform Compatibility

  • Available on Linux, macOS, Windows, and more.
  • Consistent functionality across different operating systems.

Automation and Scripting

  • Easily integrates into scripts for automated tasks.
  • Ideal for scheduled downloads using cron jobs or Windows Task Scheduler.

Curl's robust feature set makes it an excellent choice for downloading files, whether you're handling simple tasks or complex download operations. Its flexibility and efficiency empower users to manage downloads effectively in various environments.

You can learn more about curl and its options in our article about using curl for web scraping.

Now let's explore the basic usage of curl for downloading files and then dive deeper into more complex and unconventional scenarios.

Curl Basic File Download Options

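By default, when curl is run on a file URL without any extra options, it writes the file content to standard output, which is usually the terminal. For binary files such as this PDF, recent curl versions print a warning instead of dumping raw bytes into the terminal.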

curl https://web-scraping.dev/assets/pdf/tos.pdf

To save the file under its original (remote) name instead, use the -O option (uppercase "O", short for --remote-name):

curl -O https://web-scraping.dev/assets/pdf/tos.pdf

This command saves the file as tos.pdf, retaining the original filename.

Custom File Name on Download

To save the downloaded file with a custom name, use the -o (lowercase "o") option followed by the desired filename:

curl -o [filename] [URL]

Example:

curl -o web-scraping-tos.pdf https://web-scraping.dev/assets/pdf/tos.pdf

This command downloads tos.pdf and saves it as web-scraping-tos.pdf on your local machine.
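If the destination path includes directories that don't exist yet, -o alone fails because curl cannot create the output file. The following is a minimal sketch, assuming bash. The hypothetical save_as function adds --create-dirs so curl creates the missing directories first. It also adds -f so that an HTTP error page is not saved under your custom name. The downloads/legal/ path in the example call is only an illustration.

```shell
#!/usr/bin/env bash
# save_as is a hypothetical helper name, shown for illustration only.
# Usage: save_as URL DESTINATION_PATH
save_as() {
  local url="$1"
  local dest="$2"
  # -f             exit with an error on HTTP 4xx/5xx instead of saving the error page
  # -sS            hide the progress meter but still print errors
  # --create-dirs  create any missing directories in DESTINATION_PATH
  # -o             write the response body to DESTINATION_PATH
  curl -fsS --create-dirs -o "$dest" "$url"
}

# Example call (needs network access):
# save_as https://web-scraping.dev/assets/pdf/tos.pdf downloads/legal/web-scraping-tos.pdf
```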

Show Progress Bar / Download Silently

When saving to a file, curl shows a detailed progress meter by default. You can replace it with a simpler progress bar or hide it entirely.

Show Progress Bar

Replace the default progress meter with a simple progress bar using --progress-bar:

curl -O --progress-bar https://web-scraping.dev/assets/pdf/tos.pdf

Download Silently

To suppress curl's progress meter and error messages, use the -s or --silent option:

curl -O -s https://web-scraping.dev/assets/pdf/tos.pdf

Silent Mode with Error Messages

If you want to hide the progress meter but still see error messages, combine -s with -S:

curl -O -s -S https://web-scraping.dev/assets/pdf/tos.pdf

Retry for Unstable Connections

For unreliable network connections, you can configure curl to retry downloads automatically:

Set Number of Retries

Use the --retry option followed by the number of retry attempts:

curl -O --retry [number] [URL]

Example:

curl -O --retry 5 https://web-scraping.dev/assets/pdf/tos.pdf

This command retries the download up to 5 times upon failure.

Specify Retry Delay

To add a delay between retries, use --retry-delay:

curl -O --retry 5 --retry-delay [seconds] [URL]

Example:

curl -O --retry 5 --retry-delay 10 https://web-scraping.dev/assets/pdf/tos.pdf

This adds a 10-second pause between each retry attempt.

Retry on All Errors

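By default, --retry only retries on transient errors, such as timeouts and HTTP 408, 429, 500, 502, 503, and 504 responses. To retry on any error, add --retry-all-errors (available since curl 7.71.0). Curl treats a fully received 404 or other HTTP error response as a successful transfer. If HTTP error responses should be retried too, combine --retry-all-errors with -f (--fail):

curl -f -O --retry 5 --retry-all-errors [URL]

Example:

curl -f -O --retry 5 --retry-all-errors https://web-scraping.dev/assets/pdf/tos.pdf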
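In a script you usually also want to know whether the download still failed after all retries were used up. The following is a minimal sketch, assuming bash and curl 7.71.0 or later. The hypothetical fetch_with_retries function runs the retry options from this section and reports curl's exit status. That status is 0 on success and non-zero when the transfer ultimately fails, for example 22 when -f reports an HTTP error. The optional RETRIES argument is an illustrative addition.

```shell
#!/usr/bin/env bash
# fetch_with_retries is a hypothetical helper name used for illustration.
# Usage: fetch_with_retries URL [RETRIES]
fetch_with_retries() {
  local url="$1"
  local retries="${2:-5}"
  local status=0

  # -f  turns HTTP 4xx/5xx responses into errors so --retry-all-errors retries them
  # -sS hides the progress meter but keeps error messages
  curl -f -sS -O --retry "$retries" --retry-delay 10 --retry-all-errors "$url" || status=$?

  if [ "$status" -ne 0 ]; then
    echo "download failed with curl exit code $status: $url" >&2
  fi
  return "$status"
}

# Example call (needs network access):
# fetch_with_retries https://web-scraping.dev/assets/pdf/tos.pdf
```

Returning curl's own exit code lets a caller, such as a cron script, decide what to do next.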

Handling Large File Downloads

Downloading large files can congest your network connection and slow down other users on the same network. Curl offers options to manage this.

To prevent a large download from consuming all your available bandwidth, you can limit the download speed using the --limit-rate option:

curl -O --limit-rate [speed] [URL]

Example:

curl -O --limit-rate 500k https://web-scraping.dev/assets/pdf/tos.pdf

This command limits the download speed to 500 kilobytes per second. You can specify the speed using suffixes:

  • k or K for kilobytes (e.g., 500k)
  • m or M for megabytes (e.g., 2M)

Benefits:

  • Bandwidth Management: Leaves capacity for other network activity.
  • Network Stability: Avoids saturating the link for the duration of a long transfer.

Insecure Downloading

In some cases, you might need to use cURL to download a file from a server with an invalid or self-signed SSL certificate. Curl verifies SSL certificates by default, which can block these downloads.

Disable SSL Certificate Verification

Warning: Disabling SSL verification can expose you to security risks like man-in-the-middle attacks. Use this option only when you're certain about the server's trustworthiness.

To bypass SSL certificate checks, use the -k or --insecure option:

curl -O -k https://web-scraping.dev/assets/pdf/tos.pdf

This command tells curl to ignore SSL certificate validation and proceed with the download.

Verifying File Integrity

Ensuring that a downloaded file hasn't been tampered with is crucial, especially for important or large files. You can verify file integrity using checksum tools like sha256sum.

Using sha256sum to Verify Downloads

Steps:

  1. Download the File and Its Checksum

curl -O https://example.com/file.zip
curl -O https://example.com/file.zip.sha256

  2. Verify the Checksum

sha256sum -c file.zip.sha256

  • The -c option tells sha256sum to read the expected hash and file name from file.zip.sha256 and report OK or FAILED. The checksum file must use the "<hash>  <filename>" format that sha256sum itself outputs.

Manual Verification:

If the checksum isn't provided in a file:

  1. Get the Expected Checksum
  • Obtain the checksum value from the website or provider.
  2. Calculate the Downloaded File's Checksum

sha256sum file.zip

  • This command outputs a checksum that you can compare with the expected value.

Example Output:

e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 file.zip

Benefits:

  • Security: Confirms the file hasn't been altered maliciously, provided the expected checksum comes from a trusted source.
  • Data Integrity: Ensures the file isn't corrupted due to network issues.
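The download and verification steps above can be combined so that a file is only kept when its checksum matches. The following is a minimal sketch, assuming a Linux system with GNU coreutils' sha256sum (on macOS, shasum -a 256 prints the same format). download_and_verify is a hypothetical helper. The expected hash is the value you obtained from the provider.

```shell
#!/usr/bin/env bash
# download_and_verify is a hypothetical helper name used for illustration.
# Usage: download_and_verify URL EXPECTED_SHA256 DESTINATION
download_and_verify() {
  local url="$1" expected="$2" dest="$3"
  local tmp="$dest.part"
  local actual

  # Download to a temporary name so an unverified file never sits at DESTINATION
  curl -fsS -o "$tmp" "$url" || { rm -f "$tmp"; return 1; }

  # sha256sum prints "<hash>  <filename>"; keep only the hash field
  actual=$(sha256sum "$tmp" | cut -d ' ' -f 1)

  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $url: expected $expected, got $actual" >&2
    rm -f "$tmp"
    return 1
  fi

  # Only a verified file is moved into place
  mv "$tmp" "$dest"
  echo "verified: $dest"
}

# Example call (needs network access and the provider's published hash):
# download_and_verify https://example.com/file.zip <expected-sha256> file.zip
```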

Handling Authentication

When downloading files from protected resources, authentication is often required. Curl supports various authentication methods to access these resources.

Authorization Header

To include an authorization token or API key in your request, use the -H option to add a custom header:

curl -O -H "Authorization: Bearer your_token_here" https://api.example.com/securefile.zip

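This example uses bearer token authentication. For other schemes, curl has dedicated options. For example, -u user:password sends Basic authentication, and adding --digest or --ntlm switches to those schemes instead.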

Cookie Session

If authentication relies on session cookies, you can manage cookies using curl:

When logging in, save the session cookies to a file using the -c option:

curl -c cookies.txt -d "username=user&password=pass" https://example.com/login
  • The -d option sends the login credentials as POST data.
  • Cookies received during login are saved to cookies.txt.

Use Saved Cookies

Use the saved cookies for subsequent requests with the -b option:

curl -O -b cookies.txt https://example.com/securefile.zip

Benefits:

  • Session Management: Maintains login sessions across multiple requests.
  • Automated Workflows: Scripts can handle login and file download processes seamlessly.

Combined, these options make your file downloads more reliable and secure, even over unstable internet connections.

Curl Command Builder

To simplify the process of creating cURL commands for file downloads, we've created a curl command builder tool. This interactive form allows you to select various options and generate the corresponding curl command instantly:

Check it out here

Automating Curl Downloads with Crontab

Automating file downloads ensures you always have the latest data without manual effort. By integrating curl with crontab, you can schedule downloads to run at specified times, enhancing efficiency and productivity.

What Is Crontab?

Crontab is a time-based job scheduler in Unix-like operating systems. It allows users to schedule scripts or commands to run automatically at predefined times or intervals.

Steps to Automate Downloads Using Crontab

1. Create a Download Script (Optional)

Write the Script

Create a shell script (e.g., download.sh) that contains your curl command:

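#!/bin/bash
# Exit on the first failing command (e.g. a missing directory or a failed download)
set -euo pipefail

# Navigate to the desired directory
cd /path/to/download/directory

# Download the file using curl:
#   -f  fail on HTTP errors instead of saving the server's error page
#   -sS hide the progress meter but still print errors, so failures show up in cron's output
curl -fsS -O https://example.com/file.zip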

Make the Script Executable

chmod +x /path/to/download.sh

2. Edit the Crontab File

Open Crontab Editor

crontab -e

Add a New Cron Job

Insert a line following the cron syntax:

* * * * * /path/to/command

Example: Schedule the Script to Run Daily at 2 AM

0 2 * * * /path/to/download.sh

Fields Explained:

  • Minute: 0
  • Hour: 2 (2 AM)
  • Day of Month: * (Every day)
  • Month: * (Every month)
  • Day of Week: * (Every day of the week)

3. Save and Exit

After adding your cron job, save the file. The cron service will automatically pick up the new schedule.
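Cron runs unattended, so a scheduled download benefits from keeping a separate copy per run and leaving a trace in a log. Below is one possible body for download.sh. It is a sketch that rests on a few assumptions. snapshot_download is a hypothetical helper, and the URL is assumed to end in a plain file name. Each run is stored as YYYY-MM-DD-<file name> in the target directory. The status line is captured by redirecting the cron job's output, for example 0 2 * * * /path/to/download.sh >> /path/to/download.log 2>&1.

```shell
#!/usr/bin/env bash
# snapshot_download is a hypothetical helper name used for illustration.
# Usage: snapshot_download URL DEST_DIR
snapshot_download() {
  local url="$1" dest_dir="$2"
  local name stamp
  name=$(basename "$url")   # e.g. file.zip for https://example.com/file.zip
  stamp=$(date +%Y-%m-%d)   # one snapshot per day

  mkdir -p "$dest_dir"
  # Retry transient failures; write to .part so a failed run leaves no partial snapshot
  if curl -fsS --retry 3 -o "$dest_dir/$name.part" "$url"; then
    mv "$dest_dir/$name.part" "$dest_dir/$stamp-$name"
    echo "$(date '+%F %T') saved $dest_dir/$stamp-$name"
  else
    rm -f "$dest_dir/$name.part"
    echo "$(date '+%F %T') download failed: $url" >&2
    return 1
  fi
}

# In download.sh, call it with your own URL and directory, e.g.:
# snapshot_download https://example.com/file.zip /path/to/download/directory
```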

Automating curl downloads with crontab streamlines your workflow, ensuring timely and consistent data retrieval. Whether you're updating datasets, synchronizing files, or performing regular backups, this combination offers a robust solution for scheduled tasks.

Bypassing Download Blocks

When attempting to use curl to download files, you might encounter situations where the download is blocked or fails. This can be due to various reasons such as network restrictions, server configurations, or security measures that prevent automated requests.

The most common reason for download blocks is that the server is blocking automated requests. To bypass this, you can add a custom browser user-agent string to your request headers to mimic a real browser request.

curl -O -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" https://example.com/file.zip

This example uses the -A (--user-agent) option to set a custom user-agent string. You can replace it with the user-agent string of any real browser, ideally one from a recent browser release.

Changing the user-agent string is the most basic method to bypass download blocks. However, some servers are sophisticated enough to block requests even when they carry a browser user-agent string. In these cases, you may need more advanced tools like curl-impersonate.

Curl-impersonate is a modified version of cURL that simulates the TLS fingerprint of major web browsers, such as Chrome, Firefox, Edge, and Safari, by mimicking their TLS and HTTP/2 configuration. It also replaces cURL's default headers, such as the User-Agent, with regular browser header values. As a result, curl-impersonate requests look like requests sent from real browsers, which makes it much harder for firewalls to detect that an HTTP client is being used.

You can learn more about curl-impersonate in our dedicated guide on using curl-impersonate for web scraping.

Power Up File Downloads with Scrapfly

Downloading files programmatically can quickly become cumbersome, especially when the files are protected by sophisticated anti-bot systems that cannot be bypassed with tools like curl-impersonate.

Scrapfly has millions of proxies and connection fingerprints that can be used to bypass protection against automated traffic and significantly simplify your file download process.


ScrapFly provides web scraping, screenshot, and extraction APIs for data collection at scale.

For example, here is how to download a file with Scrapfly's web scraping API, using Scrapfly's Python SDK to call the API:

from scrapfly import ScrapflyClient, ScrapeConfig
import base64

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY")

FILE_URL = "https://web-scraping.dev/assets/pdf/tos.pdf"

response = scrapfly.scrape(
    ScrapeConfig(
        url=FILE_URL,
        asp=True,
    )
)

## decode base64 file data
file_data = base64.b64decode(response.result.content)

with open("tos.pdf", "wb") as f:
    f.write(file_data)

Scrapfly's API automatically detects that the requested URL points to a file and returns the file's binary content encoded with base64. That is why the code decodes the content returned by the API before saving it to a file called tos.pdf.

FAQ

Wrapping up, here are some common questions about downloading files with curl:

Can I resume an interrupted download with curl?

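Yes. Rerun the same command with the -C - option added, for example curl -C - -O https://web-scraping.dev/assets/pdf/tos.pdf. Curl checks the size of the partially downloaded local file and asks the server for only the remaining bytes. This is especially useful for large files or unstable connections, and it requires a server that supports range requests.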
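On very unstable connections, the resume option can be wrapped in a loop that keeps continuing the same partial file until the transfer completes. The following is a minimal sketch, assuming bash and a server that supports range requests. download_with_resume, its default of 5 attempts, and the 2-second pause are illustrative choices, not curl features.

```shell
#!/usr/bin/env bash
# download_with_resume is a hypothetical helper name used for illustration.
# Usage: download_with_resume URL [MAX_ATTEMPTS]
download_with_resume() {
  local url="$1"
  local max_attempts="${2:-5}"
  local attempt=1

  # -C -  continue from the size of the partial file already on disk
  # -O    save under the remote file name in the current directory
  # -f    treat HTTP 4xx/5xx responses as failures
  # -sS   no progress meter, but still print errors
  until curl -fsS -C - -O "$url"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $url" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "transfer interrupted, resuming (attempt $attempt of $max_attempts)" >&2
    sleep 2
  done
}

# Example call (needs network access):
# download_with_resume https://web-scraping.dev/assets/pdf/tos.pdf 10
```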

Is wget a better alternative to curl for downloading files?

wget is another command-line tool, designed specifically for downloading files. Curl is more versatile and supports more protocols and features. wget, however, is often preferred for its simple handling of recursive downloads and its ability to download entire websites. You can learn more about the differences between curl and wget in our dedicated curl vs wget article.

How do I download multiple files at once using curl?

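You can pass several URLs in a single command, each with its own -O (for example curl -O URL1 -O URL2), or use --remote-name-all to apply -O to every URL. For longer lists, use a script to loop through a file of URLs. Curl 7.66.0 and later can also transfer multiple URLs in parallel with -Z (--parallel).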
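Looping through a list of URLs can be sketched as a bash function that reads one URL per line from a text file and downloads each one with -O. download_all and the urls.txt file name are hypothetical. The function keeps going when one URL fails and returns a non-zero status at the end if any download failed.

```shell
#!/usr/bin/env bash
# download_all is a hypothetical helper name used for illustration.
# Usage: download_all URL_LIST_FILE  (one URL per line; blank lines and # comments are skipped)
download_all() {
  local list_file="$1"
  local url failures=0

  # The "|| [ -n ... ]" test also handles a last line without a trailing newline
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in
      '' | '#'*) continue ;;
    esac
    # Defensive: </dev/null makes sure curl never consumes the URL list from stdin
    if ! curl -fsS -O "$url" </dev/null; then
      echo "failed: $url" >&2
      failures=$((failures + 1))
    fi
  done < "$list_file"

  # Non-zero exit status if any download failed
  [ "$failures" -eq 0 ]
}
```

For example, with one URL per line in urls.txt, run download_all urls.txt from the directory where the files should be saved.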

Summary

Curl is a versatile tool when it comes to downloading files, offering:

  • Multi-Protocol Support: Works with HTTP, HTTPS, FTP, and more.
  • Resume Capability: Resumes interrupted downloads with ease.
  • Proxy and Bandwidth Management: Supports proxies and limits download speed.
  • Authentication Support: Handles cookies, tokens, and secured resources.
  • Automation: Integrates with scripts and scheduling tools like crontab.

For advanced needs, tools like curl-impersonate or services like Scrapfly can bypass sophisticated bot protections, offering:

  • Enhanced Bypass Capabilities: Overcomes anti-bot systems.
  • API Flexibility: Simplifies complex file downloads with robust solutions.

Curl’s feature set makes it essential for managing both simple and complex downloads efficiently.
