Setting Up Proxies in Python for Web Scraping

In web scraping, proxies can be your secret weapon. They help you bypass rate limits, dodge bot protections, and keep your online footprint under wraps. But getting proxies to work seamlessly with Python's Requests library? That’s where it can get tricky. This guide walks you through the ins and outs of using proxies with Requests, so you can scrape smarter, not harder.

Why Proxies Are Important for Scraping

Web scraping without proxies is like playing a game without the right gear. Anti-bot protections like rate limits and CAPTCHA systems can trip you up quickly. But proxies give you an edge—they help you fly under the radar, scrape more data, and keep your IPs rotating to avoid detection. They’re indispensable if you want to scale your scraping projects or work with sensitive data.

Basic Syntax for Requests with Proxies

Before we dive deep, let’s cover the basic setup. To use proxies with Requests, you need a simple proxies dictionary to direct your traffic. Here's how to get started:

import requests

http_proxy = "http://130.61.171.71:3128"
proxies = {
    "http": http_proxy,
    "https": http_proxy,
}

resp = requests.get("https://ifconfig.me/ip", proxies=proxies)
print(resp, resp.text)

The result? You'll see the IP address of the proxy, not your own:

<Response [200]> 130.61.171.71

The proxy you choose matters. Free proxies might work for a while, but they can disappear fast. Always look for reliable, paid proxies for consistent results.

Understanding the Proxies Dictionary

You might look at the proxies dictionary above and wonder: why map both HTTP and HTTPS to an HTTP proxy URL? Here's the deal. Requests lets you map each target protocol (HTTP, HTTPS, even FTP) to its own proxy URL, and an HTTP-scheme proxy can carry both plain and encrypted traffic, so you don't have to overcomplicate it.
The syntax is:

proxies = {
    "target_protocol": "scheme://proxy_host:proxy_port"
}
  • target_protocol: This is the protocol for which you’re specifying the proxy (e.g., HTTP, HTTPS).
  • scheme: This defines the type of connection to your proxy (usually HTTP or HTTPS).
  • proxy_host: The domain or IP of your proxy.
  • proxy_port: The port your proxy is using.
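
Each protocol can also point at a different proxy. A hypothetical sketch (both proxy hosts are placeholders):

proxies = {
    "http": "http://fastproxy:8080",     # plain HTTP traffic
    "https": "http://secureproxy:3128",  # HTTPS traffic, tunneled via CONNECT
}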

Various Types of Proxy Connections

Not all proxies are created equal. You’ve got different options depending on your needs.

  • HTTP Proxy: The fastest option for non-encrypted traffic. Great for high-volume scraping, but no encryption means less security.
  • HTTPS Proxy: Offers encryption, securing your connection, but it can be a bit slower. It’s essential for any site that uses HTTPS.
  • SOCKS5 Proxy: Flexible and secure, handling multiple protocols. Perfect for routing traffic to non-HTTP services (like Tor). You’ll need an extra library to use SOCKS5 proxies with Requests:
python3 -m pip install "requests[socks]"

Then, it’s as simple as:

import requests

username = "myusername"
password = "mypassword"

# Use socks5h:// instead of socks5:// if you want DNS resolution
# to happen on the proxy side rather than on your machine
socks5_proxy = f"socks5://{username}:{password}@proxyhost:1080"
proxies = {
    "http": socks5_proxy,
    "https": socks5_proxy,
}

resp = requests.get("https://ifconfig.me", proxies=proxies)
print(resp, resp.text)

How to Use Proxy Authentication

In the real world, free proxies just won’t cut it. Paid proxies often require authentication. Here’s how to add your credentials to the proxy setup:

username = "myusername"
password = "mypassword"

proxies = {
    "http": f"http://{username}:{password}@proxyhost:1080",
    "https": f"https://{username}:{password}@proxyhost:443",
}

Easy, right?
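
One caveat: if your username or password contains characters like @, :, or /, URL-encode them first so the proxy URL parses correctly. A minimal sketch using the standard library (the credentials are placeholders):

from urllib.parse import quote

# Percent-encode credentials so special characters don't break the URL
username = quote("my@username", safe="")
password = quote("p@ss:word", safe="")

proxies = {
    "http": f"http://{username}:{password}@proxyhost:1080",
    "https": f"https://{username}:{password}@proxyhost:443",
}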

Using Environment Variables for Proxies

Sometimes, hardcoding your proxy into your script isn’t ideal. You can set up environment variables to manage proxy configurations:

$ export HTTP_PROXY='http://myusername:mypassword@proxyhost:1080'
$ export HTTPS_PROXY='https://myusername:mypassword@proxyhost:443'

Then, in Python, simply:

import requests
resp = requests.get("https://ifconfig.me/ip")
print(resp.text)

This keeps your proxy settings neatly separated from your code.
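
If some hosts (say, localhost or an internal service) should bypass the proxy, the companion NO_PROXY variable handles that:

$ export NO_PROXY='localhost,127.0.0.1'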

Using Session Objects to Save Time and Effort

The requests.Session object is a game changer. It allows you to set default parameters (like proxies) and reuse them across requests. This is especially useful if you're working with sites that require cookies or consistent proxy usage.

import requests

# Default proxy settings for every request made through this session
proxies = {
    "http": "http://130.61.171.71:3128",
    "https": "http://130.61.171.71:3128",
}

session = requests.Session()
session.proxies.update(proxies)

resp = session.get("https://ifconfig.me/ip")
print(resp.text)
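
A proxies argument passed to an individual request overrides the session defaults, which is handy for one-off requests through a different exit (the proxy host below is a placeholder):

# Per-request settings take precedence over session.proxies
resp = session.get("https://ifconfig.me/ip", proxies={"https": "http://otherproxy:8080"})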

Rotate Proxies for a Stealthier Approach

If you're scraping on a large scale, rotating proxies is a must. This allows you to dodge rate limits and avoid IP bans. You can rotate proxies from a list or, if you’re using a proxy service, they can rotate automatically for you.
Here’s an example of rotating proxies from a list:

import random
import requests

proxies_list = [
    "http://proxy1:8080",
    "http://proxy2:80",
    "http://proxy3:3128",
]

for _ in range(10):
    # Pick a random proxy per request and use it for both protocols
    proxy = random.choice(proxies_list)
    proxies = {"http": proxy, "https": proxy}
    resp = requests.get("https://ifconfig.me/ip", proxies=proxies)
    print(resp.text)

For a seamless rotation experience, some proxy providers can rotate proxies for each request automatically. You don’t have to do the heavy lifting.

The Differences Between Sticky Proxies and Rotating Proxies

Here’s where things get really cool. Sticky proxies keep the same IP for the duration of a session, which is ideal for login flows or any data collection that requires a consistent identity. Rotating proxies change IPs regularly, making them great for bypassing anti-bot systems.
Sticky proxy example:

import requests
from uuid import uuid4

username = "myusername"
password = "mypassword"

def sticky_proxies_demo():
    # Two session IDs; each keeps its own IP for as long as the ID is reused
    sessions = [uuid4().hex[:6] for _ in range(2)]

    for i in range(10):
        session = sessions[i % len(sessions)]
        # The session-parameter syntax varies by provider; check your provider's docs
        http_proxy = f"http://{username},session_{session}:{password}@proxyhost:1080"
        proxies = {
            "http": http_proxy,
            "https": http_proxy,
        }
        resp = requests.get("https://ifconfig.me/ip", proxies=proxies)
        print(f"Session {session}: {resp.text}")

sticky_proxies_demo()

Handling Proxy Errors and SSL Issues in Web Scraping

Even the best proxies can run into issues. Exceptions like ProxyError, ConnectTimeout, or SSLError are common in web scraping, especially with residential proxies. One way to tackle these is to rotate proxies or configure automatic retries.
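
Requests doesn't retry on its own, but you can mount urllib3's Retry logic through an HTTPAdapter. A minimal sketch, assuming a placeholder proxy host:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times with exponential backoff on common transient status codes
retry = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
adapter = HTTPAdapter(max_retries=retry)

session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)
session.proxies.update({
    "http": "http://proxyhost:1080",
    "https": "http://proxyhost:1080",
})

# A timeout guards against hanging connections as well
resp = session.get("https://ifconfig.me/ip", timeout=10)
print(resp.text)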
For SSL errors, you can disable certificate verification and suppress the warnings that come with it. Only do this when you trust the proxy, since it leaves the connection open to interception:

import requests
import urllib3

# Suppress the InsecureRequestWarning that verify=False would otherwise emit
urllib3.disable_warnings()

proxies = {
    "http": "http://130.61.171.71:3128",
    "https": "http://130.61.171.71:3128",
}

resp = requests.get("https://ifconfig.me/ip", proxies=proxies, verify=False)
print(resp.text)

Final Thoughts

Using proxies with Python’s Requests library can take your scraping projects to the next level. Whether you're bypassing bot protections, rotating IPs, or handling proxy authentication, mastering these tools gives you the flexibility and control you need to succeed.
