Ivan Filippov

Golang with Colly: Use Random Fake User-Agents When Scraping

One of the primary reasons for getting blocked while performing web scraping is the use of improper or default user-agents.

Fortunately, adding random fake user-agents to your Go Colly scrapers is straightforward.

What Are Fake User-Agents?

User-agents are strings used by websites to identify the client making the request, providing information about the application, operating system (e.g., Windows, macOS, Linux), and browser (e.g., Chrome, Firefox, Safari) being used. These strings are sent to servers as part of the HTTP request headers.

For instance, here’s an example of a user-agent when accessing a website using Chrome on an Android device:

'User-Agent': 'Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Mobile Safari/537.36'


When scraping a website, it’s crucial to set a user-agent for each request. If you don’t, the website can detect non-human traffic and block your scraper.

By default, Go Colly uses this user-agent for its requests:

"User-Agent": "colly - https://github.com/gocolly/colly",

This default user-agent reveals that Colly is being used, making your scraper easy to detect and block. To avoid this, customizing the user-agent is essential.

That is why we need to manage the user-agents Go Colly sends with our requests.

How To Set A Fake User-Agent In Go Colly

Implementing a fake user-agent with Go Colly is straightforward. You can set a custom user-agent on the request headers in the OnRequest() callback, which runs before every outgoing request.


package main

import (
    "bytes"
    "log"
    "github.com/gocolly/colly"
)

func main() {
    // Instantiate default collector
    c := colly.NewCollector(colly.AllowURLRevisit())

    // Set Fake User Agent
    c.OnRequest(func(r *colly.Request) {
        r.Headers.Set("User-Agent", "1 Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148")
    })

    // Print the Response
    c.OnResponse(func(r *colly.Response) {
        log.Printf("%s\n", bytes.Replace(r.Body, []byte("\n"), nil, -1))
    })

    // Fetch httpbin.org/headers five times
    for i := 0; i < 5; i++ {
        c.Visit("http://httpbin.org/headers")
    }
}



From here our scraper will use this user-agent for every request.
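
Colly also exposes a collector-level shortcut for this: the colly.UserAgent() option sets the user-agent once when the collector is created, with the same effect as setting the header in OnRequest():

c := colly.NewCollector(
    // Every request from this collector now carries this user-agent
    colly.UserAgent("Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"),
    colly.AllowURLRevisit(),
)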

However, if you are scraping at scale, using the same user-agent for every request isn't best practice: it makes it easier for the website to fingerprint your traffic and block your scraper.

To solve this problem we will need to configure our Go Colly scraper to use a random user-agent with every request.

How To Rotate Through Random User-Agents

The easiest way to get realistic generated user-agents is to use a dedicated package; in the following examples we will use github.com/lib4u/fake-useragent.

package main

import (
    "bytes"
    "log"
    "github.com/gocolly/colly"
    uaFake "github.com/lib4u/fake-useragent"
)

func main() {

    // Init user-agent faker
    ua, err := uaFake.New()
    if err != nil {
        log.Fatal(err)
    }
    // Instantiate default collector
    c := colly.NewCollector(colly.AllowURLRevisit())

    // Set Fake User Agent
    c.OnRequest(func(r *colly.Request) {
        r.Headers.Set("User-Agent", ua.Filter().GetRandom())
    })

    // Print the Response
    c.OnResponse(func(r *colly.Response) {
        log.Printf("%s\n", bytes.Replace(r.Body, []byte("\n"), nil, -1))
    })

    // Fetch httpbin.org/headers five times
    for i := 0; i < 5; i++ {
        c.Visit("http://httpbin.org/headers")
    }
}


Now, with just a couple of extra lines of code, every request goes out with a random user-agent.
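
If you would rather avoid an extra dependency, you can also rotate through a hand-maintained pool yourself. A minimal sketch, where the pool entries are just illustrative example strings:

package main

import (
    "log"
    "math/rand"

    "github.com/gocolly/colly"
)

// A small hand-picked pool of user-agents; in practice you would keep
// this list larger and refresh it as browsers release new versions.
var userAgents = []string{
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:133.0) Gecko/20100101 Firefox/133.0",
}

func main() {
    c := colly.NewCollector(colly.AllowURLRevisit())

    // Pick a random entry from the pool for each outgoing request
    c.OnRequest(func(r *colly.Request) {
        r.Headers.Set("User-Agent", userAgents[rand.Intn(len(userAgents))])
    })

    c.OnResponse(func(r *colly.Response) {
        log.Printf("%s\n", r.Body)
    })

    for i := 0; i < 5; i++ {
        c.Visit("http://httpbin.org/headers")
    }
}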

However, a completely random user-agent is not always enough; below we will look at how to request fake user-agents that match a specific browser or platform.

The github.com/lib4u/fake-useragent library offers thousands of fake user-agents drawn from a database of real-world browser strings.

// Get a random user-agent string
fmt.Println(ua.GetRandom()) // Mozilla/5.0 (iPhone; CPU iPhone OS 18_1_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.1.1 Mobile/15E148 Safari/604.1

// Get a user-agent string for a specific browser
fmt.Println(ua.Filter().Chrome().Get())
// Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Mobile Safari/537.36

fmt.Println(ua.Filter().Firefox().Get())
// Mozilla/5.0 (Android 14; Mobile; rv:133.0) Gecko/133.0 Firefox/133.0

fmt.Println(ua.Filter().Safari().Get())
// Mozilla/5.0 (iPhone; CPU iPhone OS 18_1_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.1.1 Mobile/15E148 Safari/604.1

Here is the same filtering in action in a full Go Colly scraper:

package main

import (
    "bytes"
    "log"
    "github.com/gocolly/colly"
    uaFake "github.com/lib4u/fake-useragent"
)

func main() {

    // Init user-agent faker
    ua, err := uaFake.New()
    if err != nil {
        log.Fatal(err)
    }
    // Instantiate default collector
    c := colly.NewCollector(colly.AllowURLRevisit())

    // Set Fake User Agent
    c.OnRequest(func(r *colly.Request) {
        r.Headers.Set("User-Agent", ua.Filter().Chrome().Platform(uaFake.Desktop).Get())
    })

    // Print the Response
    c.OnResponse(func(r *colly.Response) {
        log.Printf("%s\n", bytes.Replace(r.Body, []byte("\n"), nil, -1))
    })

    // Fetch httpbin.org/headers five times
    for i := 0; i < 5; i++ {
        c.Visit("http://httpbin.org/headers")
    }
}


This configures the scraper to use random user-agents for desktop Google Chrome.

Now every visit presents us to the website as a random desktop Chrome user. A simple user-agent substitution like this goes a long way when scraping websites, but don't forget about proxies and the other headers your requests send.
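
For instance, real browsers send several headers alongside the user-agent, and Colly can route traffic through a proxy with Collector.SetProxy(). A brief sketch; the header values are typical examples and the proxy URL is a placeholder you would replace with your own:

c.OnRequest(func(r *colly.Request) {
    r.Headers.Set("User-Agent", ua.Filter().Chrome().Platform(uaFake.Desktop).Get())
    // Headers a real desktop browser would typically send
    r.Headers.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
    r.Headers.Set("Accept-Language", "en-US,en;q=0.5")
})

// Route all requests through a proxy (placeholder URL)
if err := c.SetProxy("http://your-proxy-host:8080"); err != nil {
    log.Fatal(err)
}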

https://github.com/lib4u/fake-useragent
https://github.com/gocolly/colly
