Scrape Google Search Results Using Python
Google holds an immense volume of data for businesses and researchers. It performs over 8.5 billion daily searches and commands a 91% share of the global search engine market.
Since the debut of ChatGPT, Google data has been used not only for traditional purposes like rank tracking, competitor monitoring, and lead generation, but also for training AI systems, including large language models (LLMs) and Natural Language Processing (NLP) models.
Scraping Google, however, is not easy for everyone. It requires a team of professionals and a robust infrastructure to scrape at scale.
In this article, we will learn to scrape Google Search Results using Python and BeautifulSoup. This will enable you to build your own tools and models that are capable of leveraging Google’s data at scale.
Let’s get started!
What are Google Search Results?
Google Search Results are the listings that appear on Google based on the user query entered in the search bar. Google heavily utilizes NLP to understand these queries and present users with relevant results. These results often include featured snippets in addition to organic results, such as the latest AI overviews, People Also Ask sections, Related Searches, and Knowledge Graphs. These elements provide summarized and related information to users based on their queries.
Applications Of Scraping Google Search Data
Google Search Data has various applications:
- Building a rank and keyword tracker for SEO purposes.
- Searching for local businesses.
- Building LLM engines.
- Discovering exploding topics for potential trends in the future.
Why Python for scraping Google?
Python is a versatile and robust language with mature HTTP libraries that can handle websites other languages may struggle with or scrape at lower success rates. As the popularity of AI models trained on web-scraped data grows, Python’s relevance in web scraping continues to rise within the developer community.
Additionally, beginners looking to learn web scraping can pick up Python easily thanks to its simple syntax and code clarity. Plus, it has huge community support on platforms like Discord, Reddit, etc., which can help with problems at any level.
This scalable language excels in web scraping performance and provides powerful frameworks and libraries like Scrapy, Requests, and BeautifulSoup, making it a superior choice for scraping Google and other websites compared to other languages.
Scraping Google Search Results With Python
In this section, we will create a basic Python script that retrieves the first 10 Google search results.
Requirements
To follow this tutorial, we need to install the following libraries:
Requests — To pull HTML data from the Google Search URL.
BeautifulSoup — To refine HTML data in a structured format.
Setup
The setup is simple. Create a Python file and install the required libraries to get started.
Run the following commands in your project folder:
touch scraper.py
And then install the libraries.
pip install requests
pip install beautifulsoup4
Process
We are done with the setup and have everything we need to move forward. We will use the Requests library to fetch the raw HTML and BeautifulSoup to parse it and extract the desired information.
But what is “desired information” here?
The filtered data would contain this information:
- Title
- Link
- Displayed Link
- Description
- Position of the result
Let us import our installed libraries first in the scraper.py file.
from bs4 import BeautifulSoup
import requests
Then, we will make a GET request on the target URL to fetch the raw HTML data from Google.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'}
url = 'https://www.google.com/search?q=python+tutorials&gl=us'
response = requests.get(url, headers=headers)
print(response.status_code)
Passing headers is important to make the scraper look like a natural user who is just visiting the Google search page for some information.
The above code pulls the raw HTML from the Google Search URL. A 200 status code means the request was successful, which completes the first part of the scraper.
In the next part, we will use BeautifulSoup to extract the required data from the HTML.
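Google will sometimes answer with a non-200 status, for example a 429 when it rate-limits you. A small retry helper can make the request step more resilient. This is a sketch, not part of the original script; `fetch_with_retries` and its parameters are names invented here for illustration:

```python
import time

def fetch_with_retries(fetch, retries=3, backoff=2.0):
    """Call `fetch()` until it returns a response with status 200,
    sleeping between attempts. `fetch` is any zero-argument callable
    returning an object with a `.status_code` attribute."""
    for attempt in range(retries):
        response = fetch()
        if response.status_code == 200:
            return response
        time.sleep(backoff * (attempt + 1))  # wait a little longer each time
    raise RuntimeError(f"Giving up after {retries} attempts "
                       f"(last status: {response.status_code})")
```

You could then wrap the request above as `response = fetch_with_retries(lambda: requests.get(url, headers=headers))`.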
soup = BeautifulSoup(response.text, 'html.parser')
This will create a BS4 object to parse the HTML response and thus we will be able to easily navigate inside the HTML and find any element of choice and the content inside it.
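To see how such a BS4 object is navigated, here is a minimal, self-contained sketch. It uses a tiny stand-in HTML snippet rather than a real Google response, just to illustrate the `select_one` and `select` calls we will rely on:

```python
from bs4 import BeautifulSoup

# A tiny stand-in for a real Google response, for illustration only.
html = '<div class="g"><h3>Python Tutorial</h3><a href="https://example.com">link</a></div>'
soup = BeautifulSoup(html, "html.parser")

title = soup.select_one("h3").text   # text of the first matching <h3>
link = soup.select_one("a")["href"]  # attribute access on a matched tag
results = soup.select(".g")          # list of ALL elements with class "g"
```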
To parse this HTML, we first need to inspect the Google Search page and look for a common pattern in the DOM location of the search results.
After inspecting, we find that every search result sits inside a div container with the class g. This means we just have to loop over each div with the g class to get the information inside it.
Before writing the code, we will find the DOM location for the title, description, and link from the HTML.
If you inspect the title, you’ll find that it is contained within an h3 tag. Inspecting further shows that the link is located in the href attribute of the anchor tag.
The displayed link or the cite link can be found inside the cite tag.
And finally, the description is stored inside a div container with the class VwiC3b.
Wrapping all these data entities into a single block of code:
organic_results = []
i = 0

# Parse organic results with error handling
for el in soup.select(".g"):
    try:
        title = el.select_one("h3").text if el.select_one("h3") else "No title"
        displayed_link = el.select_one(".byrV5b cite").text if el.select_one(".byrV5b cite") else "No displayed link"
        link = el.select_one("a")["href"] if el.select_one("a") else "No link"
        description = el.select_one(".VwiC3b").text if el.select_one(".VwiC3b") else "No description"
        organic_results.append({
            "title": title,
            "displayed_link": displayed_link,
            "link": link,
            "description": description,
            "rank": i + 1
        })
        i += 1
    except Exception as e:
        print(f"Error parsing element: {e}")

print(organic_results)
We declared an organic_results list, looped over all the elements with the g class in the HTML, and pushed the collected data into the list.
Running this code will give you the desired results which you can use for various purposes including rank tracking, lead generation, and optimizing the SEO of the website.
[
{
"title": "Python Tutorial",
"displayed_link": "https://www.w3schools.com \u203a python",
"link": "https://www.w3schools.com/python/",
"description": "Learn Python. Python is a popular programming language. Python can be used on a server to create web applications. Start learning Python now.",
"rank": 1
},
{
"title": "The Python Tutorial \u2014 Python 3.13.1 documentation",
"displayed_link": "https://docs.python.org \u203a tutorial",
"link": "https://docs.python.org/3/tutorial/index.html",
"description": "This tutorial introduces the reader informally to the basic concepts and features of the Python language and system. It helps to have a Python interpreter handy\u00a0...",
"rank": 2
},
....
]
So, that’s how a basic Google Scraping script is created.
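If you want to keep the scraped results for later analysis, a few extra lines can dump them to a JSON file. This is a sketch; a one-element stand-in is used here in place of the full list built above:

```python
import json

# `organic_results` as built by the scraper above; a one-element
# stand-in is used here for illustration.
organic_results = [
    {"title": "Python Tutorial",
     "link": "https://www.w3schools.com/python/",
     "rank": 1},
]

# ensure_ascii=False keeps characters like "›" readable in the file
with open("results.json", "w", encoding="utf-8") as f:
    json.dump(organic_results, f, indent=2, ensure_ascii=False)
```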
However, there is a CATCH. We still can’t completely rely on this method as this can result in a block of our IP by Google. If we want to scrape search results at scale, we need a vast network of premium and non-premium proxies and advanced techniques that can make this possible. That’s where the SERP APIs come into play!
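Before moving on, here is a minimal sketch of what proxy rotation could look like with Requests. The proxy endpoints below are hypothetical placeholders, not working proxies:

```python
import itertools

# Hypothetical proxy endpoints, replace with real proxies from your provider.
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]
_pool = itertools.cycle(PROXIES)

def next_proxy():
    """Return the next proxy in round-robin order, in the dict
    format that requests' `proxies=` argument expects."""
    proxy = next(_pool)
    return {"http": proxy, "https": proxy}
```

Each request would then pass a fresh proxy, e.g. `requests.get(url, headers=headers, proxies=next_proxy())`, spreading traffic across the pool.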
Scraping Google Using ApiForSeo’s SERP API
Another method for scraping Google is using a dedicated SERP API. These APIs are much more reliable and prevent you from getting blocked while scraping.
The setup for this section is the same; we just need to register on ApiForSeo to get an API Key, which grants access to its SERP API.
Getting API Credentials From ApiForSeo
After activating the account, you will be redirected to the dashboard where you will get your API Key.
You can also copy the code from the dashboard itself.
Setting Up Our Code for Scraping Search Results
Next, we will make an API request for a sample query to scrape data through the ApiForSeo SERP API.
import requests

api_key = "APIKEY"
url = "https://api.apiforseo.com/google_search"

params = {
    "api_key": api_key,
    "q": "elon musk",  # requests URL-encodes params, so use a plain space, not "+"
    "gl": "us",
}

response = requests.get(url, params=params)

if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Request failed with status code: {response.status_code}")
You can also try any other query. Don’t forget to put your API Key into the code; otherwise, you will receive a 404 error.
Running this code in your terminal would immediately give you results.
"organic_results": [
{
"title": "Elon Musk - Wikipedia",
"displayed_link": "https://en.wikipedia.org › wiki › Elon_Musk",
"snippet": "Elon Reeve Musk is a businessman known for his key roles in the space company SpaceX and the automotive company Tesla, Inc. His other involvements include ...Musk family · Tesla Roadster · Tesla, SpaceX, and the Quest... · Maye Musk",
"link": "https://en.wikipedia.org/wiki/Elon_Musk",
"extended_sitelinks": [
{
"title": "Musk family",
"link": "https://en.wikipedia.org/wiki/Musk_family"
},
{
"title": "Tesla Roadster",
"link": "https://en.wikipedia.org/wiki/Elon_Musk%27s_Tesla_Roadster"
},
{
"title": "Tesla, SpaceX, and the Quest...",
"link": "https://en.wikipedia.org/wiki/Elon_Musk:_Tesla,_SpaceX,_and_the_Quest_for_a_Fantastic_Future"
},
{
"title": "Maye Musk",
"link": "https://en.wikipedia.org/wiki/Maye_Musk"
}
],
"rank": 1
},
{
"title": "Elon Musk - Forbes",
"displayed_link": "https://www.forbes.com › profile › elon-musk",
"snippet": "Real Time Net Worth · Elon Musk cofounded seven companies, including electric car maker Tesla, rocket producer SpaceX and artificial intelligence startup xAI.Will Elon Musk’s Silicon Valley... · Forbes Real Time Billionaires · Tesla · Peter Thiel",
"link": "https://www.forbes.com/profile/elon-musk/",
"extended_sitelinks": [
{
"title": "Will Elon Musk’s Silicon Valley...",
"link": "https://www.forbes.com/sites/gregorme/2024/12/11/will-elon-musks-silicon-valley-playbook-work-in-government/"
},
{
"title": "Forbes Real Time Billionaires",
"link": "https://www.forbes.com/real-time-billionaires/"
},
{
"title": "Tesla",
"link": "https://www.forbes.com/companies/tesla/"
},
{
"title": "Peter Thiel",
"link": "https://www.forbes.com/profile/peter-thiel/"
}
],
"rank": 2
},
.....
]
The above data contains various fields, including titles, links, snippets, and extended sitelinks. This API also returns advanced SERP features such as People Also Ask, Knowledge Graph, and Answer Boxes.
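To work with this response programmatically, you can flatten it into just the fields you care about. The helper below is sketched against the sample output shown above (field names assumed from it), not part of any official ApiForSeo client:

```python
def extract_results(data):
    """Flatten an ApiForSeo-style response into a compact list of dicts.
    Field names follow the sample output shown above."""
    return [
        {"rank": item["rank"], "title": item["title"], "link": item["link"]}
        for item in data.get("organic_results", [])
    ]
```

Feeding it the parsed JSON (`extract_results(response.json())`) would give you a clean list ready for a rank tracker or a spreadsheet export.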
Conclusion
The nature of business is evolving at a rapid pace. If you don’t have access to data about ongoing trends and your competitors, you risk falling behind emerging businesses that make data-driven strategic decisions at every step. Therefore, it is crucial for a business to understand what is happening in its environment, and Google can be one of the best data sources for this purpose.
In this tutorial, we learned how to scrape Google search results using Python. If you found this blog helpful, please share it on social media and other platforms.
Thank you!