This is a submission for the Bright Data Web Scraping Challenge, qualifying for two prompts:
- Scrape Data from Complex, Interactive Websites
- Most Creative Use of Web Data for AI Models
What I Built
Meet the SEO Performance Analysis Tool: A comprehensive SEO analytics platform that combines complex web scraping with AI-powered insights. This tool helps SEO professionals and content creators optimize their websites by:
- Analyzing website performance using Google Lighthouse metrics
- Identifying and analyzing top competitors
- Providing AI-powered content optimization suggestions
- Generating detailed SEO reports
Key Features:
- 📊 Lighthouse Performance Analysis: Mobile and desktop performance metrics, accessibility scores, and SEO ratings
- 🔍 Competitor Analysis: Automatic competitor detection and content comparison
- 📝 Content Analysis: AI-powered structural analysis and SEO recommendations
- 📈 Visual Reports: Interactive charts and comparative analysis
- 🤖 AI Integration: Google Gemini AI for intelligent content analysis
Demo
Live Demo: SEO Performance Analysis Tool
Source Code: GitHub Repository
Screenshots
Main Interface: Clean and intuitive interface for URL and keyword input
Lighthouse Analysis: Complex web scraping in action, showing performance metrics
Competitor Analysis: AI-powered competitor content comparison
Content Analysis: Detailed content optimization recommendations
How I Used Bright Data
1. Complex Web Scraping with Scraping Browser
The tool leverages Bright Data's Scraping Browser to handle complex, JavaScript-heavy websites:
```python
# lighthouse.py
from selenium.webdriver import Remote, ChromeOptions
from selenium.webdriver.chromium.remote_connection import ChromiumRemoteConnection
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_lighthouse(target_url: str):
    sbr_connection = ChromiumRemoteConnection(SBR_WEBDRIVER, 'goog', 'chrome')
    driver = Remote(sbr_connection, options=ChromeOptions())
    try:
        # Navigate to PageSpeed Insights
        encoded_url = f"https://pagespeed.web.dev/analysis?url={target_url}"
        driver.get(encoded_url)

        # Challenge 1: Wait for dynamic content loading
        WebDriverWait(driver, 60).until(
            EC.presence_of_element_located((By.CLASS_NAME, "lh-report"))
        )
        # Snapshot the mobile report so we can detect when it changes
        report_text = driver.find_element(By.CLASS_NAME, "lh-report").text

        # Challenge 2: Handle tab switching for desktop analysis
        desktop_tab = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.ID, "desktop_tab"))
        )
        actions = ActionChains(driver)
        actions.move_to_element(desktop_tab).click().perform()

        # Challenge 3: Verify report content changed
        WebDriverWait(driver, 20).until(
            lambda driver: driver.find_element(By.CLASS_NAME, "lh-report").text != report_text
        )
    finally:
        driver.quit()
```
Challenges Overcome:
- Handling dynamic JavaScript content on PageSpeed Insights
- Managing complex user interactions (tab switching between mobile/desktop)
- Extracting structured data from interactive reports
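Once the report has rendered, the numeric category scores still have to be pulled out of the report text. A minimal, stdlib-only sketch of that step; the flat `"Category NN"` layout here is an assumption about how the report text collapses, and the real PageSpeed markup may differ:

```python
import re

def parse_scores(report_text: str) -> dict:
    """Pull the four Lighthouse category scores out of flattened report text.

    Assumes each category appears as e.g. 'Performance 92' in the text;
    adjust the pattern to the actual DOM you scrape.
    """
    categories = ["Performance", "Accessibility", "Best Practices", "SEO"]
    scores = {}
    for cat in categories:
        match = re.search(rf"{re.escape(cat)}\s+(\d+)", report_text)
        if match:
            scores[cat] = int(match.group(1))
    return scores

sample = "Performance 92 Accessibility 88 Best Practices 100 SEO 95"
print(parse_scores(sample))
# {'Performance': 92, 'Accessibility': 88, 'Best Practices': 100, 'SEO': 95}
```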
2. Web Unlocker for Competitor Analysis
Used Bright Data's Web Unlocker to access competitor content reliably:
```python
# compare_pages.py - Competitor Content Access
import requests
from bs4 import BeautifulSoup

def fetch_html_content(url: str) -> tuple:
    try:
        # Ensure the URL has a proper scheme
        if not url.startswith(('http://', 'https://')):
            url = 'https://' + url

        # Bright Data API configuration
        api_url = "https://api.brightdata.com/request"
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {get_api_key('BRIGHTDATA_API_KEY')}"
        }
        payload = {
            "zone": "web_unlocker1",
            "url": url,
            "format": "raw"
        }

        # Route the request through Bright Data's Web Unlocker
        response = requests.post(api_url, json=payload, headers=headers)
        if response.status_code == 200:
            # Keep only the tags that matter for content comparison
            soup = BeautifulSoup(response.text, 'html.parser')
            tags = soup.find_all(['h1', 'h2', 'h3', 'p'])
            collected_html = ''.join(str(tag) for tag in tags)
            return url, collected_html
    except Exception as e:
        print(f"Error fetching HTML content from {url}: {e}")
    return url, None
```
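Since the point of fetching this fragment is to compare our page against the competitor's, a quick structural summary can be derived from the returned HTML. A stdlib-only sketch; the class name and the particular counts are my illustration, not code from the project:

```python
from html.parser import HTMLParser

class StructureCounter(HTMLParser):
    """Counts headings and words in the fragment fetch_html_content returns."""

    def __init__(self):
        super().__init__()
        self.counts = {"h1": 0, "h2": 0, "h3": 0}
        self.words = 0

    def handle_starttag(self, tag, attrs):
        # Tally only the heading levels we extracted upstream
        if tag in self.counts:
            self.counts[tag] += 1

    def handle_data(self, data):
        # Rough word count across all text nodes
        self.words += len(data.split())

parser = StructureCounter()
parser.feed("<h1>Title</h1><h2>Intro</h2><p>Some body text here.</p>")
print(parser.counts, parser.words)  # {'h1': 1, 'h2': 1, 'h3': 0} 6
```

Running this over both the target page and the competitor's page gives the structural deltas (heading usage, content length) that feed the AI comparison step.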
3. SERP API for Competitor Discovery
Integrated Bright Data's SERP API to identify top competitors:
```python
# compare_pages.py - Competitor Discovery
import requests
import streamlit as st
from bs4 import BeautifulSoup

def get_top_competitor(keyword: str, our_domain: str) -> str:
    try:
        url = "https://api.brightdata.com/request"

        # Challenge: get real-time SERP results and find a relevant competitor
        encoded_keyword = requests.utils.quote(keyword)
        payload = {
            "zone": "serp_api1",
            "url": f"https://www.google.com/search?q={encoded_keyword}",
            "format": "raw"
        }
        headers = {
            "Authorization": f"Bearer {get_api_key('BRIGHTDATA_API_KEY')}",
            "Content-Type": "application/json"
        }
        response = requests.post(url, json=payload, headers=headers)
        if response.status_code == 200:
            # Parse search results with BeautifulSoup
            soup = BeautifulSoup(response.text, 'html.parser')
            all_data = soup.find_all("div", {"class": "g"})

            # Return the first organic result that is not our own domain
            for result in all_data:
                anchor = result.find('a')
                link = anchor.get('href') if anchor else None
                if link and link.startswith('https') and our_domain not in link:
                    return link
    except Exception as e:
        st.error(f"Error finding competitor: {str(e)}")
    return None
```
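One caveat with the loop above: `our_domain not in link` is a substring test, so it can misclassify URLs whose path merely mentions the domain, or hosts like `notexample.com`. A stricter hostname-based check is sketched below; this is my suggested tightening, not code from the project:

```python
from urllib.parse import urlparse

def is_competitor(link: str, our_domain: str) -> bool:
    """Treat a SERP link as a competitor only if its hostname is neither
    our domain nor one of our subdomains."""
    host = urlparse(link).hostname or ""
    return (
        host != ""
        and host != our_domain
        and not host.endswith("." + our_domain)
    )

print(is_competitor("https://rival.com/page", "example.com"))      # True
print(is_competitor("https://blog.example.com/x", "example.com"))  # False
print(is_competitor("https://notexample.com/a", "example.com"))    # True
```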
AI Integration Pipeline
1. Data Collection: Use Bright Data services to gather:
   - Performance metrics (Lighthouse)
   - Competitor content
   - SERP data
2. Data Processing: Structure the collected data for AI analysis
3. AI Analysis: Use Google Gemini AI to:
   - Compare content quality
   - Generate SEO recommendations
   - Analyze content structure
4. Visualization: Present insights through the Streamlit interface
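The handoff between steps 2 and 3 boils down to assembling the scraped structures into a single prompt for Gemini. A minimal sketch of that step; the schema, function name, and model name are assumptions for illustration:

```python
import json

def build_seo_prompt(our: dict, competitor: dict, keyword: str) -> str:
    """Assemble structured scrape results into one analysis prompt.

    The dict schema here is hypothetical; adapt it to whatever your
    scraping functions actually return.
    """
    return (
        f"You are an SEO analyst. Target keyword: {keyword!r}.\n"
        f"Our page structure: {json.dumps(our)}\n"
        f"Competitor structure: {json.dumps(competitor)}\n"
        "Compare the two pages and list three concrete optimization steps."
    )

prompt = build_seo_prompt({"h1": 1, "words": 800},
                          {"h1": 1, "words": 1500},
                          "seo tools")

# The prompt is then sent via the google-generativeai SDK, e.g.:
#   model = genai.GenerativeModel("gemini-1.5-flash")
#   response = model.generate_content(prompt)
print(prompt.splitlines()[0])  # You are an SEO analyst. Target keyword: 'seo tools'.
```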
Tech Stack
- Frontend: Streamlit
- Backend: Python
- Scraping: Bright Data (Scraping Browser, Web Unlocker, SERP API)
- AI: Google Gemini AI
- Data Visualization: Plotly
Additional Prompt Qualifications
This project qualifies for two prompts:
Scrape Data from Complex, Interactive Websites: The tool successfully handles JavaScript-heavy pages like PageSpeed Insights, managing dynamic content loading and complex user interactions through Bright Data's Scraping Browser.
Most Creative Use of Web Data for AI Models: The project creates an innovative AI pipeline by combining web-scraped data (performance metrics, competitor content, SERP results) with Google Gemini AI to generate intelligent SEO insights and recommendations.
Team Submission
This submission was created by Kenan Can.
Thank you for reviewing my submission! Let's make SEO analysis smarter with the power of web scraping and AI.