Kenan Can
SEO Performance Analysis Tool: AI-Powered SEO Insights with Complex Web Scraping

This is a submission for the Bright Data Web Scraping Challenge, qualifying for two prompts:

  1. Scrape Data from Complex, Interactive Websites
  2. Most Creative Use of Web Data for AI Models

What I Built

Meet the SEO Performance Analysis Tool: A comprehensive SEO analytics platform that combines complex web scraping with AI-powered insights. This tool helps SEO professionals and content creators optimize their websites by:

  • Analyzing website performance using Google Lighthouse metrics
  • Identifying and analyzing top competitors
  • Providing AI-powered content optimization suggestions
  • Generating detailed SEO reports

Key Features:

  • 📊 Lighthouse Performance Analysis: Mobile and desktop performance metrics, accessibility scores, and SEO ratings
  • 🔍 Competitor Analysis: Automatic competitor detection and content comparison
  • 📝 Content Analysis: AI-powered structural analysis and SEO recommendations
  • 📈 Visual Reports: Interactive charts and comparative analysis
  • 🤖 AI Integration: Google Gemini AI for intelligent content analysis

Demo

Live Demo: SEO Performance Analysis Tool

Source Code: GitHub Repository

Screenshots

  1. Main Interface: Clean and intuitive interface for URL and keyword input

  2. Lighthouse Analysis: Complex web scraping in action, showing performance metrics

  3. Competitor Analysis: AI-powered competitor content comparison

  4. Content Analysis: Detailed content optimization recommendations

How I Used Bright Data

1. Complex Web Scraping with Scraping Browser

The tool leverages Bright Data's Scraping Browser to handle complex, JavaScript-heavy websites:

# lighthouse.py
from selenium.webdriver import Remote, ChromeOptions
from selenium.webdriver.chromium.remote_connection import ChromiumRemoteConnection
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_lighthouse(target_url: str):
    sbr_connection = ChromiumRemoteConnection(SBR_WEBDRIVER, 'goog', 'chrome')
    driver = Remote(sbr_connection, options=ChromeOptions())

    try:
        # Navigate to PageSpeed Insights
        encoded_url = f"https://pagespeed.web.dev/analysis?url={target_url}"
        driver.get(encoded_url)

        # Challenge 1: Wait for dynamic content loading
        WebDriverWait(driver, 60).until(
            EC.presence_of_element_located((By.CLASS_NAME, "lh-report"))
        )
        # Capture the mobile report text before switching tabs,
        # so we can detect when the desktop report replaces it
        report_text = driver.find_element(By.CLASS_NAME, "lh-report").text

        # Challenge 2: Handle tab switching for desktop analysis
        desktop_tab = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.ID, "desktop_tab"))
        )
        actions = ActionChains(driver)
        actions.move_to_element(desktop_tab).click().perform()

        # Challenge 3: Verify the report content changed
        WebDriverWait(driver, 20).until(
            lambda driver: driver.find_element(By.CLASS_NAME, "lh-report").text != report_text
        )

Challenges Overcome:

  • Handling dynamic JavaScript content on PageSpeed Insights
  • Managing complex user interactions (tab switching between mobile/desktop)
  • Extracting structured data from interactive reports

2. Web Unlocker for Competitor Analysis

Used Bright Data's Web Unlocker to access competitor content reliably:

# compare_pages.py - Competitor Content Access
import requests
from bs4 import BeautifulSoup

def fetch_html_content(url: str) -> tuple:
    try:
        # Ensure the URL has a proper scheme
        if not url.startswith(('http://', 'https://')):
            url = 'https://' + url

        # Bright Data API configuration
        api_url = "https://api.brightdata.com/request"
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {get_api_key('BRIGHTDATA_API_KEY')}"
        }
        payload = {
            "zone": "web_unlocker1",
            "url": url,
            "format": "raw"
        }

        # Make request to the Bright Data API
        response = requests.post(api_url, json=payload, headers=headers)

        if response.status_code == 200:
            # Keep only the headings and paragraphs for comparison
            soup = BeautifulSoup(response.text, 'html.parser')
            tags = soup.find_all(['h1', 'h2', 'h3', 'p'])
            collected_html = ''.join(str(tag) for tag in tags)
            return url, collected_html
        # Non-200 responses also return an explicit (url, None) tuple
        return url, None
    except Exception as e:
        print(f"Error fetching HTML content from {url}: {e}")
        return url, None
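For the comparison step, the collected headings and paragraphs can be reduced to a simple structural outline using only the standard library. This `OutlineParser` class is a hypothetical sketch, not the project's actual comparison code:

```python
from html.parser import HTMLParser

# Hypothetical helper (not in the project): summarize the heading/paragraph
# structure that fetch_html_content returns, so two pages can be compared.
class OutlineParser(HTMLParser):
    TRACKED = {"h1", "h2", "h3", "p"}

    def __init__(self):
        super().__init__()
        self.counts = {tag: 0 for tag in self.TRACKED}

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self.counts[tag] += 1

def outline_of(collected_html: str) -> dict:
    parser = OutlineParser()
    parser.feed(collected_html)
    return parser.counts
```

Comparing `outline_of(our_html)` against `outline_of(competitor_html)` gives a quick signal about heading depth and content volume before handing both pages to the AI step.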

3. SERP API for Competitor Discovery

Integrated Bright Data's SERP API to identify top competitors:

# compare_pages.py - Competitor Discovery
def get_top_competitor(keyword: str, our_domain: str) -> str:
    try:
        url = "https://api.brightdata.com/request"

        # Challenge: get real-time SERP results and find a relevant competitor
        encoded_keyword = requests.utils.quote(keyword)

        payload = {
            "zone": "serp_api1",
            "url": f"https://www.google.com/search?q={encoded_keyword}",
            "format": "raw"
        }

        headers = {
            "Authorization": f"Bearer {get_api_key('BRIGHTDATA_API_KEY')}",
            "Content-Type": "application/json"
        }

        response = requests.post(url, json=payload, headers=headers)

        if response.status_code == 200:
            # Parse the organic results with BeautifulSoup
            soup = BeautifulSoup(response.text, 'html.parser')
            all_data = soup.find_all("div", {"class": "g"})

            # Return the first absolute https link that is not our own site
            for result in all_data:
                anchor = result.find('a')
                link = anchor.get('href') if anchor else None
                if link and link.startswith('https') and our_domain not in link:
                    return link
        return None
    except Exception as e:
        st.error(f"Error finding competitor: {str(e)}")
        return None
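The inline link check can also be factored into a small, testable predicate. This is a hypothetical refactor (`is_relevant_competitor` is not in the project code), using `urllib.parse` to be stricter about what counts as an absolute https link:

```python
from urllib.parse import urlparse

# Hypothetical predicate (not project code): a result link is a relevant
# competitor if it is an absolute https URL whose host is not our own domain.
def is_relevant_competitor(link: str, our_domain: str) -> bool:
    if not link:
        return False
    parsed = urlparse(link)
    return (
        parsed.scheme == "https"
        and bool(parsed.netloc)
        and our_domain not in parsed.netloc
    )
```

Checking `our_domain` against `parsed.netloc` rather than the whole URL avoids false negatives when a competitor merely mentions our domain in a path or query string.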

AI Integration Pipeline

  1. Data Collection: Use Bright Data services to gather:

    • Performance metrics (Lighthouse)
    • Competitor content
    • SERP data
  2. Data Processing: Structure collected data for AI analysis

  3. AI Analysis: Use Google Gemini AI to:

    • Compare content quality
    • Generate SEO recommendations
    • Analyze content structure
  4. Visualization: Present insights through Streamlit interface
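Steps 2 and 3 of the pipeline can be sketched as a single prompt-building function that folds the scraped data into one Gemini request. The field names and prompt wording below are assumptions for illustration, not the project's actual prompt:

```python
# Hypothetical sketch of the "Data Processing" step (not project code):
# combine Lighthouse metrics, our page content, and competitor content
# into one prompt string for the Gemini call.
def build_seo_prompt(metrics: dict, our_content: str,
                     competitor_content: str, keyword: str) -> str:
    metric_lines = "\n".join(f"- {name}: {value}" for name, value in metrics.items())
    return (
        f"You are an SEO analyst. Target keyword: {keyword}\n\n"
        f"Lighthouse metrics:\n{metric_lines}\n\n"
        f"Our page content:\n{our_content}\n\n"
        f"Competitor content:\n{competitor_content}\n\n"
        "Compare content quality, analyze structure, and list "
        "concrete SEO recommendations."
    )
```

The resulting string would then be passed to the Gemini API in the "AI Analysis" step, and the response rendered in the Streamlit interface.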

Tech Stack

  • Frontend: Streamlit
  • Backend: Python
  • Scraping: Bright Data (Scraping Browser, Web Unlocker, SERP API)
  • AI: Google Gemini AI
  • Data Visualization: Plotly

Additional Prompt Qualifications

This project qualifies for two prompts:

  1. Scrape Data from Complex, Interactive Websites: The tool successfully handles JavaScript-heavy pages like PageSpeed Insights, managing dynamic content loading and complex user interactions through Bright Data's Scraping Browser.

  2. Most Creative Use of Web Data for AI Models: The project creates an innovative AI pipeline by combining web-scraped data (performance metrics, competitor content, SERP results) with Google Gemini AI to generate intelligent SEO insights and recommendations.

Team Submission

This submission was created by Kenan Can.

Thank you for reviewing my submission! Let's make SEO analysis smarter with the power of web scraping and AI.

Top comments (10)

Hilal Kara

This project offers an excellent solution to the problem it addresses. Congratulations

Kenan Can

Thank you for your feedback! 🙏

Can Uçanefe

That's the spirit, that's what I've been looking for for a very long time... Thanks for the solution you made for all of us.

Kenan Can

Thank you for your kind words! Glad it's helpful! 🙏

Anl Egr

It's great content. It gives very good tips on what to pay attention to in complex data extraction processes.

Kenan Can

Thank you! Glad the insights about data extraction were helpful! 🙌

Melike Sultan Can

Really enjoyed this! The combination of AI and web scraping for SEO offers great insights.

Kenan Can

Thank you! Glad you found it useful! 🙌

Terraflop

How would you integrate Bright Data's proxy service to target specific countries for gathering localized search engine results?

Kenan Can

For country-specific targeting with Bright Data proxy, you can use the country parameter in your configuration:

payload = {
    "zone": "serp_api1",
    "country": "us",  # target country code
    "url": f"https://www.google.com/search?q={encoded_keyword}",
    "format": "raw"
}
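Building on that, one could generate a payload per target market. `build_serp_payload` below is a sketch, not project code; it assumes the same zone and country fields shown above:

```python
from urllib.parse import quote

# Sketch (not project code): build one localized SERP request payload
# per Bright Data country code.
def build_serp_payload(keyword: str, country: str) -> dict:
    return {
        "zone": "serp_api1",
        "country": country,
        "url": f"https://www.google.com/search?q={quote(keyword)}",
        "format": "raw",
    }

# One payload per market, ready to POST to the /request endpoint
payloads = [build_serp_payload("seo tools", c) for c in ("us", "de", "jp")]
```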