How to Automate Developer OSINT with Python

Hey Dev.to community! 👋

Last month, I had a surprising realization. While applying for a contract position, the client mentioned they'd already checked out several of my old Stack Overflow answers and even a GitHub issue I'd commented on two years ago. It got me thinking: just how visible is my digital footprint as a developer?

In this post, I'll show you how I built a simple Python tool leveraging Open Source Intelligence (OSINT) techniques to gather developer-related information from public sources. We'll use my own nickname "bohdanai" as a practical example.

📌 What is OSINT?

Open Source Intelligence (OSINT) refers to collecting and analyzing publicly available data. For developers, OSINT can help you quickly gather useful information such as GitHub profiles, LinkedIn profiles, open-source contributions, and professional activities.

🎯 Our Goal

We'll create a Python script that:

  • Accepts a developer's name or nickname.
  • Searches online sources like Google for mentions.
  • Extracts relevant LinkedIn and GitHub profile links.
  • Optionally scrapes specific websites for mentions.
  • Returns structured results for further analysis.

🛠️ Tools and Libraries Used:

  • requests - for sending HTTP requests.
  • BeautifulSoup - for parsing HTML content.
  • googlesearch-python - to perform Google searches directly from Python.

Install these libraries with:

pip install requests beautifulsoup4 googlesearch-python
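
If you want to sanity-check the installation before writing any real code, the short script below will do. One thing to watch for: googlesearch-python exposes search(term, num_results=..., sleep_interval=...), while the older, unrelated "google" package used num/stop/pause arguments, so make sure you installed the right one. The file name and the query string here are just my own examples.

# sanity_check.py - confirm all three libraries import and behave as expected
import requests
from bs4 import BeautifulSoup
from googlesearch import search

# requests: hit a public endpoint
print(requests.get("https://api.github.com", timeout=5).status_code)  # expect 200

# BeautifulSoup: parse a tiny HTML snippet
soup = BeautifulSoup("<p>Hello, <b>bohdanai</b>!</p>", "html.parser")
print(soup.get_text())  # -> Hello, bohdanai!

# googlesearch-python: run a tiny search, pausing between result pages
for url in search("bohdanai github", num_results=3, sleep_interval=2):
    print(url)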


⚙️ Python Script Implementation

Here's the complete Python script:

import requests
from bs4 import BeautifulSoup
from googlesearch import search
import json

def search_developer(name, num_results=10, website_url=None):
    results = {
        "name": name,
        "google_results": [],
        "linkedin_url": None,
        "github_url": None,
        "website_mentions": []
    }

    # Google Search
    print(f"🔍 Searching Google for '{name}'...")
    try:
        # googlesearch-python's search() takes num_results and sleep_interval
        # (the older "google" package used num/stop/pause instead)
        for url in search(name, num_results=num_results, sleep_interval=2):
            results["google_results"].append(url)
    except Exception as e:
        print(f"⚠️ Google search error: {e}")

    # LinkedIn URL extraction
    print("🔗 Extracting LinkedIn URL...")
    for url in results["google_results"]:
        if "linkedin.com/in/" in url:
            results["linkedin_url"] = url
            break

    # GitHub URL extraction
    print("🐱 Extracting GitHub URL...")
    for url in results["google_results"]:
        if "github.com" in url and "/issues" not in url and "/pull" not in url:
            results["github_url"] = url
            break

    # Optional: Website scraping
    if website_url:
        print(f"🌐 Scraping mentions from {website_url}...")
        try:
            response = requests.get(website_url, timeout=5)
            response.raise_for_status()  # treat HTTP errors (4xx/5xx) as failures
            soup = BeautifulSoup(response.text, 'html.parser')
            # Collect every text fragment on the page that contains the name
            for text in soup.stripped_strings:
                if name.lower() in text.lower():
                    results["website_mentions"].append(text)
        except requests.RequestException as e:
            print(f"⚠️ Website scraping error: {e}")

    return results

def save_to_json(data, filename="osint_results.json"):
    with open(filename, 'w') as f:
        json.dump(data, f, indent=2)
    print(f"📁 Results saved to {filename}")

if __name__ == "__main__":
    developer_name = input("Enter developer name or username to research: ") or "bohdanai"
    my_website = "https://bohdanlukianets.pro"

    data = search_developer(developer_name, website_url=my_website)

    print("\n📋 --- OSINT Results ---")
    print(f"👤 Name/Nickname: {data['name']}")
    # These keys always exist (possibly as None), so fall back with `or` instead of .get()
    print(f"🔗 LinkedIn: {data['linkedin_url'] or 'Not found'}")
    print(f"🐙 GitHub: {data['github_url'] or 'Not found'}")

    print("\n🌐 Google Results:")
    for idx, url in enumerate(data["google_results"], start=1):
        print(f"{idx}. {url}")

    if data["website_mentions"]:
        print(f"\n🔖 Mentions on {my_website}:")
        for mention in data["website_mentions"]:
            print(f"- {mention}")
    else:
        print(f"\n🔖 No mentions found on {my_website}.")

    save_to_json(data)

🚀 Running the Script

Simply save the code above in a file called osint_script.py and run it:

python osint_script.py

You'll be prompted to enter a username, or you can just press Enter to use the default "bohdanai".
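
If you'd rather call the tool from another script or a notebook instead of answering the prompt, you can import its functions directly. This is a minimal sketch assuming you saved the code above as osint_script.py in the same folder; the output file name is just an example.

# Call the tool programmatically instead of via the input() prompt
from osint_script import search_developer, save_to_json

data = search_developer("bohdanai", num_results=5)  # no website_url, so site scraping is skipped
save_to_json(data, "bohdanai_results.json")
print(data["github_url"])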

✅ Strengths of This Script:

  • Easy to Understand: Clear structure and inline comments.
  • Beginner-Friendly: Great introduction to web scraping and OSINT.
  • Basic Error Handling: Handles exceptions in requests and search operations.
  • Popular Libraries: Uses well-known and maintained Python libraries.

⚠️ Weaknesses & Points for Improvement:

  • Reliability: Google scraping might lead to temporary IP blocks.
  • Limited Personalization: Currently uses fixed URLs; consider adding custom inputs.
  • Ethical Concerns: Always ensure you have permission and follow ethical guidelines.
  • Output Format: Consider adding detailed CSV or Excel outputs for easier analysis (see the CSV sketch after this list).
  • Rate Limiting: Add delays (time.sleep()) for more extensive searches to avoid IP blocks.
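
To make two of those points concrete, here's a small sketch: a save_to_csv helper built on Python's standard csv module, plus a time.sleep() pause you could drop between scraping calls. The function name, column layout, and example sites are my own choices, not part of the script above.

import csv
import time

def save_to_csv(data, filename="osint_results.csv"):
    """Flatten the results dict into simple (category, value) rows."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["category", "value"])
        writer.writerow(["name", data["name"]])
        writer.writerow(["linkedin_url", data["linkedin_url"] or ""])
        writer.writerow(["github_url", data["github_url"] or ""])
        for url in data["google_results"]:
            writer.writerow(["google_result", url])
        for mention in data["website_mentions"]:
            writer.writerow(["website_mention", mention])
    print(f"Results saved to {filename}")

# Rate limiting: pause between requests when scanning several sites
for site in ["https://example.com", "https://example.org"]:
    # ... fetch and parse `site` here ...
    time.sleep(3)  # a few seconds between requests lowers the risk of IP blocks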

🔐 Ethical Considerations:

Always respect privacy and legal frameworks when gathering personal data from the web. This script is intended purely for educational purposes and ethical OSINT investigations.

🧠 Lessons Learned & Practical Tips:

  • Google rate limits are real, so use pauses between requests.
  • Always scrape responsibly to avoid getting IP-blocked (see the robots.txt sketch after this list).
  • Results vary widely based on online activity and SEO.
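
One concrete way to act on those first two tips is to check a site's robots.txt before scraping it and to sleep between requests. The helper below is a minimal sketch using only the standard library's urllib.robotparser; the function name is mine, and bohdanlukianets.pro is simply the site used earlier in the script.

import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def allowed_to_scrape(url, user_agent="*"):
    """Return True if the site's robots.txt permits fetching this URL."""
    parts = urlparse(url)
    parser = RobotFileParser()
    parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        parser.read()
    except Exception:
        return False  # be conservative if robots.txt can't be fetched
    return parser.can_fetch(user_agent, url)

if allowed_to_scrape("https://bohdanlukianets.pro"):
    # ... fetch and parse the page here ...
    time.sleep(3)  # pause before the next request
else:
    print("robots.txt disallows scraping this URL - skipping.")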

🔮 Where This Could Go Next:

  • Using official APIs instead of scraping (a GitHub API sketch follows this list).
  • Sentiment analysis on collected mentions.
  • Visualizing your digital presence.
  • Automated monitoring for new mentions.
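
As a taste of the official-API route, here's a hedged sketch that pulls a public profile from the GitHub REST API (https://api.github.com/users/{username}) with plain requests. Unauthenticated calls are rate-limited, and the username below is only an example.

import requests

def github_profile(username):
    """Fetch public profile data from the GitHub REST API instead of scraping."""
    response = requests.get(
        f"https://api.github.com/users/{username}",
        headers={"Accept": "application/vnd.github+json"},
        timeout=5,
    )
    if response.status_code == 404:
        return None  # no such user
    response.raise_for_status()
    return response.json()

profile = github_profile("bohdanai")
if profile:
    print(profile.get("name"), "-", profile.get("html_url"))
    print("Public repos:", profile.get("public_repos"))
else:
    print("GitHub user not found.")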

๐ŸŒ Connect With Me:

Feel free to connect with me through these platforms:

💬 Your Turn!

Have you checked your own digital footprint? Were you surprised by what you found? What enhancements or ideas would you like to see next? Let me know your thoughts in the comments!

Happy coding! 🌟👩‍💻👨‍💻
