Hey Dev.to community!

Last month, I had a surprising realization. While applying for a contract position, the client mentioned they'd already checked out several of my old Stack Overflow answers and even a GitHub issue I'd commented on two years ago. It got me thinking: just how visible is my digital footprint as a developer?
In this post, I'll show you how I built a simple Python tool leveraging Open Source Intelligence (OSINT) techniques to gather developer-related information from public sources. We'll use my own nickname "bohdanai" as a practical example.
## What is OSINT?
Open Source Intelligence (OSINT) refers to collecting and analyzing publicly available data. For developers, OSINT can help you quickly gather useful information such as GitHub profiles, LinkedIn profiles, open-source contributions, and professional activities.
## Our Goal
We'll create a Python script that:
- Accepts a developer's name or nickname.
- Searches online sources like Google for mentions.
- Extracts relevant LinkedIn and GitHub profile links.
- Optionally scrapes specific websites for mentions.
- Returns structured results for further analysis.
## Tools and Libraries Used:
- `requests` – for sending HTTP requests.
- `BeautifulSoup` – for parsing HTML content.
- `googlesearch-python` – to perform Google searches directly from Python.

Install these libraries with:

```bash
pip install requests beautifulsoup4 googlesearch-python
```
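If you want to confirm the three packages play together before running the full script, here's a quick sanity check. It assumes `googlesearch-python`'s `search()` accepts a `num_results` keyword; the query string and the example.com fetch are just illustrations:

```python
import requests
from bs4 import BeautifulSoup
from googlesearch import search

# Run a tiny Google query via googlesearch-python (num_results keeps it small).
for url in search("site:github.com bohdanai", num_results=3):
    print(url)

# Fetch a page with requests and parse it with BeautifulSoup.
html = requests.get("https://example.com", timeout=5).text
print(BeautifulSoup(html, "html.parser").title.string)
```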
## Python Script Implementation
Here's the complete Python script:
```python
import requests
from bs4 import BeautifulSoup
from googlesearch import search
import json


def search_developer(name, num_results=10, website_url=None):
    results = {
        "name": name,
        "google_results": [],
        "linkedin_url": None,
        "github_url": None,
        "website_mentions": []
    }

    # Google Search
    print(f"Searching Google for '{name}'...")
    try:
        # googlesearch-python's search() takes num_results (and, in recent
        # releases, sleep_interval) rather than the older num/stop/pause args.
        for url in search(name, num_results=num_results, sleep_interval=2):
            results["google_results"].append(url)
    except Exception as e:
        print(f"Google search error: {e}")

    # LinkedIn URL extraction
    print("Extracting LinkedIn URL...")
    for url in results["google_results"]:
        if "linkedin.com/in/" in url:
            results["linkedin_url"] = url
            break

    # GitHub URL extraction (skip issue and pull request links)
    print("Extracting GitHub URL...")
    for url in results["google_results"]:
        if "github.com" in url and "/issues" not in url and "/pull" not in url:
            results["github_url"] = url
            break

    # Optional: website scraping
    if website_url:
        print(f"Scraping mentions from {website_url}...")
        try:
            response = requests.get(website_url, timeout=5)
            soup = BeautifulSoup(response.text, "html.parser")
            texts = soup.stripped_strings
            for text in texts:
                if name.lower() in text.lower():
                    results["website_mentions"].append(text)
        except requests.RequestException as e:
            print(f"Website scraping error: {e}")

    return results


def save_to_json(data, filename="osint_results.json"):
    with open(filename, "w") as f:
        json.dump(data, f, indent=2)
    print(f"Results saved to {filename}")


if __name__ == "__main__":
    developer_name = input("Enter developer name or username to research: ") or "bohdanai"
    my_website = "https://bohdanlukianets.pro"

    data = search_developer(developer_name, website_url=my_website)

    print("\n--- OSINT Results ---")
    print(f"Name/Nickname: {data['name']}")
    # These keys always exist (initialised to None), so use `or` for the fallback text
    print(f"LinkedIn: {data['linkedin_url'] or 'Not found'}")
    print(f"GitHub: {data['github_url'] or 'Not found'}")

    print("\nGoogle Results:")
    for idx, url in enumerate(data["google_results"], start=1):
        print(f"{idx}. {url}")

    if data["website_mentions"]:
        print(f"\nMentions on {my_website}:")
        for mention in data["website_mentions"]:
            print(f"- {mention}")
    else:
        print(f"\nNo mentions found on {my_website}.")

    save_to_json(data)
```
## Running the Script
Save the code above in a file called `osint_script.py` and run it:

```bash
python osint_script.py
```

You'll be prompted to enter a username, or you can press Enter to use the default "bohdanai".
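The structured results also land in `osint_results.json` via `save_to_json()`, so you can load them back for further analysis. A minimal sketch (the keys mirror the results dict built in `search_developer()`):

```python
import json

# Load the saved OSINT results back into a dict.
with open("osint_results.json") as f:
    data = json.load(f)

print(data["linkedin_url"], data["github_url"])
print(len(data["google_results"]), "Google results collected")
```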
## Strengths of This Script:
- Easy to Understand: Clear structure and inline comments.
- Beginner-Friendly: Great introduction to web scraping and OSINT.
- Basic Error Handling: Handles exceptions in requests and search operations.
- Popular Libraries: Uses well-known and maintained Python libraries.
## Weaknesses & Points for Improvement:
- Reliability: Google scraping might lead to temporary IP blocks.
- Limited Personalization: Currently uses fixed URLs; consider adding custom inputs.
- Ethical Concerns: Always ensure you have permission and follow ethical guidelines.
- Output Format: Consider adding detailed CSV or Excel outputs for easier analysis (see the sketch after this list).
- Rate Limiting: Add delays (`time.sleep()`) for more extensive searches to avoid IP blocks.
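For the output-format point, here's a minimal sketch of a CSV export: a hypothetical `save_to_csv()` helper that could sit next to `save_to_json()`:

```python
import csv

def save_to_csv(data, filename="osint_results.csv"):
    """Flatten the results dict into one row per Google result (hypothetical helper)."""
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "linkedin_url", "github_url", "google_result"])
        for url in data["google_results"]:
            writer.writerow([data["name"], data["linkedin_url"], data["github_url"], url])
    print(f"Results saved to {filename}")
```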
## Ethical Considerations:
Always respect privacy and legal frameworks when gathering personal data from the web. This script is intended purely for educational purposes and ethical OSINT investigations.
## Lessons Learned & Practical Tips:
- Google rate limits are real, so use pauses between requests (a small example follows this list).
- Always scrape responsibly to avoid getting IP-blocked.
- Results vary widely based on online activity and SEO.
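Here's what that pausing can look like in practice; a sketch only, with a made-up batch of usernames:

```python
import time
from googlesearch import search

usernames = ["bohdanai", "bohdanlukianets"]  # illustrative batch, not real research targets
for username in usernames:
    urls = list(search(username, num_results=10))
    print(f"{username}: {len(urls)} results")
    time.sleep(10)  # generous pause between queries to avoid tripping Google's rate limits
```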
## Where This Could Go Next:
- Using official APIs instead of scraping (see the GitHub API sketch below).
- Sentiment analysis on collected mentions.
- Visualizing your digital presence.
- Automated monitoring for new mentions.
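As a first step toward official APIs, here's a minimal sketch against GitHub's public REST users endpoint (unauthenticated, so heavily rate-limited; the username is just this post's example):

```python
import requests

def github_profile(username):
    """Fetch basic public profile data from GitHub's REST API."""
    resp = requests.get(f"https://api.github.com/users/{username}", timeout=5)
    resp.raise_for_status()
    user = resp.json()
    return {
        "name": user.get("name"),
        "bio": user.get("bio"),
        "public_repos": user.get("public_repos"),
        "profile_url": user.get("html_url"),
    }

print(github_profile("bohdanai"))
```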
## Connect With Me:
Feel free to connect with me through these platforms:
- Personal Website: bohdanlukianets.pro
- LinkedIn: linkedin.com/in/bohdanlukianets
- GitHub: github.com/bohdanaims
- Dev.to: dev.to/bohdanaims
## Your Turn!
Have you checked your own digital footprint? Were you surprised by what you found? What enhancements or ideas would you like to see next? Let me know your thoughts in the comments!
Happy coding!