Naman Vashistha


Automated Job Search: LinkedIn Jobs to Notion Board


This project is a Python-based job scraper that pulls LinkedIn listings into a structured Notion database. Repository: jobs-scrape-to-notion

Setup Steps

  1. Clone the repository:
git clone https://github.com/namanvashistha/jobs-scrape-to-notion
cd jobs-scrape-to-notion
  2. Install dependencies:
pip install -r requirements.txt
  3. Configure Notion:

    • Create a Notion integration at notion.so/my-integrations
    • Create a new Notion database
    • Share your database with the integration
    • Copy the database ID from its URL
  4. Set environment variables:

cp .env.example .env

Update .env with your credentials:

NOTION_API_KEY=your_integration_token
NOTION_DATABASE_ID=your_database_id
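As a minimal sketch of how these two values might be read at startup: the helper below checks that both variables are present and fails early if either is missing. The function name `load_notion_config` is hypothetical and not taken from the repository, and the sketch assumes the contents of .env have already been loaded into the process environment (for example by a dotenv loader, which is an assumption about the project).

```python
import os


def load_notion_config(env=None):
    """Return (api_key, database_id), or raise if either value is missing.

    `env` defaults to os.environ; passing a plain dict makes testing easier.
    """
    env = os.environ if env is None else env
    required = ("NOTION_API_KEY", "NOTION_DATABASE_ID")
    # Treat unset and empty values the same way.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["NOTION_API_KEY"], env["NOTION_DATABASE_ID"]
```

Failing at startup with the names of the missing variables tends to be easier to debug than an authentication error from the Notion API later in the run.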

Key Features

Job Scraping

def fetch_jobs(search_terms, location, results_wanted=20):
    """Scrape LinkedIn jobs for each of the given search terms.

    Returns a pandas DataFrame with job details.
    """
    ...  # Implementation lives in the repository

Notion Integration

  • Creates structured database entries
  • Handles rich text, URLs, dates, and company logos
  • Prevents duplicate entries
  • Manages API rate limits
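To illustrate the first three points, here is a rough sketch of how one job could be turned into the JSON body that Notion's create-page endpoint (POST /v1/pages) expects, plus a simple URL-based duplicate check. The property names ("Title", "Company", "URL", "Date Posted"), the keys of the `job` dict, and both function names are assumptions for illustration; the real database schema and the repository's own duplicate check may differ.

```python
NOTION_TEXT_LIMIT = 2000  # Notion caps a single rich text content string at 2000 characters


def _text(value):
    """Wrap a string in Notion's rich text object shape, truncated to the limit."""
    return [{"text": {"content": str(value)[:NOTION_TEXT_LIMIT]}}]


def build_job_page(database_id, job):
    """Build the request body for creating one database page from a job dict."""
    properties = {
        "Title": {"title": _text(job["title"])},            # hypothetical property name
        "Company": {"rich_text": _text(job["company"])},    # hypothetical property name
        "URL": {"url": job["url"]},                         # hypothetical property name
    }
    # Date properties take an ISO 8601 string under "start".
    if job.get("date_posted"):
        properties["Date Posted"] = {"date": {"start": job["date_posted"]}}
    return {"parent": {"database_id": database_id}, "properties": properties}


def is_new_job(job, seen_urls):
    """Return True and remember the URL if this job has not been seen yet."""
    if job["url"] in seen_urls:
        return False
    seen_urls.add(job["url"])
    return True
```

Keying duplicates on the job URL is only one option; it works within a single run, while detecting duplicates across runs would require querying the existing database entries first.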

Data Processing

  • Sanitizes input data
  • Formats salary ranges for Indian currency
  • Handles company metadata
  • Manages file attachments for logos
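The salary formatting can be sketched roughly as follows. Indian digit grouping puts the last three digits together and then groups by twos (lakh, crore), which Python's built-in thousands separator does not do. The function names, the " - " separator, and the "Not disclosed" fallback are assumptions for illustration, not the repository's actual output format.

```python
def format_inr(amount):
    """Format a whole-rupee amount with Indian digit grouping."""
    amount = int(amount)
    digits = str(abs(amount))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Everything before the last three digits is grouped in pairs, right to left.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    sign = "-" if amount < 0 else ""
    return sign + "₹" + ",".join(groups + [tail])


def format_salary_range(min_amount, max_amount):
    """Format a salary range; either bound may be None if the listing omits it."""
    if min_amount is None and max_amount is None:
        return "Not disclosed"  # assumed fallback text
    if min_amount is None or max_amount is None:
        return format_inr(max_amount if min_amount is None else min_amount)
    return f"{format_inr(min_amount)} - {format_inr(max_amount)}"
```

For example, `format_inr(1234567)` yields `₹12,34,567`.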

Running the Scraper

python main.py

Default configuration:

  • Search terms: ["Software Engineer", "Backend", "SDE"]
  • Location: India
  • Results per term: 20
  • Platform: LinkedIn

Customization

Modify main() in scraper.py:

search_terms = ["Your", "Preferred", "Terms"]
location = "Your Location"
results_wanted = 30  # Number of results per term

Error Handling

The system includes:

  • Comprehensive logging
  • Rate limit management
  • Duplicate prevention
  • Data validation
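As one way to picture the logging and rate limit handling together, here is a minimal retry-with-exponential-backoff sketch. `RateLimitError` is a hypothetical exception standing in for whatever the project raises when Notion answers with HTTP 429, and the logger name, `call_with_retry`, and the delay values are likewise assumptions rather than the repository's code.

```python
import logging
import time

logger = logging.getLogger("jobs_scraper")  # hypothetical logger name


class RateLimitError(Exception):
    """Hypothetical error raised when the API reports a rate limit (HTTP 429)."""


def call_with_retry(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on RateLimitError.

    `sleep` is injectable so the backoff can be tested without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts:
                logger.error("Rate limited; giving up after %d attempts", attempt)
                raise
            # Delay doubles on each retry: base, 2x base, 4x base, ...
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning(
                "Rate limited; retrying in %.1fs (attempt %d/%d)",
                delay, attempt, max_attempts,
            )
            sleep(delay)
```

A call such as `call_with_retry(lambda: create_page(payload))` would wrap a single request, where `create_page` is a placeholder for the actual function that talks to Notion.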

Visit the repository for source code and detailed documentation.
