A Python-based job scraping system that pulls LinkedIn listings into a structured Notion database. Repository: jobs-scrape-to-notion
Setup Steps
- Clone the repository:
git clone https://github.com/namanvashistha/jobs-scrape-to-notion
cd jobs-scrape-to-notion
- Install dependencies:
pip install -r requirements.txt
- Configure Notion:
- Create a Notion integration at notion.so/my-integrations
- Create a new Notion database
- Share your database with the integration
- Copy the database ID from its URL
- Set environment variables:
cp .env.example .env
Update .env with your credentials:
NOTION_API_KEY=your_integration_token
NOTION_DATABASE_ID=your_database_id
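As a rough illustration of how these two values might be read at startup, here is a minimal sketch. It assumes the variables are already present in the process environment (for example, exported by the shell or loaded from .env by a helper; how the project actually loads them is not shown here). The function name load_notion_config is hypothetical.

```python
import os


def load_notion_config(env=None):
    """Return (api_key, database_id), or raise if either variable is missing.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    required = ("NOTION_API_KEY", "NOTION_DATABASE_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of a confusing API error later.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["NOTION_API_KEY"], env["NOTION_DATABASE_ID"]
```

Failing at startup keeps a missing token from surfacing later as an opaque authentication error.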
Key Features
Job Scraping
def fetch_jobs(search_terms, location, results_wanted=20):
    """Scrape LinkedIn jobs for each search term.

    Returns a pandas DataFrame with job details.
    """
    ...  # see the repository for the implementation
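To make the multi-term behaviour concrete, the sketch below loops over the search terms and merges the results. It is not the repository's code: it uses plain lists of dicts instead of a pandas DataFrame so it runs without dependencies, it takes the per-term scraper as a hypothetical fetch_one argument, and dropping repeated URLs across terms is an assumption. The field name job_url is also illustrative.

```python
def fetch_jobs_sketch(search_terms, location, fetch_one, results_wanted=20):
    """Collect results for every search term, skipping repeated job URLs."""
    seen_urls = set()
    jobs = []
    for term in search_terms:
        for job in fetch_one(term, location, results_wanted):
            url = job.get("job_url")
            if url in seen_urls:
                continue  # same posting already matched by an earlier term
            seen_urls.add(url)
            jobs.append({**job, "search_term": term})
    return jobs


def fake_fetch(term, location, limit):
    """Stand-in for a real scraper call; returns canned rows."""
    rows = [
        {"title": "Backend Engineer", "job_url": "https://example.com/jobs/1"},
        {"title": f"{term} II", "job_url": f"https://example.com/jobs/{term.lower()}"},
    ]
    return rows[:limit]


jobs = fetch_jobs_sketch(["Backend", "SDE"], "India", fake_fetch)
print(len(jobs))  # 3: the shared posting is kept only once
```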
Notion Integration
- Creates structured database entries
- Handles rich text, URLs, dates, and company logos
- Prevents duplicate entries
- Manages API rate limits
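The sketch below shows roughly what a structured entry could look like as a Notion "create page" request body, together with a simple duplicate check. The property shapes (title, rich_text, url, date) follow the public Notion API, but the column names ("Title", "Company", "URL", "Date Posted"), the job field names, and both helper functions are assumptions made for illustration.

```python
def build_page_payload(database_id, job):
    """Build the JSON body for creating one page (row) in a Notion database."""
    properties = {
        "Title": {"title": [{"text": {"content": job["title"]}}]},
        "Company": {"rich_text": [{"text": {"content": job["company"]}}]},
        "URL": {"url": job["job_url"]},
        # Notion expects ISO 8601 dates, e.g. "2024-05-01".
        "Date Posted": {"date": {"start": job["date_posted"]}},
    }
    return {"parent": {"database_id": database_id}, "properties": properties}


def filter_new_jobs(jobs, existing_urls):
    """Return only the jobs whose URL is not already stored."""
    known = set(existing_urls)
    fresh = []
    for job in jobs:
        if job["job_url"] in known:
            continue
        known.add(job["job_url"])  # also guards against repeats within the batch
        fresh.append(job)
    return fresh
```

In this sketch existing_urls would come from querying the database once before the insert loop, which keeps the duplicate check from costing one request per job.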
Data Processing
- Sanitizes input data
- Formats salary ranges for Indian currency
- Handles company metadata
- Manages file attachments for logos
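Indian currency formatting groups digits as thousands first, then in pairs (lakh, crore), so 1234567 is written 12,34,567. A minimal sketch of that formatting follows; the function names and the exact output style are assumptions, not the project's actual helpers.

```python
def format_inr(amount):
    """Format a whole-rupee amount with Indian digit grouping."""
    digits = str(abs(int(amount)))
    if len(digits) > 3:
        head, tail = digits[:-3], digits[-3:]
        groups = []
        # After the last three digits, group the rest in pairs from the right.
        while len(head) > 2:
            groups.insert(0, head[-2:])
            head = head[:-2]
        if head:
            groups.insert(0, head)
        digits = ",".join(groups + [tail])
    sign = "-" if amount < 0 else ""
    return f"{sign}₹{digits}"


def format_salary_range(min_amount, max_amount):
    """Join two amounts into a single range string."""
    return f"{format_inr(min_amount)} - {format_inr(max_amount)}"


example = format_salary_range(800000, 1200000)  # '₹8,00,000 - ₹12,00,000'
```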
Running the Scraper
python main.py
Default configuration:
- Search terms: ["Software Engineer", "Backend", "SDE"]
- Location: India
- Results per term: 20
- Platform: LinkedIn
Customization
Modify main() in scraper.py:
search_terms = ["Your", "Preferred", "Terms"]
location = "Your Location"
results_wanted = 30 # Number of results per term
Error Handling
The system includes:
- Comprehensive logging
- Rate limit management
- Duplicate prevention
- Data validation
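Rate limit management and logging can be combined in a small retry wrapper. The sketch below assumes the request function raises a hypothetical RateLimited exception when the API answers HTTP 429, optionally carrying the server's Retry-After value; the names, retry counts, and delays are illustrative and not taken from the repository.

```python
import logging
import time

logger = logging.getLogger("notion_sync")


class RateLimited(Exception):
    """Raised by the request function when the API answers HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_retry(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request(), retrying with exponential backoff when rate limited."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RateLimited as exc:
            if attempt == max_attempts:
                logger.error("giving up after %d attempts", attempt)
                raise
            # Prefer the server's hint; otherwise back off 1s, 2s, 4s, ...
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** (attempt - 1)
            logger.warning("rate limited, retrying in %.1fs (attempt %d)", delay, attempt)
            sleep(delay)
```

Passing sleep as a parameter keeps the wrapper testable without real waiting.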
Visit the repository for source code and detailed documentation.