DEV Community

Alvin Mustafa

A Comprehensive Guide to Setting Up a Data Engineering Project Environment

This article covers the following key concepts:

  • Setting up a cloud account (AWS)
  • Installing and configuring key data engineering tools (PostgreSQL, SQL clients, data storage solutions, GitHub, etc.)
  • Networking and permissions (IAM roles, access control)
  • Preparing data pipelines, ETL processes, and database connections
  • Integrating with cloud services like S3
  • Best practices for environment configuration

Setting up a cloud account (AWS)

Follow these steps to create and configure your AWS account:

Step 1: Visit the AWS website

Go to the AWS website and click 'Create a Free Account'.

Step 2: Provide account details

  • Email Address
  • Account name
  • Password

Step 3: Choose an account type

AWS offers two types of accounts:

  • Personal account: Ideal for individual users.
  • Business account: Suitable for businesses and enterprises.

Select the personal account for learning purposes.

Step 4: Enter personal and payment information

  • Full name, address and phone number
  • Credit/Debit card details: AWS requires payment details even if you are creating a free account.

Step 5: Identity verification

  • Solve the CAPTCHA verification.
  • Enter the OTP sent to your registered phone number.

Step 6: Choose a support plan

AWS provides multiple support plans. For beginners, the Basic plan is sufficient.

Installing and configuring PostgreSQL

  • Installation

    1. Download the installer from the official PostgreSQL site.
    2. Run the installer and follow the setup instructions.
    3. Set a password for the postgres user when prompted.
  • Basic Configuration

    1. Access the PostgreSQL CLI:

    psql -U postgres

    2. Change the password of the postgres user:

    ALTER USER postgres PASSWORD 'your_secure_password';

    3. Create a new database:

    CREATE DATABASE newdatabase;

Installing and configuring SQL clients (DBeaver)

SQL clients let you manage and interact with databases visually.
Download DBeaver from DBeaver.io.

Connecting to PostgreSQL using a SQL client

  • Open the SQL client (DBeaver)
  • Create a new connection using:
    • Host: localhost
    • Port: 5432
    • Username: postgres
    • Password: [your_password]
    • Database: newdatabase
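
The same connection settings can also be checked from a terminal. The following is a minimal sketch, assuming a Bash-compatible shell. PGHOST, PGPORT, PGUSER, and PGDATABASE are standard libpq environment variables, while pg_uri is a hypothetical helper introduced here only for illustration. The password is deliberately left out so that it does not end up in the shell history.

```shell
# Connection settings from the list above, stored in the standard
# libpq environment variables that psql reads automatically.
export PGHOST=localhost
export PGPORT=5432
export PGUSER=postgres
export PGDATABASE=newdatabase

# pg_uri is a hypothetical helper (not part of PostgreSQL): it assembles
# a connection URI from the variables above, without the password.
pg_uri() {
  printf 'postgresql://%s@%s:%s/%s\n' "$PGUSER" "$PGHOST" "$PGPORT" "$PGDATABASE"
}

# Print the URI so it can be inspected or pasted into a client.
pg_uri
```

With PostgreSQL installed and running, `psql "$(pg_uri)"` should prompt for the password and then open a session on newdatabase.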

Installing and configuring GitHub

GitHub is a hosting platform for Git, the version control system that is essential in data engineering.

  • Creating a GitHub account

    1. Go to GitHub's website.
    2. Click Sign up.
    3. Fill in the details.
    4. Verify your email.
    5. Choose a plan (Free is enough to start).
  • Installing Git
    Download and install Git from git-scm.com.

  • Configuring Git
    Open Git Bash and run the following commands:

       git config --global user.name "Your Name"
       git config --global user.email "your_email@example.com"
    
  • Using GitHub

    • Create a repository:
        git init
        git remote add origin https://github.com/yourusername/repo.git

    • Push your code:
        git add .
        git commit -m "Initial commit"
        # Rename the current branch to main in case git init created master
        git branch -M main
        git push -u origin main
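
Before pushing to a real remote, the commands above can be rehearsed locally. This is a minimal sketch, assuming Git is installed and a Bash-compatible shell is used. The temporary directory, the demo_dir variable, and the README.md file are placeholders introduced for illustration, and the identity is set per repository so the global configuration stays untouched. It stops short of git push because that step needs an existing GitHub repository.

```shell
# Work in a throwaway directory so no real project is affected.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

git init -q

# Repository-local identity (no --global), using placeholder values.
git config user.name "Your Name"
git config user.email "your_email@example.com"

# A placeholder file so there is something to commit.
echo "# demo project" > README.md

git add .
git commit -q -m "Initial commit"

# Rename the current branch to main, whatever git init called it.
git branch -M main

# Show the branch name and the commit that was just created.
git symbolic-ref --short HEAD
git log --oneline
```

Once this runs cleanly, the same sequence in the real project directory, followed by git remote add and git push, publishes the code to GitHub.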
    
