DEV Community

Cover image for Becoming an Analytics Engineer I
Anuoluwapo Balogun
Anuoluwapo Balogun

Posted on

Becoming an Analytics Engineer I

Are you aspiring to become an Analytics Engineer and you do not know how to get started? You can start with Udemy Analytics Engineering Bootcamp like I did too. Click to view course.

Here is a brief documentation of what I learnt. (You need to take the course to have a complete understanding of what data modelling, methodology and technologies is all about)

The methodology used is Kimball's Warehouse Methodology and what is this all about?

Image description

Kimball's Warehouse Methodology

  • Created by Ralph Kimball

  • Defines data warehouse as a copy of transaction data specifically structured for query and analysis.

  • Starts with identifying key business process and requirements (Bottom up)

  • Focus of this approach is to enable business intelligence fast.

  • Data marts are created first instead of enterprise data warehouse.

  • Dimensional model - STAR SCHEMA design (deformalized)

  • The model design is built first on fact and dimension tables.

Process of Kimball Methodology Start Schema

  • Facts
  • Dimensions
  • Follows dimensional modeling technique.
  • Enterprise bus matrix is used to document & show how schema is designed.
  • Data marts are built with star schema being core element of dimensional model.
  • Multiple start schema can exist in same model.
  • Conformed data dimensions (shared attributes) are used to create dimensional data warehouse.

The dataset used is the Northwind Dataset from MySQL and here is the ERD Diagram of the dataset.

Image description

Data modelling with BigQuery, dbt and GitHub for version control

First, we need to understand the business requirements.

What is it that we are trying to do?

Northwind traders are export import company who trades around the world for specialty foods.
Their existing architecture is mostly a mix of on-premises & legacy systems, including their MySQL database which is primary operational database.

They are struggling to keep up with reporting requirements and causing database to slow down impacting their day-to-day business.

Northwind traders wants to modernize their data & reporting solutions & move away from on-prem.

Why are we doing it?

Northwind traders wants to modernize their existing infrastructure for.

  • Better Scalability
  • Reduced load on operational system
  • Improved reporting speed
  • Improved data security

How are we going to achieve it?

Northwind traders will migrate their existing database system to google cloud platform (GCP). BigQuery was selected to run OLAP.
Dimensional Data warehouse will be built on BigQuery to support with reporting requirements.

Next, we define the Business Process

Requirements

Sales Overview

Overall sales reports to understand better our customers what is being sold, what sells the most, where and what sales are the least, the goal is to have a general overview of how the business is going.
Sales Agent Tracking

Track sales & performance of each seller agent to adjust commissions, reward high achievers and empower low achievers.
Product Inventory

Understand the current inventory levels, how to improve stock management, what supplies to we have how much is being purchased. This will allow to understand stock management and potentially broker better deals with suppliers.
Customer Reporting

Allow customers to understand their purchase orders, how much and when they are buying, empowering them to make data-driven decisions and utilize the data to join to their sales data.
To get started we need to set up google account — Click here to set up an account.

Select an existing project if you have one or create a new project if you don’t.

Image description

Your project dashboard should look similar to this.

Image description

Now we set up a GitHub repository for our project.

You can create a new GitHub account if you don’t have one or login into your existing account — Click here to visit GitHub page.

This is what my GitHub profile looks like

Image description

Create a new repository and give it any name of your choice mine would be — analytics-engineering-bootcamp.

Image description

keep the project as private for now later on you can make it public, leave every other setting and just click create repository by scrolling down the page.

Image description

Click here to learn how to install Linux on windows — Click here.

If you are a MacBook user, click here to install Linux.

Alternatively, you can take the bootcamp course to learn how to install Linux on your preferred OS. - Click here to take the bootcamp course.

My preferred Terminal as a windows user is windows terminal on Microsoft store.

Image description

My Linux terminal is opened.

Image description

# Create a new folder on your local machine with Command Line optional
# You can use linux to create the folder too
# I am using anlytics-engineering-bootcamp as my folder name
mkdir C:\path\to\your\designated\file\location\anlytics-engineering-bootcamp

# Open the Folder with Ubuntu
cd /mnt/c/path/to/your/folder/analytics-engineering-bootcamp
Enter fullscreen mode Exit fullscreen mode

Copy everything from the (…or create a new repository on the command line) and paste into your terminal.

Image description

Image description

To input the GitHub password, you will need a token.

Image description

Here is how to get your GitHub Token

Click on the far-right icon of your GitHub profile and click settings.

Image description

Image description

Scroll down to developer’s settings.

Image description

Click on any of the token's access but I am using Token classic.

Image description

Image description

Scroll down and select the necessary settings you might need Choose the repo so you can read and write into your repository.

Image description

Your Folder like mine analytics-engineering-bootcamp should look like this containing the ReadMe file in your local machine when opened.

Image description

My next blog post we will be working on

  • Uploading our dataset into BigQuery
  • Bus Matrix for our business process and data dictionary
  • Setting up dbt
  • Building our dbt models
  • Visualizing and creating our report with PowerBI

Top comments (0)