DEV Community

John Wakaba
SQL 101: INTRODUCTION TO SQL FOR DATA ANALYTICS

Understanding SQL (Structured Query Language) has become essential in the rapidly changing field of data analytics. SQL is a core component of data management and analysis because it enables data analysts to query and manage large datasets effectively.

In this article you will learn what SQL is, why it matters for data analytics, the core SQL concepts, the basic SQL functions used in data analytics, how to join tables, and how SQL is used in data analytics tools.

Introduction to SQL

SQL, or Structured Query Language, is a programming language used to manage data in relational databases, and it is now one of the most widely used ways to access data. IBM developed SQL during the 1970s. By running queries, SQL can create, edit, delete, and retrieve data in databases such as PostgreSQL, Oracle, and MySQL.

Need For SQL In Data Analytics

While engineers often use SQL in software development, data analysts also prefer it for several key reasons:

  • Easy to Learn and Understand: SQL’s simple syntax makes it accessible, even for those without a programming background.

  • Direct Data Access: Analysts can query large datasets directly from their source without needing to export data into other applications, allowing for faster and more efficient analysis.

  • Transparency and Reproducibility: SQL queries provide a clear, auditable process, making it easier to review and replicate analyses compared to using spreadsheet tools.

The kinds of aggregations you might typically build in an Excel pivot table (sums, counts, minimums, maximums, and so on) are easier to run in SQL. SQL can also handle far larger datasets and work across multiple tables at once.

Core SQL Concepts

In SQL, data is organized into databases, which act as containers for storing and managing information. Every database contains one or more tables, which are structured collections of data.

Tables are composed of rows and columns. Each column holds a specific type of data (e.g., text, numbers, dates), and each row represents a unique record in the table.

For example, in a Customer table, columns might include fields such as customer_id, name, email, and signup_date, while each row represents an individual customer and their details.

This table structure makes it easy to organize and retrieve large volumes of data efficiently.
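
As a rough illustration of the Customer table described above, the sketch below creates it in an in-memory SQLite database through Python's built-in sqlite3 module and reads the rows back. SQLite, the column types, and the two sample customers are assumptions made for this example only.

```python
import sqlite3

# In-memory SQLite database, used only so the SQL below can be run as-is.
conn = sqlite3.connect(":memory:")

# Each column has a name and a type. The types chosen here are assumptions.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        email       TEXT,
        signup_date TEXT  -- SQLite has no dedicated DATE type; ISO-8601 text is a common convention
    )
""")

# Each inserted tuple becomes one row (one customer record). The data is made up.
conn.executemany(
    "INSERT INTO customer (customer_id, name, email, signup_date) VALUES (?, ?, ?, ?)",
    [
        (1, "Amina", "amina@example.com", "2023-01-15"),
        (2, "Brian", "brian@example.com", "2023-02-03"),
    ],
)

# Read the records back, one tuple per row.
customers = conn.execute("SELECT * FROM customer ORDER BY customer_id").fetchall()
for row in customers:
    print(row)
```

Other database systems such as PostgreSQL or MySQL use slightly different type names, but the idea of named, typed columns and one row per record carries over.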

Task               Statement / Clause / Function
Filtering data     WHERE clause
Aggregating data   COUNT, SUM, MAX, MIN, AVG
Grouping data      GROUP BY
Sorting data       ORDER BY
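
The four building blocks in the table can appear together in a single query. The sketch below is one possible way to see that, again using Python's sqlite3 module with an in-memory database. The sales table layout and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sales table with made-up rows.
conn.execute("CREATE TABLE sales (SaleDate TEXT, Amount REAL, Customers INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2022-01-01", 12000, 3),
        ("2022-01-01", 8000, 2),
        ("2022-01-02", 15000, 4),
        ("2022-01-02", 500, 1),
        ("2022-01-03", 9000, 2),
    ],
)

daily_totals = conn.execute("""
    SELECT SaleDate,
           COUNT(*)    AS num_sales,      -- aggregating
           SUM(Amount) AS total_amount    -- aggregating
    FROM sales
    WHERE Amount >= 1000                  -- filtering: drop very small sales
    GROUP BY SaleDate                     -- grouping: one output row per day
    ORDER BY total_amount DESC            -- sorting: largest daily total first
""").fetchall()

for row in daily_totals:
    print(row)
```

Each clause is covered in more detail in the sections that follow, apart from ORDER BY, which simply sorts the final result by the listed columns.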

Basic SQL Syntax

Select statement

The SQL SELECT statement fetches data from a database. You can retrieve the full table, or only the particular columns and rows that meet criteria you specify.

-- List every row and every column in the table
select * from sales;
-- Select specific columns only
select SaleDate, Amount, Customers from sales;
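
To see how the column list shapes the result, the following sketch runs both styles of SELECT against a small in-memory SQLite table through Python's sqlite3 module. The table definition and its two rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sales table with made-up rows.
conn.execute("CREATE TABLE sales (SaleDate TEXT, Amount REAL, Customers INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("2022-01-01", 12000, 3), ("2022-01-02", 8000, 2)],
)

# SELECT * returns every column, in the order the table defines them.
cursor = conn.execute("SELECT * FROM sales")
all_columns = [col[0] for col in cursor.description]
all_rows = cursor.fetchall()

# Naming columns returns only those columns, in the order listed.
cursor = conn.execute("SELECT SaleDate, Amount FROM sales")
picked_columns = [col[0] for col in cursor.description]

print(all_columns)
print(picked_columns)
```

Listing only the columns you need keeps results easier to read and usually reduces the amount of data the database has to send back.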

Filtering Data

Where Clause

The WHERE clause filters a result set so that only rows meeting the specified conditions are returned. Several conditions can be combined with operators such as AND.

-- Selecting where the amount is greater than 10,000
select * from sales
where amount > 10000
;

-- Using and
select * from sales 
where amount > 10000 and SaleDate >='2022-01-01'
;
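
The sketch below runs the combined filter above against a few invented rows, so it is easy to see which rows pass and which do not. It uses Python's sqlite3 module and an in-memory database. The table layout and the data are assumptions, and dates are stored as ISO-8601 text, which compares correctly as a string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sales table with made-up rows.
conn.execute("CREATE TABLE sales (SaleDate TEXT, Amount REAL, Customers INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2021-12-30", 20000, 5),  # large enough, but too early
        ("2022-01-01", 12000, 3),  # passes both conditions
        ("2022-01-05", 9000, 2),   # recent enough, but too small
        ("2022-02-10", 15000, 4),  # passes both conditions
    ],
)

# A row is kept only when BOTH conditions are true.
filtered = conn.execute("""
    SELECT SaleDate, Amount
    FROM sales
    WHERE Amount > 10000 AND SaleDate >= '2022-01-01'
    ORDER BY SaleDate
""").fetchall()

for row in filtered:
    print(row)
```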

Aggregating Data

Count Function

The COUNT() function returns the number of rows that match a given criterion. It is handy for understanding the volume of data entries and for spotting trends based on countable indicators.

--- To count the number of sales
SELECT COUNT(*) AS total_sales
FROM sales;
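
One detail worth knowing is that COUNT(*) counts rows, while COUNT(column) counts only the rows where that column is not NULL. The sketch below shows the difference, plus a count restricted by a WHERE condition. It uses Python's sqlite3 module, and the table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sales table with made-up rows.
conn.execute("CREATE TABLE sales (SaleDate TEXT, Amount REAL, Customers INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2022-01-01", 12000, 3),
        ("2022-01-02", 8000, None),  # Customers is missing (NULL) for this sale
        ("2022-01-03", 15000, 4),
    ],
)

# COUNT(*) counts every row; COUNT(Customers) skips rows where Customers is NULL.
total_sales, sales_with_customers = conn.execute(
    "SELECT COUNT(*), COUNT(Customers) FROM sales"
).fetchone()

# Adding WHERE counts only the rows that match the condition.
large_sales = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE Amount > 10000"
).fetchone()[0]

print(total_sales, sales_with_customers, large_sales)
```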


Sum Function

The SUM() function returns the total of a numeric column. It is ideal for computing totals such as sales, revenue, or any other cumulative numerical value.

--- To calculate the total number of products sold
SELECT SUM(quantity) AS total_products_sold
FROM sales;


Avg Function

The AVG() function returns the average value of a numeric column, which helps you identify overall patterns in your data. It is useful whenever you need the average of a group of numbers, such as wages, costs, or scores.

--- To find the average price of items sold.
SELECT AVG(price) AS average_price
FROM sales;


MIN() and MAX() Functions

MIN() and MAX() are aggregate functions: each works on a set of values and returns a single output.

MIN() returns the smallest value in the selected column, while MAX() returns the largest.

--- To find the lowest and highest price of items sold
SELECT MIN(price) AS lowest_price
FROM sales;
SELECT MAX(price) AS highest_price
FROM sales;
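
The aggregate functions covered so far do not need separate queries. As a sketch, the example below computes all five in one SELECT using Python's sqlite3 module. The quantity and price columns follow the snippets above, while the table definition and the three rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sales table with made-up rows: (quantity, price).
conn.execute("CREATE TABLE sales (quantity INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(2, 10.0), (1, 30.0), (3, 20.0)],
)

# One query, one result row, five summary values.
summary = conn.execute("""
    SELECT COUNT(*)      AS total_sales,
           SUM(quantity) AS total_products_sold,
           AVG(price)    AS average_price,
           MIN(price)    AS lowest_price,
           MAX(price)    AS highest_price
    FROM sales
""").fetchone()

print(summary)
```

Combining aggregates this way returns a single summary row, which is often more convenient than running one query per statistic.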


Grouping Data

GROUP BY

The SQL GROUP BY clause groups rows that have the same values in the specified columns. It is commonly used in combination with aggregate functions (e.g., COUNT, SUM, AVG) to perform calculations on each group of data.

For example, if several rows in employee_demographics share the same value in the gender column, GROUP BY collapses them into one row per distinct gender.

SELECT gender
FROM employee_demographics
GROUP BY gender
;
-- The GROUP BY clause is rolling up all these values into the gender rows
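
Since GROUP BY is most useful together with aggregate functions, here is a minimal sketch that counts employees and averages their age per gender. It runs through Python's sqlite3 module. The age column and all sample rows are hypothetical and were added only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table layout; the age column is an assumption.
conn.execute(
    "CREATE TABLE employee_demographics (employee_id INTEGER, gender TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO employee_demographics VALUES (?, ?, ?)",
    [(1, "Female", 30), (2, "Male", 40), (3, "Female", 50), (4, "Male", 20)],
)

# The aggregates are computed once per group, not once for the whole table.
per_gender = conn.execute("""
    SELECT gender,
           COUNT(*) AS num_employees,
           AVG(age) AS average_age
    FROM employee_demographics
    GROUP BY gender
    ORDER BY gender
""").fetchall()

for row in per_gender:
    print(row)
```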

Joining Tables

Another useful concept is the JOIN clause, which combines data from multiple tables by defining logical connections between them. It lets you pull information from several tables at once by matching key values that the tables share.

You can perform a JOIN with multiple tables, and it can be combined with other clauses. A common approach is to use JOIN together with the WHERE clause to filter the data being retrieved.

-- Join the employee_salary table onto the employee_demographics table using the shared employee_id key
SELECT * FROM employee_demographics
INNER JOIN employee_salary
    ON employee_demographics.employee_id = employee_salary.employee_id
;
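
The sketch below combines an INNER JOIN with a WHERE filter, as described above, and also shows that an inner join keeps only rows whose key appears in both tables. It uses Python's sqlite3 module. The first_name and salary columns, the table aliases, and every sample row are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table layouts with made-up rows.
conn.execute("CREATE TABLE employee_demographics (employee_id INTEGER, first_name TEXT)")
conn.execute("CREATE TABLE employee_salary (employee_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee_demographics VALUES (?, ?)",
    [(1, "Ann"), (2, "Ben"), (3, "Cara")],  # Cara has no salary row
)
conn.executemany(
    "INSERT INTO employee_salary VALUES (?, ?)",
    [(1, 50000), (2, 70000), (4, 60000)],   # employee 4 has no demographics row
)

# Only employee_id values present in BOTH tables survive an INNER JOIN.
matched = conn.execute("""
    SELECT COUNT(*)
    FROM employee_demographics AS d
    INNER JOIN employee_salary AS s
        ON d.employee_id = s.employee_id
""").fetchone()[0]

# WHERE then filters the joined rows.
well_paid = conn.execute("""
    SELECT d.first_name, s.salary
    FROM employee_demographics AS d
    INNER JOIN employee_salary AS s
        ON d.employee_id = s.employee_id
    WHERE s.salary > 55000
    ORDER BY d.first_name
""").fetchall()

print(matched, well_paid)
```

Short table aliases such as d and s are optional, but they keep the ON and WHERE conditions readable when both tables share a column name.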

SQL In Data Analytics Tools

SQL plays a crucial role in popular data analytics tools like Power BI and Tableau by enabling users to connect, query, and manipulate large datasets directly within these platforms. Here’s how SQL is commonly used in these tools:

Power BI

  • SQL Queries for Data Import

  • Custom Data Modeling

  • Data Transformation

Tableau

  • SQL for Data Connections

  • Custom SQL for Data Source Preparation

  • Data Blending and Joins

Final Thoughts

With SQL skills, you are well-prepared to conduct comprehensive data analysis and extract meaningful insights from your data. Whether handling small datasets or large data warehouses, SQL serves as a solid foundation for efficient data management and analysis.
