Before understanding what SQL is, we must first understand the concept of data and data storage.
In today's world, data is everywhere—from the emails we send to the transactions we make, everything generates data. Data is simply a collection of facts or information, and it can take many forms: text, numbers, images, videos, or even signals.
To make sense of this vast amount of information, data needs to be organized and stored in a structured way. This is where databases come in.
Understanding Databases
A database is a collection of data that is stored in a form that allows for easy storage, retrieval, and management.
There exist two forms of databases: Relational databases and Non-Relational Databases.
- Relational Databases are also called SQL databases. These are databases that store data in structured, predefined tables with rows and columns.
- Non-Relational Databases are also called No-SQL databases. These are databases that store data without a predefined schema. These databases are designed to handle unstructured, semi-structured, or rapidly changing data.
To manage data in a database, we use database management software.
This is where we can begin understanding what SQL is.
SQL stands for Structured Query Language. It is the standard language used to communicate with relational databases. It allows users to perform operations such as querying, updating, and managing data stored in tables contained in databases.
A key thing to note is that SQL is a language, and like any language, it has various dialects. There exists a base language (ANSI standard) and then different flavors of SQL are determined by the Relational Database Management Software you use (RDBMS).
The most common RDBMS today are: Oracle, MySQL, Microsoft SQL Server, and PostgreSQL (in order of popularity as of June 2024).
The summary is that there exist tables that store data, and those tables are housed within databases, and those databases are managed using a relational database management software.
SQL Syntax
One of the best things about SQL is that it is a procedural language, meaning that it feels like writing human instructions. An example would be:
SELECT column_a
FROM table_a;
Even if one didn't have technical computer knowledge, they would be able to understand what this snippet of code was written to do, which is to return column_a
that exists within table_a
.
Something to note is that key SQL statements are not case-sensitive, meaning the code we used above could still be correct if written as:
select column_a
from table_a;
However, it is standard practice to capitalize key SQL statements like SELECT
and FROM
, not only to enhance neatness and readability, but also to ensure you can easily spot errors in your code.
Let's Get Down to Writing SQL Code!
The first statement you will encounter is SELECT. It is used to select what you specify it to select from a table.
While you can select one column like what we did above, you can also select multiple columns:
SELECT
column_a,
column_b,
column_c
FROM table_a;
Two things to note here:
- In one's head, one might be wondering why we indented the code. It is just to enhance neatness. The indent is not part of SQL syntax.
- Also, you might have noticed that the code always ends with a semicolon. This is because this is how the RDBMS knows where to terminate the 'procedure' or SQL code.
FROM is used to specify which table the data should come from.
Aliases - Let's say you want some columns to have a specific name when the results are returned after the query is run. This is where you would use aliases. The key word for alias is AS:
SELECT
column_a AS a,
column_b AS b,
column_c AS c
FROM table_a;
Filtering Results
Imagine you want to filter your results. This is where you would use the magic SQL keywords responsible for filtering: WHERE, LIKE, HAVING, and REGEXP.
We shall only cover WHERE, though I will give you some insights on the rest.
- LIKE and REGEXP are related. REGEXP is a more advanced version of LIKE used to filter data using specific criteria, for example, giving records where values in a certain column end with 'th'.
- HAVING, on the other hand, is another version of WHERE, but with more specific criteria.HAVING for example can take GROUP BY queries and filter them while WHERE cannot
SELECT
column_a AS a,
column_b AS b,
column_c AS c
FROM table_a
WHERE column_a < 10;
SELECT
CustomerID,
COUNT(SaleID) AS SalesCount
FROM Sales
GROUP BY CustomerID
HAVING SalesCount > 5;
If you read the SQL code like human instructions, you get an idea of what it is trying to do:
We want to select column_a
, column_b
, and column_c
, each with their own alias, and we are selecting from table_a
. From those records we have selected, we want only those where the values in column_a
are less than 10.
Creating Tables and Databases
To create anything, we use the keyword CREATE.
We can create tables and databases.
The syntax for this is:
CREATE database_name;
CREATE table_name;
Whenever we create a table, we have to think about what data is going into the table and what data types each field will take on.
There are various data types, but we shall first divide them into number data types, character data types, and datetime data types.
Number data types include
INT
,AUTO_INCREMENT
,FLOAT
. The key thing to note is thatINT
represents integers. While we have subdivisions ofINT
, which includeBIGINT
andSMALLINT
, they only represent the range of values that can be taken but not the integer aspect.AUTO_INCREMENT
is an integer data type that increases every time a record is added to a table. It is at times used as a surrogate primary key in tables.Character data types include
CHAR
,VARCHAR
, andTEXT
.CHAR
andVARCHAR
have the same characteristics, but there is one big difference.CHAR
takes on the full specification of characters given, butVARCHAR
does not.
For example, if I specify a field to take onVARCHAR(10)
and another to takeCHAR(10)
, meaning each to take 10 characters, and then I input the string 'hello' in both fields, when I do a count of the characters in theVARCHAR
, it returns 5, but for theCHAR
, it returns 10. Basically,CHAR
will add white space to the remaining spaces to fill the limit given, which is 10 characters. That is also whyVARCHAR
is named as it is, which stands for Variable Character.Datetime data types include
DATE
,TIME
, andDATETIME
. You may ask why I haven't added the rest likeINTERVAL
andYEAR
. This is because they are not part of the ANSI standard of SQL and vary across RDBMS.DATE
represents dates,TIME
represents time, andDATETIME
is a combination ofDATE
andTIME
.
Here is a snippet of code putting all these things together:
CREATE TABLE teachers (
id BIGINT AUTO_INCREMENT PRIMARY KEY, -- This is autoincremental data type
first_name VARCHAR(25),
last_name VARCHAR(50),
school CHAR(50),
hire_date DATETIME,
salary FLOAT
);
INSERT INTO teachers (first_name, last_name, school, hire_date, salary)
VALUES ('Ken', 'Hubert', "Murang'a", '2020-01-01 15:49:20', 65000);
The LIMIT Clause
This is one of the most important clauses when querying any table. Imagine this: You work at Instagram and have access to their SQL databases (assuming they have those). Imagine querying a table of all Instagram users and you want to filter all the records where the age of the user is below 20. Logically, that could be in the tens of millions, if not hundreds. Imagine hitting run and just crashing the system because you've requested more than the system can handle. This is how the LIMIT clause could help you. It helps retrieve a subset of the records from that query, hence avoiding disaster!
SELECT
name,
age
FROM ig_users
WHERE age < 20
LIMIT 5;
Here's your article with the requested formatting in markdown:
Joins in Relational Databases
One of the main aims of SQL databases is to reduce redundancy when storing data in tables. This will involve storing data in multiple tables that have relationships with each other. When querying the tables to get data, we might need to join tables to get the results we want.
There are a few types of JOINS:
- JOIN or INNER JOIN brings columns that are matching in both tables.
- LEFT JOIN brings every row in the LEFT Table and it finds matching rows in the RIGHT table.
- RIGHT JOIN brings every row in the RIGHT Table and it finds matching rows in the LEFT table. Remember, a right join is just a left join with the tables inverted.
- FULL OUTER JOIN
- CROSS JOIN - Returns every possible combination of rows from both tables.
An example that will show the differences between these joins can be learnt by implementing the code snippet below:
CREATE TABLE schools_left (
id INT,
left_school VARCHAR(30),
PRIMARY KEY (id)
);
CREATE TABLE schools_right (
id INT,
right_school VARCHAR(30),
PRIMARY KEY (id)
);
INSERT INTO schools_left (id, left_school)
VALUES
(1, 'Oak Street School'),
(2, 'Roosevelt High School'),
(5, 'Washington Middle School'),
(6, 'Jefferson High School');
INSERT INTO schools_right (id, right_school)
VALUES
(1, 'Oak Street School'),
(2, 'Roosevelt High School'),
(3, 'Morrison Elementary'),
(4, 'Chase Magnet Academy'),
(6, 'Jefferson High School');
INNER JOIN
SELECT *
FROM schools_left sl
JOIN schools_right sr
ON sl.id = sr.id;
LEFT AND RIGHT JOIN
SELECT *
FROM schools_left sl
LEFT JOIN schools_right sr
ON sl.id = sr.id;
SELECT *
FROM schools_left sl
RIGHT JOIN schools_right sr
ON sl.id = sr.id;
You’d use either of these join types in a few circumstances:
- You want your query results to contain all the rows from one of the tables.
- You want to look for missing values in one of the tables; for example, when you’re comparing data about an entity representing two different time periods.
- When you know some rows in a joined table won’t have matching values.
FULL OUTER JOIN
SELECT *
FROM schools_left sl
FULL OUTER JOIN schools_right sr
ON sl.id = sr.id; -- FULL OUTER JOINS AREN'T PART OF MySQL But rather found in PostgreSQL
CROSS JOIN
SELECT *
FROM schools_left sl
CROSS JOIN schools_right sr;
A good way to find missing records is using the NULL operator combining the JOINS.
An example is finding columns that didn't have a match after a LEFT JOIN:
SELECT *
FROM schools_left sl
LEFT JOIN schools_right sr
ON sl.id = sr.id
WHERE sr.right_school IS NULL;
With this introduction, one should be able to play around with creating databases, tables, inputting data into them, and querying the data as a whole and also filtering the queried results. You should also be able to experiment with various joins and practice how to retrieve data stored in different tables.
Also, while playing around, you might encounter runtime errors which are very useful in guiding you to understand what not to do while writing SQL queries and also give you a better understanding of the language itself.
Thank you for reading!
Top comments (0)