DEV Community

JayZho
JayZho

Posted on • Edited on

An Introduction to Database

Background knowledge

Database - an organised collection of information, normally stored on a computer electronically.

DBMS - database management system, which is a piece of software that sits between the layer of database and users to allow efficient interaction with the database(querying, updating, deleting, etc), using a programming language such as SQL.

RDBMS - relational database management system, a subset of DBMS that uses tables of rows and columns to organise data.

SQL - a programming language that interacts with relational DBMS such as MySQL and PostgreSQL. Note that the SQL used for one RDBMS is not always protable to any other RDBMS without modification. SQL is actually a combination of 4 types of query language:

  • Query language: to query
  • Definition language: to define database schemas
  • Control language: to do things like user permission management
  • Manipulation language: like CRUD

NoSQL - Non-relational Database. Rather than tables, NRDBMS store data in a more flexible way, such as key-value pairs, JSON, etc.

What is an embedded/serverless database?

Many database uses the client-server mechanism, meaning that the DBMS is running as a server and it waits for request over the internet/locally from another software to manipulate the database and sends back the data accordingly.
An embedded database on the other hand, is integrated into the software itself, meaning that the software that wants to interact with the database will have the DBMS as part of its program.
A good example would be MySQL vs. sqlite, both of which are relational DBMS by the way.

Keys in RDBMS

Primary key - unique key for each entry
Composite key - multiple fields together to make up a unique key
Foreign key - a key linking to another table which acts as the primary key in that table
Secondary key - any primary key candidates that are not selected as the primary key. For example, for the table "student", "email", "student_id" and "phone_number" could all be the primary key, but when "student_id" is selected as the primary key, "email" and "phone_number" become the secondary keys for this table.
Partial key - one or more fields of a weak entity, which uniquely identifies the weak entity when given its own er entity(strong entity). For example, for "Building"(strong entity) has many "Room"(weak entity), the attribute "room_number" could be used as partial key of "Room", since a building cannot have more than 1 room with the same room number. However, attribute like "area" cannot be partial key, since a building can have multiple rooms of the same area.

From ER diagram to Database schema

Basically there's 2 types of tables created from an ER diagram:

  1. Entity Table. Student, Employee, etc.
  2. Relation Table. Superviser, Employment, etc.

Strong entity

Mapping an entity to an entity table is straight forward.

Weak entity

Create a separate realtion table. Include all weak entity's attributes plus the owner's primary key.

1 to ... relationship

If "A can have 1 or 0 B", "B can have 0 ... many A", then create foreign key on A's table, pointing to table B. Favor the total participation as this guarantees un-nullable field.

Many to many relationship

Create a separate relation table.

SQL Basics

Create/Delete/Alter table

Create a table called "student":

CREATE TABLE student (
    student_id INT PRIMARY KEY,
    name VARCHAR(20) NOT NULL,
    major VARCHAR(28),
    degree_id INT,
    FOREIGN KEY(degree_id) REFERENCES degree(degree_id) ON DELETE CASCADE,
);

CREATE TABLE degree (
    degree_id INT PRIMARY KEY,
    name VARCHAR(30) NOT NULL,
);
Enter fullscreen mode Exit fullscreen mode

Delete the table "student":

DROP TABLE student;
Enter fullscreen mode Exit fullscreen mode

Drop the column "major":

ALTER TABLE student DROP COLUMN major;
Enter fullscreen mode Exit fullscreen mode

Add a column "gpa", which has 3 digits and 2 of them are decimals:

ALTER TABLE student ADD gpa DECIMAL(3, 2)
Enter fullscreen mode Exit fullscreen mode

Insert/Update/Delete

Insert all 3 fields:

INSERT INTO student VALUES(5211225, "Shane", "COMP");
Enter fullscreen mode Exit fullscreen mode

Insert only the specified fields: note that "name" cannot be null by the constraints defined before.

INSERT INTO student(student_id, name) VALUES(5211225, "Shane");
Enter fullscreen mode Exit fullscreen mode

Update all major "Biology" to "Bio":

UPDATE student
SET major = "Bio"
WHERE major = "Biology"
Enter fullscreen mode Exit fullscreen mode

Delete a specific entry:

DELETE FROM student
WHERE student_id = 5;
Enter fullscreen mode Exit fullscreen mode

Querying

SELECT * FROM student
WHERE major = "Chemistry" OR name IN ("Claire", "Ben")
ORDER BY major
LIMIT 10
Enter fullscreen mode Exit fullscreen mode

Redundency & Normalisation

What is redundency in a database schema?

Redundency could be a problem in the sense that it harms the consistency between updates. On the other hand, it could be benificial since it improves the performance by avoiding some "JOIN"s between tables.
The redundency in a database schema and its inconsistency problem has similarity to some bad code: Say you have some code that depends on another computation "plus two", and you use this dependent computation twice in this code:

// first time
a = a + 2;

// some other code ...

// second time
a = a + 2;
Enter fullscreen mode Exit fullscreen mode

Now if you decide to change the "plus two" computation to "times two", you have to change everywhere that you wrote this code. (twice in this case). This can easily lead to inconsistency problem.
A better and fairly obviously practice is to write the "plus two" logic as a separate function and calls this function when needed:

function dependentComputation(x) {
    return x + 2
}
// use the logic
a = dependentComputation(a)

// use it again somewhere else
a = dependentComputation(a)
Enter fullscreen mode Exit fullscreen mode

Later on when you have to change the computation, simply change it in the function and it automatically updates all the places it was refered to.

This is similar to how we get rid of the redundency in a database schema. Say two entries share the same third entity "account" in a table:

order_id | order_total | account_id | account_balance
0001       18.00         8888         20.00
0002       12.00         8888         20.00
Enter fullscreen mode Exit fullscreen mode

When a new order is placed and added to this table, we have to update all other entries related to account "8888", otherwise inconsistent account balance would occur:

order_id | order_total | account_id | account_balance
0001       18.00         8888         20.00  <- INCORRECT BALANCE
0002       12.00         8888         20.00  <- INCORRECT BALANCE
0003       6.00          8888         14.00  <- ACTUAL BALANCE
Enter fullscreen mode Exit fullscreen mode

Now just like how we extracted the computation logic into a separate function in the previous example, we now extract the "account related" fields into a separate table, and just add a foreign key pointing to the account table in each entry in the "order" table:

ORDER TABLE:
order_id | order_total | account_id
0001       18.00         8888         
0002       12.00         8888         
0003       6.00          8888         

ACCOUNT TABLE:
acc_id   | acc_balance
8888       14.00
Enter fullscreen mode Exit fullscreen mode

Functional Dependency

For a table with attributes A, B, C, D:

A  | B  | C  | D
a1   b1   c1   d1
a2   b2   c2   d2
Enter fullscreen mode Exit fullscreen mode

If knowing an attribute X determines the uniqueness of another attribute Y, then we say X determines Y, or Y is functionally dependent on X, denoted as X -> Y.
If the combination of some attributes is unique, say A and B together determines C or D, then we say AB -> CD.
Intuitively, a primary key or secondary key determines any other attributes.

Dependency Closure

The dependency closure of an attribute X in the table ABCD... is the set of attributes {ABCD...} that are derivable from X.

Normal Form

BCNF (Boyce Codd Normal Form):
If any functional dependency "X -> Y" in the schema satisfies that "X contains the whole key", then the schema is in BCNF. Notice that if X only contains a candidate key but not the primary key, it's still in BCNF.

The decomposition step to achieve BCNF regarding a functional dependency "X -> Y" in the table S is fairly straight forward:

  1. Remove Y from table S
  2. Create a new table "XY".

Top comments (0)