What is SQL?
SQL is used to access, manipulate, and retrieve data from objects in a database. SQL databases are often called relational databases because they are made up of relations which are commonly referred to as tables.
It's the language used to communicate with databases.
More specifically, it should be clear that SQL tables can be thought of as datasets, rows can be considered as individual units of observation, and columns can be considered as features.
A data model or a schema describes how database objects are organized within a database.
Normalization is the process of organizing data in a database.
It involves creating tables and establishing relationships between those tables to make the database more flexible by eliminating redundant records or data points,
Parent: An entity or attribute in an entity that has child records, values or dependencies.
Child: An entity or attribute in an entity that depends on a parent record value.
Basic Data Types of SQL.
There are three main character data types.
1.VARCHAR(Variable character)
This is the most popular string data type. If the data type is VARCHAR(50),then the column will allow up to 50 characters.The maximum length for varchar is 8000 characters.
2.CHAR (character)
If the data type is CHAR(5), then each value in the column will have exactly 5 characters. In other words, the string length is fixed. Data will be right-padded with spaces to be exactly the length specified. For example, 'hi' would be stored as 'hi '.
The maximum length for this data type is 8000 characters.
3.TEXT
Unlike VARCHAR and CHAR, TEXT requires no inputs, meaning you do not have to specify a length for the text. It is useful for storing long strings, like a paragraph or more of text.
Types of Keys.
Primary key - the value which uniquely identifies every row -ie it Uniquely identifies a single row in the table, It is created from a single column/group of columns. A primary key can be made up of one or more columns in a table. They are Flagged as pk in data model or with a key icon.
Features of Primary Key:
- It holds unique values for each record/row.
- It can't have null values.
- Primary key should be immutable meaning they can't be changed. eg >Customers table - "Customer ID" as a primary key.
Foreign Key - used in related tables to connect the data back to the primary key in the primary table. It refers to a primary key in another table in other terms.
They create a link between data in one table(the referencing table) and another table that holds the primary key values (the referenced table)
Table that has foreign key is called the referencing table or child table.
Keywords and functions.
keyword is a text that already has some meaning in SQL.
These are words defined in SQL standard that are used to construct SQL statement. Keywords are typically capitalized for readability. However, SQL is case-insensitive, meaning that an uppercase_ WHERE_ and a lowercase_ where _mean the same thing when the code is run.
Function a special type of keyword. They allow commonly used set of calculations or procedures to be stored and easily referenced in queries.
Identifier is the name of a database object such as a table or a column.
Statements and Clauses.
These are ways to refer to subsets of SQL code.
Statement - A statement starts with a keyword and ends with a semicolon.
Clause - A clause is a way to refer to a particular section of a statement.
Quotes. Two types of quotes in SQL include single quotes and double quotes.
Single Quotes are used to represent Strings,
Double Quotes used to represent Identifiers
Sublanguages
- What does query a database means? To query a database means to retrieve data from a database, typically from a table or multiple tables.
- What is a data model? A data model or a schema describes how database objects are organized within a database. A subquery is a query that is nested inside another query.
- What is a view? View is a virtual table whose contents are defined by a query. Like a table, a view consists of a set of named columns and rows of data.
Choose a DBMS
DBMSs fall into two categories:
- Client -Server: MariaDB, Microsoft SQL Server, MySQL, Oracle, PostgreSQL
- Shared-File: Access, FileMaker.
Memorize this order
SELECT -- columns to display
FROM -- table(s) to pull from
WHERE -- filter rows
GROUP BY -- split rows into groups
Having -- filter grouped rows
ORDER BY -- columns to sort
Classic mnemonic to remember the order of the clauses is
Sweaty feet will give horrible odors.
Start Fridays with grandma's homemade oatmeal.
Retrieving Data.
SELECT.-"certain columns or data attributes within tables"
This is the most basic SQL query.
Select is used to select(retrieve) data from a database table.
SELECT requires two pieces of information
- What to SELECT.
- Where to SELECT it from.
The FROM Clause.- "certain tables in the database"
The from clause is used to specify the source of the data you want to retrieve.
Retrieving data from more than one table u can use the JOIN clause within the FROM clause and displays it as a single result.
The WHERE clause.-"certain conditions are met to retrieve the desired data"
The WHERE clause is used to restrict query results to only rows of interest or simply put it is the place to filter data.
Rarely will you want to display all rows from a table,but rather rows that match specific criteria.
Basically where is used for filtering for records based on a certain criteria.
SELECT * FROM station_data
WHERE year != 2020
Remember most databases typically use the YYYY-MM-DD format for dates, which is an international standard (ISO 8601) and avoids ambiguity.
SELECT description,
FROM
WHERE description = '2023-12-24'
SELECT *
FROM my_table
WHERE year_id = 2021;
Note, When using text you must wrap literals (or text values you specify) in single quotes.
Null values cannot be determined with an =.
You need to use the IS NULL or IS NOT NULL statements to identify null values.
SELECT *
FROM grid -- Return rows where demand_loss_mw is not missing or unknown
WHERE demand_loss_mw IS NOT NULL;
A more elegant way of handling null values is to use the coalesce() function, which will turn a possibly null value into a specified default values if it is null.
Where clause is used to check a list, - CHecking a list for instance
SELECT name,populaiton FROM world
WHERE name IN ('Sweden','Norway','Denmark')
QUESTION 1.
In the products table, we’d like to see the ProductID and ProductName for those products where the ProductName includes the string “queso”
SO here we will use the LIKE which is used to search for a specified pattern in a column.
SELECT
ProductID,
ProductName
WHERE
ProductName LIKE '%queso%';
When using more than one AND or OR condition, you need to use parentheses to separate and position pieces of logic together.
SELECT *
FROM products
WHER (year>2016 AND year<2018)
OR product_type ='scooter';
The Group BY clause.
Main purpose of the** GROUP BY** clause is to collect rows into groups and summarize the rows within the groups in some way, ultimately returning just one row per group.
These are the steps you should take when using a GROUP BY:
Figure out what column(s) you want to use to separate out, or group, your data (i.e., tour name).
Figure out how you’d like to summarize the data within each group (i.e. count the waterfalls within each tour).
When you’ve decided on those:
In the SELECT clause, list the column(s) you want to group by (i.e., tour name) and the aggregation(s) you want to calculate within each group (i.e., count of waterfalls).
In the GROUP BY clause, list all columns that are not aggregations (i.e., tour name).
EXample 1
To display the number of countries per continent, you would typically use the GROUP BY clause to group the data by continent and then apply the COUNT() aggregate function to count the number of countries in each continent.
SELECT continent,COUNT(*) AS num_countries
FROM world
GROUP BY continent
QUESTION 2
Show a list of countries where the Northwind company has customers.
SELECT country
FROM Customers
GROUP BY country
THE HAVING Clause.
The HAVING clause places restrictions on the rows returned from a GROUP BY query. It is used to filter the results of a GROUP BY query based on aggregate functions (like SUM, COUNT, AVG, etc.).
When to Use the HAVING Clause
1.Filtering Based on Aggregates: Use HAVING when you need to apply conditions to aggregated results. For instance, you might want to display only groups where the total revenue is above a certain threshold.
2.A Having clause always immediately follows a GROUP BY clause. Without a GROUP BY clause, there can be no HAVING clause.
EXAMPLE 2.
To Display,output or list the continents that have a total population of at least 100 million, use the HAVING clause which is used with the aggregate functions or columns within the GROUP BY clause.
SELECT continent
FROM world
GROUP BY continent
HAVING SUM(population) >=100000000;
QUESTION - LIST DEPARTMENTS WITH TOTAL SALARY EXCEEDING 200,000.
SELECT Department.DepartmentName AS Dept,SUM(Salary) AS TOTAL_Salary
FROM Employee
JOIN Department
ON Employee.DepartmentID = Department.DepartmentID
GROUP BY Department.DepartmentName
HAVING SUM(Salary)>200000;
ORDER BY Clause.
The ORDER BY clause is used to specify how you want the results of a query to be sorted.
To specify sort order, add order By to your SELECT statement.
ORDER BY requires one piece of information
- The column to sort by.
- Optionally you may also specify multiple sort columns and sort directions. Also instead of writing the name of the column you want to order by, you can refer to the position number of that column in the query's SELECT clause.
SELECT product_id,model
FROM products
ORDER BY 1
** Sort by one column**,
SELECT prod_name
FROM products
ORDER BY prod_name;
The Distinct.
When looking through a dataset, you may be interested in determining the unique values in a column or group of columns. That's the primary use case of the DISTINCT keyword.
Filtering Data.
We use the WHERE clause for filtering data. WHERE must be provided with a filter condition, Remember to enclose within single quotes.
SELECT vend_id, prod_name
FROM Products
WHERE vend_id != 'DLL01';
Filter Using A Range
SELECT prod_name,prod_price
FROM Products
WHERE prod_price BETWEEN 5 AND 10;
Combine WHERE Clause.
SELECT prod_id,prod_price,prod_name
FROM Products
WHERE vend_id = 'DLL01' AND prod_price <=4;
Order of Evaluation.
The order of evaluation is
(), AND, OR, Use parentheses to force a higher level of evaluation.
NOT IN.
The NOT IN operator serves as a powerful means to filter data sets, allowing users to exclude rows that match a specified list of values.
The NOT IN operator is used within a WHERE clause to exclude rows where a specified column's value matches any given list of values.
Basic Syntax
SELECT Column names
FROM table_name
WHERE Column_name NOT IN (Value1, value2,....)
Filter using Like.
Use LIKE to search using a wildcard, % to match zero or more characters, _to match a single character, [] to match one of a list of specified characters, ^ to negate the match, [] is not widely supported.
Aggregate functions
An aggregate function performs a calculation on many rows of data and results in a single value.
Basic aggregate functions in SQL include
Aggregate functions apply calculations to non-null values in a column, only COUNT(*) which counts all rows including null values.
S
A code to show how many countires have an area of at least 1000000
SELECT COUNT(area) AS Total_area
FROM world
WHERE area >= 1000000;
The LEAST _and _GREATEST functions find the smallest and largest values within a row.
SELECT SUM(gdp) AS total_gdp
FROM world
WHERE continent = 'Africa'
ORDER BY total_gdp
QUESTION.
What is the total population of ('Estonia', 'Latvia', 'Lithuania')
SELECT SUM(population) AS Total_Population
FROM world
WHERE name IN ('Estonia', 'Latvia', 'Lithuania');
The HAVING Statement.
Unions
Union
The Union keyword combines the results of two or more SELECT statements into one output .
Union Operators.
Use the Union Keyword to combine the results of two or more SELECT statements. The difference between a JOIN and a UNION is that JOIN links together multiple tables within a single query whereas UNION stacks the results of multiple queries:
Unions are used to combine one or more
Unions.
Unions are used to combine one or more queries.
Each Query must be separated by UNION(or Union ALL to include duplicates)
If used ORDER BY must be specified after the final query.
UNION RULES.
All que
When using union only one by ORDER can be used.
CTE - Common Table Expression.
It is a temporary result set that you can reference within a select, insert, update or delete statement. They are defined using the WITH keyword.While common table expressions operate similarly to subqueries,they have several benefits inlcuding:
- The ability to reference the same temporary result set repeatedly across the query.
- Improved readability for collaboration and debugging.
- Better visibility into commonly used result sets that are good candidates to become permanenr tables /views .
WITH cte_name AS (
SELECT column 1,Column2
FROM table_name
WHERE condition
)
SELECT *
FROM cte_name;
**
Managing Data.**
INSERT.
In a relational database, data only exists if the database first receives records. The INSERT statement does just and inserts a record into the database.
INSERT INTO ATTENDEE (FIRST_NAME,LAST_NAME,PHONE,EMAIL,VIP)
VALUES
('Jon','Skeeter',4802185842,'john.skeeter@rex.net',1),
('Sam','Scala',2156783401,'sam.scala@gmail.com', 0),
('Brittany','Fisher',5932857296,'brittany.fisher@outlook.com', 0)
Delete
The DELETE statement is dangerously simple. It deletes all records in a table:
DELETE FROM ATTENDEE.
WHERE PHONE IS NULL
AND EMAIL IS NULL
Truncate Table.
TRUNCATE when you want to quickly remove all records from a table, reset identity columns, and don’t need to log each row’s removal or use WHERE.
Update.
UPDATE ATTENDEE SET EMAIL = UPPER(EMAIL)
The UPDATE modifies existing records
Drop Table.
Used to remove a table altogether from the database.Just type DROP TABLE followed by the name of the table you want to delete.
DROP TABLE MY_UNWANTED_TABLE
Why SQL Is a Good Choice for Text Analysis.
Good reasons to use SQL for text Analysis ;
SQL is less error-prone than spreadsheets since no copying and pasting is required and the original data stays intact. Data could be altered with an UPDATE command but this is hard to do accidentally.
SQL is good at cleaning and structuring text fields. Cleaning includes removing extra characters or whitespace, fixing capitalization, and standardizing spellings.
Text Parsing
Parsing data with SQL is extracting pieces of a text value to make them more useful for analysis. Parsing splits the data into the part we want and everything else though typically our code returns only the part we want.
Text Transformations.
Transformations change string values in some way. Among the most common transformations are the ones that change capitalization. The Upper function converts all letters to their uppercase form, while the lower function converts all letters to their lowercase form
Top comments (0)