In SQL (Structured Query Language), data validation refers to the process used to confirm the accuracy and reliability of data. Because databases are often updated, deleted, queried, or migrated by multiple users or programs, it is critical to maintain the integrity of the data. In the following content, we will introduce how to implement some basic validation rules in SQL to ensure the quality of data.
What is data validation in SQL?
Data validation is a means of ensuring data accuracy and reliability, usually before entering, modifying or processing data. When data from multiple sources needs to be integrated, we often talk about "data cleaning", which actually refers to the data validation process. In this process, we can confirm that the data:
- Is it complete (that is, there is no missing or blank information)
- Is it unique (that is, there are no duplicate data items)
- Is it in accordance with the expected standard or pattern (such as whether the value is within a specific range)
Some examples of applying these checks are as follows:
- For a payroll table, the "Salary" field can be set with a minimum and maximum limit to ensure that the entered salary amount is within a reasonable range.
- In the customer information table, when recording address details, you can set rules to ensure that the postal code contains only numbers and its length is in a standard format.
- In the product information table, the "Color" column can be set with an enumeration type constraint so that it can only accept predefined color options.
Some people might think that "this type of data validation should be implemented at the application level, not at the database level." While application-level validation is important, it is also important to ensure that the database itself has appropriate data validation mechanisms in place, even if validation is also performed at the application level, given the situation where updates are performed directly at the database level.
Although it sounds obvious, it is worth noting that although we can validate data input through rules, this does not completely guarantee the true correctness of the values. For example, we can set a rule that all entries in the salary field must be between 5000 and 250,000. However, this rule does not prevent incorrect entries as long as they fall within this allowed range.
During the validation process, we can apply more complex constraints. For example, we can use foreign key constraints to ensure that the relationship between two tables is not broken, set default values if data is not provided, and uniquely identify each row in a table through primary key constraints.
We will specifically explore the three most commonly used constraints for data validation: NOT NULL, UNIQUE, and CHECK. These constraints help us maintain data consistency and accuracy at the database level.
Constraints in SQL
In SQL Server, constraints are rules that restrict the data that can be inserted into a table. These rules help ensure the validity of the data and support the overall integrity of the database. Constraints can be defined at the same time as or after the table is created, and can apply to a single column or multiple columns.
When you try to insert data into a column that meets the constraints, SQL Server will allow the operation and successfully insert the data. Conversely, if the inserted data violates any of the constraints, the insert operation will fail and the system will return an error message.
Not NULL Constraint
SQL Server supports NULL values, which represent an "unknown" or "no value" state. NULL values have their use cases, but in some cases, we do not allow NULL values in a column. In this case, we can apply a NOT NULL
constraint on the column.
Let's use a different example to illustrate this: suppose we want to create a customer information table and specify that all fields except the "nickname" field cannot have null values.
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
FirstName NVARCHAR(50) NOT NULL,
LastName NVARCHAR(50) NOT NULL,
Nickname NVARCHAR(50),
Email NVARCHAR(100) NOT NULL,
PhoneNumber NVARCHAR(15) NOT NULL
);
If we need to update an existing column to NOT NULL, then we can use the ALTER TABLE statement. Alternatively, we can use Chat2DB β right-click on the table and click Modify Table to make the changes.
Unique constraint
We usually use UNIQUE constraint on ID type columns. For example, create a book table and specify that the ISBN
of each book must be unique and cannot be empty:
CREATE TABLE Books (
ISBN VARCHAR(13) NOT NULL UNIQUE,
Title NVARCHAR(100),
Author NVARCHAR(100),
PublicationYear INT
);
In this example, the ISBN
field is set to UNIQUE and NOT NULL, ensuring that each book has a unique International Standard Book Number and that the number cannot be left blank.
Check constraints
A check constraint consists of a logical expression that determines which values are valid. A simple example is in a payroll database where we want to specify the maximum value that can be entered. The syntax for a CHECK constraint when creating a table is as follows.
CREATE TABLE table_name
(
column1 datatype [ NULL | NOT NULL ],
column2 datatype [ NULL | NOT NULL ],
...
CONSTRAINT constraint_name
CHECK (column_name condition)
);
Therefore, we can create a table of employee salaries and impose a check constraint on the values entered into the Salary
column:
CREATE TABLE dbo.EmployeeSalaries (
EmployeeID int PRIMARY KEY,
EmployeeType int,
Salary decimal(9,2),
CONSTRAINT CK_EmployeeSalaries_SalaryRange
CHECK (EmployeeType = 1 AND Salary >= 0 AND Salary <= 200000.00)
);
If we need to add a check constraint to a column in an existing table, we can use the ALTER
statement:
ALTER TABLE dbo.EmployeeSalaries
ADD CONSTRAINT CK_EmployeeSalaries_SalaryRange
CHECK (EmployeeType = 1 AND Salary >= 0 AND Salary <= 200000.00);
To drop a check constraint, we can use the following command:
ALTER TABLE dbo.EmployeeSalaries
DROP CONSTRAINT CK_EmployeeSalaries_SalaryRange;
Finally, it is often useful to temporarily enable or disable a check constraint, which we can do as follows:
To enable a check constraint:
ALTER TABLE dbo.EmployeeSalaries
WITH CHECK CHECK CONSTRAINT CK_EmployeeSalaries_SalaryRange;
To disable a check constraint:
ALTER TABLE dbo.EmployeeSalaries
NOcheck CONSTRAINT CK_EmployeeSalaries_SalaryRange;
As you can see, check constraints are easy to create and flexible to use.
Summary
Performing data validation in SQL is essential to maintaining the integrity and consistency of your database. Using SQL Server Management Studio (SSMS), you can implement data validation by creating and managing constraints. This includes using NOT NULL constraints to prevent null values from being inserted, using UNIQUE constraints to ensure uniqueness of a column or combination of columns, and using CHECK constraints to restrict the range or format of values in a column. Through these methods, you can effectively control data quality at the database level.
Community
Go to Chat2DB website
π Join the Chat2DB Community
π¦ Follow us on X
π Find us on Discord
Top comments (0)