Database query performance issues have always been a core challenge for DBAs. In some cases, inefficient queries not only affect individual operations but can also trigger instance-level performance meltdowns. That's why every seasoned DBA must master the methodology of query performance analysis and troubleshooting.
Many factors affect query performance, but the key to diagnosing a problem almost always lies in reading the SQL execution plan carefully. The execution plan shows the steps the server takes to run a statement (the access methods, operators, and their order), which lets us pinpoint exactly where a query goes wrong.
In this article, we will focus on a commonly overlooked but extremely harmful performance killer — implicit conversions.
1. What Are Implicit Conversions?
In layman's terms, when an operator is used with operands of different types, the database automatically converts one operand so that the two can be compared; this automatic conversion is an implicit conversion. Once an implicit conversion appears in a query, its execution efficiency is often significantly reduced.
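A quick illustration in MySQL makes this concrete (standard MySQL comparison behavior; the comments note the results you should see):
-- The string operand is silently cast to a number before the comparison
SELECT '1' = 1;    -- returns 1 (true)
SELECT 'abc' = 0;  -- returns 1 (true): 'abc' is cast to the number 0, which can be surprising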
2. What Impact Does Implicit Conversion Have on Performance?
In day-to-day operations and maintenance, an efficient SQL query usually needs to hit an index. At the storage layer, index entries are kept in sorted order according to the column's data type. When the type of the value in the query does not match the column type, the database has to convert the indexed values before it can compare them, so it can no longer seek into the index and the index is effectively unusable.
Next, we will combine several common scenarios to let you actually experience what implicit conversion is and how to deal with it.
3. Scenarios
The following scenarios will use MySQL version 8.0.37 for examples.
3.1 Data Preparation
First, use a stored procedure to generate 1 million rows of test data.
-- Create test data table
DROP TABLE IF EXISTS employees;
CREATE TABLE employees (
    emp_no varchar(25) NOT NULL,
    name varchar(25) NOT NULL,
    salary int(10) NOT NULL,
    role varchar(25) NOT NULL,
    INDEX idx_emp_no (emp_no),
    INDEX idx_name (name),
    INDEX idx_salary (salary)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
-- Stored procedure to insert 1 million rows of test data
DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
    DECLARE i INT DEFAULT 0;
    DECLARE random_role VARCHAR(25);
    WHILE i < 1000000 DO
        SET random_role = IF(RAND() < 0.5, 'Software Engineer', 'DevOps Engineer');
        INSERT INTO employees (emp_no, name, salary, role) VALUES (
            CONCAT('', i),                     -- Store the numeric counter as a string (emp_no is varchar)
            CONCAT('User_', LEFT(UUID(), 8)),  -- Generate random name
            FLOOR(RAND() * 8000),              -- Generate random salary (0-8000)
            random_role
        );
        SET i = i + 1;
        IF i % 1000 = 0 THEN  -- Commit every 1000 rows (only batches work if autocommit is disabled before the CALL)
            COMMIT;
        END IF;
    END WHILE;
END$$
DELIMITER ;
call insert_test_data();
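Optionally, refresh the table statistics after the bulk load so that the row estimates in the EXPLAIN outputs below are up to date (a routine step, not specific to this example):
-- Rebuild index statistics for the optimizer
ANALYZE TABLE employees;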
3.2 Implicit Conversion Caused by Inconsistent Data Types Between Constants and Fields
A classic scenario is when the data type of the constant does not match the field type, causing implicit conversion. This is also the most common situation we encounter.
Here is an example:
SQL Query:
select * from employees where emp_no = 50;
EXPLAIN Result:
mysql> explain select * from employees where emp_no = 50;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | employees | NULL | ALL | idx_emp_no | NULL | NULL | NULL | 996117 | 10.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 3 warnings (0.01 sec)
The type column of the execution plan shows ALL, indicating a full table scan, and the index idx_emp_no is not used. This usually happens when the data type passed in does not match the actual field type.
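Before rewriting anything, you can ask the server itself why the index was skipped. Running SHOW WARNINGS right after the EXPLAIN usually surfaces a note along the lines of "Cannot use ref access on index 'idx_emp_no' due to type or collation conversion on field 'emp_no'" (the exact wording may vary by version):
-- The warnings attached to the EXPLAIN spell out the implicit conversion
EXPLAIN SELECT * FROM employees WHERE emp_no = 50;
SHOW WARNINGS;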
If we make a slight adjustment, the SQL query becomes:
select * from employees where emp_no = '50';
EXPLAIN Result:
mysql> explain select * from employees where emp_no = '50';
+----+-------------+-----------+------------+------+---------------+------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | employees | NULL | ref | idx_emp_no | idx_emp_no | 102 | const | 1 | 100.00 | NULL |
+----+-------------+-----------+------------+------+---------------+------------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
When the input value matches the varchar type of the field, the type column of the execution plan shows ref, indicating that the query can use the index properly.
Given the example above, developers should be extra cautious when writing application code like the following:
cursor.execute("SELECT * FROM employees WHERE emp_no = %s", (emp_no,))
cursor.execute("SELECT * FROM employees WHERE emp_no = '%s'", (emp_no,))
Whether the value that ultimately reaches the server is a bare number or a quoted string (which depends both on the quoting around %s and on the Python type of emp_no) can have a significant impact on performance, as the two execution plans above show.
3.3 Implicit Conversion Caused by Inconsistent Data Types in Join Fields
In practical scenarios, joining tables is also very common. When the data types of the join fields are inconsistent, implicit conversion can occur.
Let's prepare two more tables:
CREATE TABLE orders_1 (
    emp_no int(12) NOT NULL, -- Data type inconsistent with emp_no in employees
    order_date datetime DEFAULT NULL,
    INDEX idx_emp_no (emp_no)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
insert into orders_1 values (12, NOW());
CREATE TABLE orders_2 (
    emp_no varchar(25) NOT NULL, -- Same data type as emp_no in employees
    order_date datetime DEFAULT NULL,
    INDEX idx_emp_no (emp_no)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
insert into orders_2 values (12, NOW());
Let's test the situation where the join fields have inconsistent data types.
SQL Query:
select * from employees e, orders_1 o where e.emp_no = o.emp_no;
EXPLAIN Result:
mysql> explain select * from employees e, orders_1 o where e.emp_no = o.emp_no;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+--------------------------------------------+
| 1 | SIMPLE | o | NULL | ALL | idx_emp_no | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | SIMPLE | e | NULL | ALL | idx_emp_no | NULL | NULL | NULL | 996117 | 10.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+--------------------------------------------+
2 rows in set, 3 warnings (0.01 sec)
For the employees table, the database uses a full table scan, which performs worse and worse as the data volume grows.
What if the join fields have consistent data types? Let's see the EXPLAIN result:
SQL Query:
select * from employees e, orders_2 o where e.emp_no = o.emp_no;
EXPLAIN Result:
mysql> explain select * from employees e, orders_2 o where e.emp_no = o.emp_no;
+----+-------------+-------+------------+------+---------------+------------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | o | NULL | ALL | idx_emp_no | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | SIMPLE | e | NULL | ref | idx_emp_no | idx_emp_no | 102 | ai.o.emp_no | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------------+---------+-------------+------+----------+-------+
2 rows in set, 1 warning (0.00 sec)
For the employees table, the database uses an index lookup (type = ref), which is far more efficient than a full table scan.
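Back to orders_1: if that table is already in production and its queries cannot be rewritten, aligning the column type gives the same effect. A minimal sketch, assuming emp_no in orders_1 can safely be stored as a string and the table is small enough to alter:
-- Make the join columns the same type so idx_emp_no on employees can be used
ALTER TABLE orders_1 MODIFY emp_no varchar(25) NOT NULL;
-- Re-check the plan: the employees side should now show type = ref
EXPLAIN SELECT * FROM employees e JOIN orders_1 o ON e.emp_no = o.emp_no;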
3.4 Implicit Conversion Caused by Inconsistent Character Collation Rules
The story doesn't end here. Sometimes, even if the join fields have consistent data types, the index still can't be used. What could be the reason?
Let's prepare another table:
CREATE TABLE orders_3 (
    emp_no varchar(25) NOT NULL,
    order_date datetime DEFAULT NULL,
    INDEX idx_emp_no (emp_no)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
insert into orders_3 values (12, NOW());
The emp_no field in orders_3 has the same data type as in employees, but its collation differs (utf8mb4_bin versus the utf8mb4 default used by employees). The collation defines how two strings are compared; when the collations of the join columns differ, the indexed values cannot be compared in index order, so the database falls back to scanning, which greatly reduces query performance.
SQL Query:
select * from employees e, orders_3 o where e.emp_no = o.emp_no;
EXPLAIN Result:
mysql> explain select * from employees e, orders_3 o where e.emp_no = o.emp_no;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+------------------------------------------------+
| 1 | SIMPLE | o | NULL | ALL | idx_emp_no | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | SIMPLE | e | NULL | ALL | idx_emp_no | NULL | NULL | NULL | 996117 | 10.00 | Range checked for each record (index map: 0x1) |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+------------------------------------------------+
2 rows in set, 2 warnings (0.01 sec)
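The fix follows the same pattern: bring the collations in line. A minimal sketch, assuming nothing in the application depends on the binary (case-sensitive) comparisons of utf8mb4_bin:
-- Switch orders_3.emp_no to the same collation as employees.emp_no
-- (utf8mb4_0900_ai_ci, the utf8mb4 default in MySQL 8.0)
ALTER TABLE orders_3 MODIFY emp_no varchar(25) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL;
-- Verify that the employees side now uses idx_emp_no
EXPLAIN SELECT * FROM employees e JOIN orders_3 o ON e.emp_no = o.emp_no;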
The examples above are simple enough to spot by eye. In real development and operations work, however, once the SQL grows complex, the cost of manual tuning skyrockets.
4. SQLFLASH Demo
Let's see how SQLFLASH can help us identify SQL performance optimization points and transform them.
Example SQL Query:
SELECT s.student_name, s.major, s.gender, s.date_of_birth
FROM students s
WHERE (
        SELECT COUNT(*)
        FROM enrollments e
        WHERE e.student_id = s.student_id
          AND e.namespace = 111
      ) > 0
  AND s.major = 'Mathematics'
  AND s.date_of_birth <= '2000-01-01'
  AND s.student_name != CONCAT('Kevin', ' ', 'Jenkins')
LIMIT 1;
There is an implicit conversion in the subquery predicate e.namespace = 111 in the example above. Let's see how SQLFLASH performs:
SQLFLASH clearly reports that it optimized away an implicit conversion. Let's check whether the modification is correct:
SQLFLASH has modified it correctly. Interested readers can experiment with its other optimization rules on their own.
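For readers following along without the tool: assuming enrollments.namespace is a character-typed column, the conversion-free form of the query simply quotes the constant. This is a sketch of the expected fix, not SQLFLASH's literal output:
SELECT s.student_name, s.major, s.gender, s.date_of_birth
FROM students s
WHERE (
        SELECT COUNT(*)
        FROM enrollments e
        WHERE e.student_id = s.student_id
          AND e.namespace = '111'  -- constant quoted to match the assumed varchar column
      ) > 0
  AND s.major = 'Mathematics'
  AND s.date_of_birth <= '2000-01-01'
  AND s.student_name != CONCAT('Kevin', ' ', 'Jenkins')
LIMIT 1;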
5. Summary
In this article, we have walked through the common scenarios that produce implicit conversions: constants whose type does not match the field, join fields with mismatched types, and mismatched character sets or collations. When SQL performance degrades because of an implicit conversion, identifying which of these scenarios you are in and applying the corresponding fix resolves the problem.
In addition, in our daily development and operations, we can also use tools to help us improve efficiency. SQLFLASH does a great job in this regard.