DEV Community

Cong Li
Cong Li

Posted on

Index Optimization Strategies in GBase 8s: Exploring Performance of Multi-Field Filters in Single-Table Queries (WHERE Clause)

In database queries, filtering a single table using multiple fields is a common operation. However, without appropriate index support, these queries can become slow. This article explores the impact of indexes on query performance with different combinations of filtering fields by creating a test environment, simulating data generation, and executing a series of test SQL statements.

1. Create Test Table

create table t_user(
c_id serial primary key,  --ID
c_name  varchar(50),  --NAME
c_age int,c_sex char(6),  --GENDER
c_cardno char(20),  --ID NUMBER
c_birthday char(10),  --BIRTH DATE
c_phone char(11),  --PHONE NUMBER
c_address1 varchar(255),  --ADDRESS 1
c_address2 lvarchar(255)  --ADDRESS 2
);
Enter fullscreen mode Exit fullscreen mode

2. Generate Test Data

sh gendata.sh 100000>t_user.unl
Enter fullscreen mode Exit fullscreen mode
#!/bin/sh
#gendata.sh
for i in `seq $1`
do
surnames=("Smith" "Johnson" "Williams" "Brown" "Jones" "Garcia" "Miller" "Davis" "Rodriguez" "Martinez" "Hernandez" "Lopez" "Gonzalez" "Wilson" "Anderson" "Thomas" "Taylor" "Moore" "Jackson" "Martin" "Lee" "Perez" "Thompson" "White" "Harris" "Sanchez" "Clark" "Ramirez" "Lewis" "Robinson" "Walker" "Young" "Allen" "King" "Wright" "Scott" "Torres" "Nguyen" "Hill" "Flores" "Green" "Adams" "Nelson" "Baker" "Hall" "Rivera" "Campbell" "Mitchell" "Carter" "Roberts" "Gomez" "Phillips" "Evans" "Turner" "Diaz" "Parker" "Cruz" "Edwards" "Collins" "Reyes" "Stewart" "Morris" "Morales" "Peterson")
num=$((RANDOM % ${#surnames}))
given_names=("James" "Mary" "John" "Patricia" "Robert" "Jennifer" "Michael" "Linda" "William" "Elizabeth" "David" "Barbara" "Richard" "Susan" "Joseph" "Jessica" "Thomas" "Sarah" "Charles" "Karen" "Christopher" "Nancy" "Daniel" "Margaret" "Matthew" "Lisa" "Anthony" "Betty" "Donald" "Dorothy" "Mark" "Sandra" "Paul" "Ashley" "Steven" "Kimberly" "Andrew" "Donna" "Kenneth" "Emily" "George" "Michelle" "Joshua" "Carol" "Kevin" "Amanda" "Brian" "Melissa" "Edward" "Deborah" "Ronald" "Stephanie" "Timothy" "Rebecca" "Jason" "Sharon" "Jeffrey" "Laura" "Ryan" "Cynthia" "Jacob" "Kathleen" "Gary" "Amy")
num1=$((RANDOM % ${#given_names}))
num2=$((RANDOM % ${#given_names}))
age=$(( $RANDOM % (99)))
genders=("M" "F")
gender=$((RANDOM % ${#genders}))
random_day=$((RANDOM % (36500)))
target_timestamp=$((random_day * 86400))
random_date=$(date -d @$target_timestamp "+%Y-%m-%d")
addr=`openssl rand -base64 100`
echo "0|${surname} ${given_name1} ${given_name2}|${age}|${gender}|${id}|${random_date}|${phone}|${addr}|${addr}|"
done
Enter fullscreen mode Exit fullscreen mode

Or use Python and the Faker library to generate test data:

#!/usr/bin/env python3
import sys
import datetime
from faker import Faker

# Get parameters
num = 0
if len(sys.argv) == 2:
    num = str(sys.argv[1])

# Get the current year
curyear = datetime.datetime.now().year

# Initialize Faker for English data
fdata = Faker("en_US")

# Print random data 'num' times
for i in range(int(num)):
    # Generate a random SSN, get birth date and gender
    ssn = fdata.ssn()
    year = ssn[0:3]
    month = ssn[4:6]
    day = ssn[7:9]
    sex = int(ssn[-1]) % 2
    birth = f'{year}-{month}-{day}'
    print("%d|%s|%d|%s|%s|%s|%s|%s|%s|" % (
         i + 1,
         fdata.name_male() if sex == 1 else fdata.name_female(),
         curyear - int(year),
         "Male" if sex == 1 else "Female",
         ssn,
         birth,
         fdata.phone_number(),
         fdata.address(),
         fdata.address()))
Enter fullscreen mode Exit fullscreen mode

3. Import Data

echo "load from t_user.unl insert into t_user;" |dbaccess testdb
Enter fullscreen mode Exit fullscreen mode

4. Test SQL and Results

No. SQL Number of Rows Filter Fields Index Fields Execution Time
1 select * from t_user where c_name='Tianjin' and c_sex='Male' and c_cardno='430524199008129900'; 100000 c_name and c_sex and c_cardno None 0.022 sec
2 select * from t_user where c_name='Tianjin' and c_sex='Male' and c_cardno='430524199008129900'; 100000 c_name and c_sex and c_cardno c_name 0.003 sec
3 select * from t_user where c_name='Tianjin' and c_sex='Male' and c_cardno='430524199008129900'; 100000 c_name and c_sex and c_cardno c_sex 0.041 sec
4 select * from t_user where c_name='Tianjin' and c_sex='Male' and c_cardno='430524199008129900'; 100000 c_name and c_sex and c_cardno c_cardno 0.002 sec
5 select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900'; 100000 c_name or c_sex or c_cardno None 0.027 sec
6 select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900'; 100000 c_name or c_sex or c_cardno c_name 0.028 sec
7 select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900'; 100000 c_name or c_sex or c_cardno c_sex 0.027 sec
8 select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900'; 100000 c_name or c_sex or c_cardno c_cardno 0.028 sec
9 select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900'; 100000 c_name or c_sex or c_cardno c_cardno, c_name, c_sex 0.028 sec
10 select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900'; 100000 c_name or c_cardno None 0.027 sec
11 select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900'; 100000 c_name or c_cardno c_name 0.029 sec
12 select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900'; 100000 c_name or c_cardno c_cardno 0.027 sec
13 select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900'; 100000 c_name or c_cardno idx1(c_cardno, c_name) 0.042 sec
14 select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900'; 100000 c_name or c_cardno idx1(c_cardno), idx2(c_name) 0.005 sec
  • When multiple filtering fields are combined with AND, create an index on the field with high selectivity.
  • When multiple filtering fields are combined with OR and none of the fields have low selectivity, create separate indexes for each field.

Through the testing and analysis of single-table multi-field filtering queries, we gain a deeper understanding of the role of indexes in optimizing database queries. Proper index design can significantly improve query performance and reduce system resource consumption. We hope that the test results and optimization strategies discussed in this article provide valuable insights for database administrators and developers in index design. Thank you for reading, and we hope this article offers a practical perspective on optimizing database query performance.

Top comments (0)