Index Optimization Strategies in GBase 8s: Exploring Performance of Multi-Field Filters in Single-Table Queries (WHERE Clause)

In database queries, filtering a single table using multiple fields is a common operation. However, without appropriate index support, these queries can become slow. This article explores the impact of indexes on query performance with different combinations of filtering fields by creating a test environment, simulating data generation, and executing a series of test SQL statements.

1. Create Test Table

create table t_user(
c_id serial primary key,  --ID
c_name  varchar(50),  --NAME
c_age int,c_sex char(6),  --GENDER
c_cardno char(20),  --ID NUMBER
c_birthday char(10),  --BIRTH DATE
c_phone char(11),  --PHONE NUMBER
c_address1 varchar(255),  --ADDRESS 1
c_address2 lvarchar(255)  --ADDRESS 2
);

2. Generate Test Data

sh gendata.sh 100000>t_user.unl

#!/bin/sh
#gendata.sh
for i in `seq $1`
do
surnames=("Smith" "Johnson" "Williams" "Brown" "Jones" "Garcia" "Miller" "Davis" "Rodriguez" "Martinez" "Hernandez" "Lopez" "Gonzalez" "Wilson" "Anderson" "Thomas" "Taylor" "Moore" "Jackson" "Martin" "Lee" "Perez" "Thompson" "White" "Harris" "Sanchez" "Clark" "Ramirez" "Lewis" "Robinson" "Walker" "Young" "Allen" "King" "Wright" "Scott" "Torres" "Nguyen" "Hill" "Flores" "Green" "Adams" "Nelson" "Baker" "Hall" "Rivera" "Campbell" "Mitchell" "Carter" "Roberts" "Gomez" "Phillips" "Evans" "Turner" "Diaz" "Parker" "Cruz" "Edwards" "Collins" "Reyes" "Stewart" "Morris" "Morales" "Peterson")
num=$((RANDOM % ${#surnames}))
given_names=("James" "Mary" "John" "Patricia" "Robert" "Jennifer" "Michael" "Linda" "William" "Elizabeth" "David" "Barbara" "Richard" "Susan" "Joseph" "Jessica" "Thomas" "Sarah" "Charles" "Karen" "Christopher" "Nancy" "Daniel" "Margaret" "Matthew" "Lisa" "Anthony" "Betty" "Donald" "Dorothy" "Mark" "Sandra" "Paul" "Ashley" "Steven" "Kimberly" "Andrew" "Donna" "Kenneth" "Emily" "George" "Michelle" "Joshua" "Carol" "Kevin" "Amanda" "Brian" "Melissa" "Edward" "Deborah" "Ronald" "Stephanie" "Timothy" "Rebecca" "Jason" "Sharon" "Jeffrey" "Laura" "Ryan" "Cynthia" "Jacob" "Kathleen" "Gary" "Amy")
num1=$((RANDOM % ${#given_names}))
num2=$((RANDOM % ${#given_names}))
age=$(( $RANDOM % (99)))
genders=("M" "F")
gender=$((RANDOM % ${#genders}))
random_day=$((RANDOM % (36500)))
target_timestamp=$((random_day * 86400))
random_date=$(date -d @$target_timestamp "+%Y-%m-%d")
addr=`openssl rand -base64 100`
echo "0|${surname} ${given_name1} ${given_name2}|${age}|${gender}|${id}|${random_date}|${phone}|${addr}|${addr}|"
done

Or use Python and the Faker library to generate test data:

#!/usr/bin/env python3
import sys
import datetime
from faker import Faker

# Get parameters
num = 0
if len(sys.argv) == 2:
    num = str(sys.argv[1])

# Get the current year
curyear = datetime.datetime.now().year

# Initialize Faker for English data
fdata = Faker("en_US")

# Print random data 'num' times
for i in range(int(num)):
    # Generate a random SSN, get birth date and gender
    ssn = fdata.ssn()
    year = ssn[0:3]
    month = ssn[4:6]
    day = ssn[7:9]
    sex = int(ssn[-1]) % 2
    birth = f'{year}-{month}-{day}'
    print("%d|%s|%d|%s|%s|%s|%s|%s|%s|" % (
         i + 1,
         fdata.name_male() if sex == 1 else fdata.name_female(),
         curyear - int(year),
         "Male" if sex == 1 else "Female",
         ssn,
         birth,
         fdata.phone_number(),
         fdata.address(),
         fdata.address()))

3. Import Data

echo "load from t_user.unl insert into t_user;" |dbaccess testdb

4. Test SQL and Results

No.	SQL	Number of Rows	Filter Fields	Index Fields	Execution Time
1	select * from t_user where c_name='Tianjin' and c_sex='Male' and c_cardno='430524199008129900';	100000	c_name and c_sex and c_cardno	None	0.022 sec
2	select * from t_user where c_name='Tianjin' and c_sex='Male' and c_cardno='430524199008129900';	100000	c_name and c_sex and c_cardno	c_name	0.003 sec
3	select * from t_user where c_name='Tianjin' and c_sex='Male' and c_cardno='430524199008129900';	100000	c_name and c_sex and c_cardno	c_sex	0.041 sec
4	select * from t_user where c_name='Tianjin' and c_sex='Male' and c_cardno='430524199008129900';	100000	c_name and c_sex and c_cardno	c_cardno	0.002 sec
5	select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900';	100000	c_name or c_sex or c_cardno	None	0.027 sec
6	select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900';	100000	c_name or c_sex or c_cardno	c_name	0.028 sec
7	select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900';	100000	c_name or c_sex or c_cardno	c_sex	0.027 sec
8	select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900';	100000	c_name or c_sex or c_cardno	c_cardno	0.028 sec
9	select count(*) from t_user where c_name='Tianjin' or c_sex='Male' or c_cardno='430524199008129900';	100000	c_name or c_sex or c_cardno	c_cardno, c_name, c_sex	0.028 sec
10	select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900';	100000	c_name or c_cardno	None	0.027 sec
11	select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900';	100000	c_name or c_cardno	c_name	0.029 sec
12	select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900';	100000	c_name or c_cardno	c_cardno	0.027 sec
13	select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900';	100000	c_name or c_cardno	idx1(c_cardno, c_name)	0.042 sec
14	select count(*) from t_user where c_name='Tianjin' or c_cardno='430524199008129900';	100000	c_name or c_cardno	idx1(c_cardno), idx2(c_name)	0.005 sec

When multiple filtering fields are combined with AND, create an index on the field with high selectivity.
When multiple filtering fields are combined with OR and none of the fields have low selectivity, create separate indexes for each field.

Through the testing and analysis of single-table multi-field filtering queries, we gain a deeper understanding of the role of indexes in optimizing database queries. Proper index design can significantly improve query performance and reduce system resource consumption. We hope that the test results and optimization strategies discussed in this article provide valuable insights for database administrators and developers in index design. Thank you for reading, and we hope this article offers a practical perspective on optimizing database query performance.