GBase 8c is a multi-model database and the third generation of intelligent distributed database products from GBASE. It supports multiple storage modes and deployment forms, and offers high performance, high availability, high intelligence, high security, and strong compatibility. This article describes installation failures caused by operating system versions and environment issues. It aims to help users locate problems quickly, analyze their causes, and find solutions or workarounds so that they can use the GBase 8c database proficiently.
0. Fault Classification
- Basic environment not ready
- System version/environment issues
1. Basic Environment Not Ready
Before installation, check the operating system information, whether the dependency packages are installed, whether the hostname has taken effect, whether the firewall is closed, whether the root password is the same on all machines, and whether the clocks are synchronized. The steps are as follows:
1.1 Check Operating System Information
1) Check that the ID value reported by the operating system is listed in the json file
# cat /etc/os-release # General command for Linux operating systems
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Tercel)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Tercel)"
ANSI_COLOR="0;31"
# cat support_system_info.json
{
"SUSE": "suse",
"REDHAT": "redhat",
"CENTOS": "centos",
"EULEROS": "euleros",
"KYLIN": "kylin",
"KYLINSEC": "kylinsecos",
"OPENEULER": "openeuler",
"FUSIONOS": "fusionos",
"ASIANUX": "asianux",
"DEBIAN": "debian",
"UBUNTU": "ubuntu",
"NFS": "nfs",
"UOS": "uos",
"BCLINUX": "bclinux",
"NEOKYLIN": "neokylin",
"OPENKYLIN": "openkylin"
}
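The comparison above can also be scripted. The following is a minimal sketch, assuming the json file keeps the flat "KEY": "value" layout shown above; the function names get_os_id and os_id_supported are hypothetical helpers and are not part of GBase 8c.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not shipped with GBase 8c.

# Print the ID field of an os-release style file.
get_os_id() {
    # Source the file in a subshell so its variables do not leak out.
    ( . "$1" && printf '%s\n' "$ID" )
}

# Succeed when that ID appears as a quoted value in the json mapping file.
# Arguments: <os-release file> <json file>
os_id_supported() {
    os_id=$(get_os_id "$1") || return 2
    [ -n "$os_id" ] || return 2
    # Match the whole quoted value so "kylin" does not match "kylinsecos".
    grep -q ":[[:space:]]*\"$os_id\"" "$2"
}
```

For example, `os_id_supported /etc/os-release ./support_system_info.json && echo supported` prints "supported" only when the ID is listed.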
2) If it is a Kylin operating system, also check whether the current version is Ubuntu-based, using one of the following commands:
# cat /etc/.kyinfo
# cat /etc/os-release
In the returned version information, if the dist_id parameter value starts with "Kylin-Desktop", the operating system is Ubuntu-based.
On such a system /bin/sh points to dash and must be switched to bash. Use the following command:
# sudo dpkg-reconfigure dash
When prompted, select No and press Enter. After the command exits, /bin/sh points to bash.
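The dist_id check can be sketched as a small script. This assumes that /etc/.kyinfo stores the value on a line of the form dist_id=..., which should be confirmed on the target system; get_dist_id and is_ubuntu_based_kylin are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; the "key=value" layout of the file is an assumption.

# Print the first dist_id value found in the given file.
get_dist_id() {
    sed -n 's/^[[:space:]]*dist_id[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Succeed when the dist_id value starts with "Kylin-Desktop".
is_ubuntu_based_kylin() {
    case $(get_dist_id "$1") in
        Kylin-Desktop*) return 0 ;;
        *)              return 1 ;;
    esac
}
```

A call such as `is_ubuntu_based_kylin /etc/.kyinfo && echo "switch /bin/sh to bash"` then tells you whether the dash step applies.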
3) For Kylin operating systems, you need to disable security authorization authentication
# setstatus softmode -p
4) CentOS/RHEL 7.2+ environment
The systemd-logind service introduces a feature, RemoveIPC, that removes a user's IPC objects when the user logs out. When the setting is enabled (RemoveIPC=yes, the upstream systemd default), logging out removes the Shared Memory Segments (SHM) and Semaphores (SEM) that running applications still depend on, so those applications crash and the GBase database process is interrupted. To avoid this problem, follow these steps:
(1) Check if the RemoveIPC parameter value is yes
# loginctl show-session | grep RemoveIPC
# systemctl show systemd-logind | grep RemoveIPC
If it is yes, it needs to be modified; if it is no, there is no need to continue with the following steps.
(2) Modify the /etc/systemd/logind.conf configuration file
# vim /etc/systemd/logind.conf
Set the RemoveIPC parameter value to no, then type ":wq" to save and exit.
Make the same change in the /usr/lib/systemd/system/systemd-logind.service configuration file:
# vim /usr/lib/systemd/system/systemd-logind.service
Set the RemoveIPC parameter value to no, then type ":wq" to save and exit.
(3) Reload the configuration and restart the service:
# systemctl daemon-reload
# systemctl restart systemd-logind
(4) Check again to see if it takes effect.
# loginctl show-session | grep RemoveIPC
# systemctl show systemd-logind | grep RemoveIPC
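The vim edit in step (2) can also be done non-interactively. The sketch below is one possible approach rather than an official procedure: removeipc_value and set_removeipc_no are hypothetical helper names, and the functions only touch the file path passed to them, so the daemon-reload and restart from step (3) are still required afterwards.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; run as root against the real files.

# Print the value of the last active (uncommented) RemoveIPC= line, if any.
removeipc_value() {
    sed -n 's/^[[:space:]]*RemoveIPC[[:space:]]*=[[:space:]]*\([A-Za-z]*\).*/\1/p' "$1" | tail -n 1
}

# Turn every RemoveIPC line (commented or not) into "RemoveIPC=no",
# or append the line when the key is missing. A .bak copy is kept by sed.
set_removeipc_no() {
    if grep -q '^[[:space:]]*#\{0,1\}[[:space:]]*RemoveIPC[[:space:]]*=' "$1"; then
        sed -i.bak 's/^[[:space:]]*#\{0,1\}[[:space:]]*RemoveIPC[[:space:]]*=.*/RemoveIPC=no/' "$1"
    else
        printf 'RemoveIPC=no\n' >> "$1"
    fi
}
```

For example, `set_removeipc_no /etc/systemd/logind.conf` followed by `removeipc_value /etc/systemd/logind.conf` should print "no".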
1.2 Check CPU Architecture
Confirm whether the CPU architecture is x86_64 or aarch64:
# lscpu |grep 'Architecture'
Architecture: x86_64
The architecture must match the one in the downloaded package name, for example:
GBase8cV6_SXXXXX_centos7.8_x86_64.tar.gz # x86 package
GBase8cV6_SXXXXX_centos7.8_aarch64.tar.gz # aarch64 package
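As a rough illustration, the comparison between the package name and the host architecture could be automated as follows. The sketch assumes the package name ends in _x86_64.tar.gz or _aarch64.tar.gz as in the examples above; package_arch and arch_matches are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers; they rely only on the file name suffix.

# Print the architecture encoded in the package file name.
package_arch() {
    case $1 in
        *_x86_64.tar.gz)  echo x86_64 ;;
        *_aarch64.tar.gz) echo aarch64 ;;
        *)                return 1 ;;
    esac
}

# Succeed when the package architecture equals the host architecture.
# The optional second argument overrides the value of "uname -m".
arch_matches() {
    pkg_arch=$(package_arch "$1") || return 2
    host_arch=${2:-$(uname -m)}
    [ "$pkg_arch" = "$host_arch" ]
}
```

Running `arch_matches GBase8cV6_SXXXXX_centos7.8_x86_64.tar.gz || echo "wrong package for this CPU"` before unpacking catches a mismatch early.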
1.3 Pre-installation Dependency Check
# rpm -qa|egrep "libaio-devel|flex|bison|ncurses-devel|glibc-devel|patch|redhat-lsb-core|readline-devel|libnsl|expect|patchelf|bzip2"
Install any missing packages:
# yum install -y libaio-devel flex bison ncurses-devel glibc-devel patch redhat-lsb-core readline-devel libnsl expect patchelf bzip2
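Because the egrep pattern matches substrings (for example, "patch" also matches an installed patchelf package), a per-package query can be more precise. The sketch below is one way to do this on an rpm-based system; is_installed and missing_packages are hypothetical helper names.

```shell
#!/bin/sh
# Package list taken from the commands above.
REQUIRED_PKGS="libaio-devel flex bison ncurses-devel glibc-devel patch redhat-lsb-core readline-devel libnsl expect patchelf bzip2"

# Hypothetical wrapper around rpm so the check can be swapped out or stubbed.
is_installed() {
    rpm -q "$1" >/dev/null 2>&1
}

# Print every package from the argument list that is not installed.
missing_packages() {
    for pkg in "$@"; do
        is_installed "$pkg" || printf '%s\n' "$pkg"
    done
}
```

Calling `missing_packages $REQUIRED_PKGS` (unquoted on purpose, so the list is split into words) prints one missing package per line and nothing when all of them are present.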
1.4 Confirm Hostname Consistency
Set a meaningful hostname. Even for single-machine deployment, using the default localhost is not recommended.
View the current hostname:
# cat /etc/hostname
# hostname
Modification method:
# hostnamectl set-hostname gbasedb01
1.5 Create Database OS Management User
The following uses gbase as an example. Creating the user through the graphical interface is not recommended.
# groupadd gbase
# useradd -m -d /home/gbase -g gbase gbase
# passwd gbase
1.6 Close the Firewall
# systemctl stop firewalld.service
# systemctl disable firewalld.service
# sed -i.bak '/^SELINUX=/s#SELINUX=.*#SELINUX=disabled#' /etc/selinux/config
# setenforce 0
If the environment requires the firewall to stay enabled, open the required ports instead:
# systemctl start firewalld
# firewall-cmd --zone=public --add-port=15400/tcp --permanent
# firewall-cmd --zone=public --add-port=15300/tcp --permanent
# firewall-cmd --zone=public --add-port=15301/tcp --permanent
# firewall-cmd --zone=public --add-port=15302/tcp --permanent
# firewall-cmd --zone=public --add-port=15405/tcp --permanent
# firewall-cmd --zone=public --add-port=15401/tcp --permanent
# firewall-cmd --zone=public --add-port=5000/tcp --permanent
# firewall-cmd --zone=public --add-port=5001/tcp --permanent
# firewall-cmd --zone=public --add-port=5002/tcp --permanent
# systemctl restart firewalld
# firewall-cmd --zone=public --query-port=15400/tcp
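The repeated firewall-cmd calls can be collapsed into a loop. The following is a sketch under the assumption that the port list above is complete for your deployment; run and open_gbase_ports are hypothetical helpers, and by default the script only prints the commands (dry run).

```shell
#!/bin/sh
# Ports copied from the commands above; adjust to the actual cluster layout.
GBASE_PORTS="15400 15300 15301 15302 15405 15401 5000 5001 5002"

# Print the command when DRY_RUN is 1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Add a permanent rule for every port, then reload the firewall.
open_gbase_ports() {
    for port in $GBASE_PORTS; do
        run firewall-cmd --zone=public --add-port="${port}/tcp" --permanent
    done
    # Permanent rules only reach the running firewall after a reload.
    run firewall-cmd --reload
}
```

Call `open_gbase_ports` to review the commands, then `DRY_RUN=0 open_gbase_ports` as root to apply them.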
1.7 Multi-node Environment Installation Requires Checking Time Consistency
# date
Modification method (after correcting the system time, write it to the hardware clock):
# hwclock --show
# hwclock --systohc
Note: If the clocks of the nodes are inconsistent, database startup will hang. When adjusting the time manually, move the clock forward to a future time rather than backward.
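To compare the clocks of several nodes in one step, the epoch timestamps can be collected and their spread computed. This is a minimal sketch: clock_spread is a hypothetical helper, and gbasedb02 in the usage example stands for an assumed second node.

```shell
#!/bin/sh
# Hypothetical helper: reads one epoch timestamp per line on standard input
# and prints the difference between the largest and the smallest, in seconds.
clock_spread() {
    awk 'NR == 1 { min = $1 + 0; max = $1 + 0 }
         { t = $1 + 0; if (t < min) min = t; if (t > max) max = t }
         END { if (NR) print max - min }'
}
```

For example, `for h in gbasedb01 gbasedb02; do ssh "$h" date +%s; done | clock_spread` prints the spread across the two nodes. Because the ssh calls run one after another, a spread of a second or two can come from connection latency alone.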
1.8 Check Maximum Open Files
The value must be at least 640000.
# ulimit -n
Modification method:
# echo 'ulimit -n 1000000' >> ~/.bashrc
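The threshold check can be written as a small function, for instance to include it in a pre-installation script. This is only a sketch: nofile_ok is a hypothetical name, and the 640000 threshold is the requirement stated above.

```shell
#!/bin/sh
# Minimum number of open files required, as stated above.
MIN_NOFILE=640000

# Succeed when the limit is "unlimited" or at least MIN_NOFILE.
# The optional argument overrides the value of "ulimit -n".
nofile_ok() {
    limit=${1:-$(ulimit -n)}
    [ "$limit" = unlimited ] || [ "$limit" -ge "$MIN_NOFILE" ]
}
```

Running `nofile_ok || echo "open files limit is too low: $(ulimit -n)"` in a new login shell of the database user shows whether the .bashrc change took effect.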
2. System Version/Environment Issues
2.1 Multi-node Installation, Root Password Inconsistency
Solution:
Temporarily set the root passwords to the same value.
If the passwords cannot be changed, install by following the steps below:
① On each machine, as root, place the installation package and the xml file in the same directory and run gs_preinstall with the -L option so that it performs only the local installation
./gs_preinstall -U gbase -G gbase -X /opt/cluster.xml -L
② Switch to the gbase user and establish mutual trust for it between the machines. Then run "ssh hostname" and "ssh ip" against the local machine and every other machine to refresh the cached host information (see section 2.2 for details)
③ Execute the installation on any server
gs_install -X /opt/cluster.xml
2.2 Passwordless SSH Configuration Issue
A complete passwordless configuration is shown below for reference (it avoids ssh permission issues):
# su - gbase
$ rm -rf /home/gbase/.ssh
$ mkdir ~/.ssh
$ chmod 700 ~/.ssh
$ ssh-keygen -t rsa
$ ssh-copy-id gbase@172.16.xx.xxx
$ ssh-copy-id gbase@172.16.xx.xxx
$ echo 'StrictHostKeyChecking no' >> ~/.ssh/config
$ echo 'UserKnownHostsFile ~/.ssh/known_hosts' >> ~/.ssh/config
$ chmod 644 ~/.ssh/config
Verification test (both IP and hostname need to be tested)
$ ssh 172.16.xx.xxx date
$ ssh 172.16.xx.xxx date
$ ssh gbasedb01 date
$ ssh gbasedb01 date
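The verification can be looped over all IPs and hostnames. The sketch below is an illustration, not part of the GBase tooling: check_ssh_targets is a hypothetical helper, and SSH_CMD is an assumed override that exists only so the loop can be tried without a network. BatchMode=yes makes ssh fail instead of prompting for a password, so a missing key shows up as FAIL.

```shell
#!/bin/sh
# Hypothetical helper: try "date" over ssh on every target and report the result.
check_ssh_targets() {
    for target in "$@"; do
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$target" date >/dev/null 2>&1; then
            echo "ok   $target"
        else
            echo "FAIL $target"
        fi
    done
}
```

Run it as the gbase user with every address and hostname of the cluster, for example `check_ssh_targets 172.16.xx.xxx gbasedb01`, and fix any target reported as FAIL before installing.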
If ssh by IP works but ssh by hostname does not, ping the hostname to check whether it resolves to an IPv6 address. If it does, run the following command so that IPv4 addresses are preferred:
echo 'precedence ::ffff:0:0/96 100' | sudo tee -a /etc/gai.conf
2.3 Environment Issues
(1) Error xxxx list index out of range during installation
Check the xml file used for the installation, focusing on whether the hostnames and IP addresses are configured correctly.
(2) Error "importerror:libffi.so.6:cannot open shared object file: no such file or directory" during installation.
This may happen when the server's CPU architecture is ARM but files from a previously installed x86 package remain on the system. In that case, delete the leftover files named in the error message:
ll /lib64/libffi.so.6*
sudo rm -rf /lib64/libffi.so.6*
(3) Error "cannot execute binary file" during installation.
The GBase 8c installation package does not match the CPU architecture. Replace it with the package built for the deployment environment. For details, see 1.2 Check CPU Architecture.
2.4 VIP Configuration Issue After Installation
1) VIP address does not float
Check if sudo permissions are configured on all machines
$ su - gbase
$ sudo -l
Check if VIP is configured on the standby machine
$ sudo find / -name 'cm_resource.json'
2) The entire cluster is unavailable when the primary node is down
Check if the VIP information is saved in the cm_server configuration
cm_ctl list --param --server| grep "third_party_gateway_ip"
If the above query result is empty, configure as follows:
cm_ctl set --param --server -k "third_party_gateway_ip=<gateway address>"
cm_ctl set --param --server -k "cms_enable_db_crash_recovery=1"
cm_ctl set --param --server -k "cms_network_isolation_timeout=10"
cm_ctl set --param --server -k "cms_enable_failover_on2nodes=1"
This article aims to provide a comprehensive guide to troubleshooting common issues encountered during the installation and configuration of GBase 8c primary-standby clusters. By following the detailed steps and solutions provided, users can ensure a smooth and successful deployment of the GBase 8c database. For further assistance, refer to the official documentation or reach out to the support team.