A high availability (HA) cluster is a group of interconnected computers (servers, or nodes) that work together so that an application or service remains available even if one or more nodes in the cluster fail. The primary goal of an HA cluster is to provide continuous availability and minimize downtime by distributing the workload across multiple nodes.
In an HA cluster, each node is responsible for a specific task, and if one node fails, another node in the cluster takes over that task automatically.
This article assumes that you have set up two identical CentOS VMs or physical machines, each running an Apache web server that hosts the same website. In this article, node1 has the IP address 192.168.1.2/24 and node2 has 192.168.1.3/24.
ON BOTH NODES
yum install -y ricci luci ccs cman modcluster 'cluster*'
start the ricci service
service ricci start
set a password for ricci user on both nodes
passwd ricci
ON THE MASTER NODE (node1, 192.168.1.2)
create the cluster
ccs -h 192.168.1.2 --createcluster mycluster
add nodes to the cluster
ccs -h 192.168.1.2 --addnode 192.168.1.2
ccs -h 192.168.1.2 --addnode 192.168.1.3
list all nodes added to the cluster
ccs -h 192.168.1.2 --lsnodes
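The create and add-node steps above can also be driven from a node list. The following is a minimal sketch, not part of the original setup: the variable names and the DRY_RUN switch are made up for illustration. With DRY_RUN left at its default of 1, the script only prints the ccs commands, so you can review them before running anything.

```shell
#!/bin/sh
# Sketch: build the cluster from a node list instead of typing each command.
# MASTER, CLUSTER and NODES mirror the values used in this article.
# DRY_RUN is a hypothetical switch: 1 (default) prints commands, 0 executes them.
MASTER="192.168.1.2"
CLUSTER="mycluster"
NODES="192.168.1.2 192.168.1.3"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_cluster() {
    run ccs -h "$MASTER" --createcluster "$CLUSTER"
    for node in $NODES; do
        run ccs -h "$MASTER" --addnode "$node"
    done
    run ccs -h "$MASTER" --lsnodes
}

build_cluster
```

Run it as is to see the four commands, then run it with DRY_RUN=0 on the master node once the list looks right.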
Add Fencing to Cluster
Fencing is the disconnection of a node from shared storage. Fencing cuts off I/O from shared storage, thus ensuring data integrity.
set both fence daemon delays in one command (each --setfencedaemon call resets any property it does not list to its default)
ccs -h 192.168.1.2 --setfencedaemon post_fail_delay=0 post_join_delay=0
Create a fence device
A fence device is a hardware device that can be used to cut a node off from shared storage.
A fence agent is a software program that connects to a fence device in order to ask the fence device to cut off access to a node’s shared storage (via powering off the node or removing access to the shared storage by other means).
There are different types of fence devices available. If you are building the cluster from virtual machines, use the fence_virt agent as shown below.
ccs -h 192.168.1.2 --addfencedev myfence agent=fence_virt
create a fence method on each node
ccs -h 192.168.1.2 --addmethod mymethod 192.168.1.2
ccs -h 192.168.1.2 --addmethod mymethod 192.168.1.3
attach the fence device to each node's fence method
ccs -h 192.168.1.2 --addfenceinst myfence 192.168.1.2 mymethod
ccs -h 192.168.1.2 --addfenceinst myfence 192.168.1.3 mymethod
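The per-node fencing commands are easy to mistype, so it can help to generate them. This sketch assumes the fence device myfence already exists; the variable and function names are only illustrative. It prints the method and instance commands for every node and leaves running them to you.

```shell
#!/bin/sh
# Sketch: generate the per-node fencing commands used in this article.
# Assumes the fence device "myfence" was already created with --addfencedev.
MASTER="192.168.1.2"
FENCE_DEV="myfence"
FENCE_METHOD="mymethod"
NODES="192.168.1.2 192.168.1.3"

fence_commands() {
    # First give every node a fence method...
    for node in $NODES; do
        echo "ccs -h $MASTER --addmethod $FENCE_METHOD $node"
    done
    # ...then attach the fence device to that method on each node.
    for node in $NODES; do
        echo "ccs -h $MASTER --addfenceinst $FENCE_DEV $node $FENCE_METHOD"
    done
}

fence_commands          # review the output first
# fence_commands | sh   # then run it on the master node
```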
create a failover domain
A failover domain is an ordered subset of cluster members to which a resource group or service may be bound.
ccs -h 192.168.1.2 --addfailoverdomain mywebserverdomain ordered
add both nodes to the failover domain
ccs -h 192.168.1.2 --addfailoverdomainnode mywebserverdomain 192.168.1.2 1
ccs -h 192.168.1.2 --addfailoverdomainnode mywebserverdomain 192.168.1.3 2
show the nodes in the failover domain and their priorities
ccs -h 192.168.1.2 --lsfailoverdomain
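To make the priorities concrete: in an ordered failover domain, the online member with the lowest priority number is the preferred owner of the service. The function below is a simplified model of that rule, not the cluster manager's actual code; pick_owner and the node:priority notation are invented for this illustration.

```shell
#!/bin/sh
# Simplified model of an ordered failover domain (illustration only).
# pick_owner "<node>:<priority> ..." "<online nodes>"
# Prints the online member with the lowest priority number.
pick_owner() {
    members="$1"
    online="$2"
    best=""
    best_prio=""
    for entry in $members; do
        node="${entry%%:*}"     # part before the colon
        prio="${entry##*:}"     # part after the colon
        # Skip members that are not currently online.
        case " $online " in
            *" $node "*) ;;
            *) continue ;;
        esac
        if [ -z "$best" ] || [ "$prio" -lt "$best_prio" ]; then
            best="$node"
            best_prio="$prio"
        fi
    done
    echo "$best"
}

DOMAIN="192.168.1.2:1 192.168.1.3:2"
pick_owner "$DOMAIN" "192.168.1.2 192.168.1.3"   # prints 192.168.1.2
pick_owner "$DOMAIN" "192.168.1.3"               # prints 192.168.1.3
```

With both nodes online the service belongs on 192.168.1.2 (priority 1); when that node is gone, 192.168.1.3 is the only candidate left.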
add the Apache web server service to the cluster
ccs -h 192.168.1.2 --addservice apache domain=mywebserverdomain recovery=relocate autostart=1
sync cluster configurations across all cluster nodes
ccs -h 192.168.1.2 --sync --activate
check cluster configuration
ccs -h 192.168.1.2 --checkconf
OUTPUT: All nodes in sync
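If you script the setup, you may want to stop when the configuration is not in sync. This is a small sketch that only looks for the message shown above; conf_in_sync is a made-up helper name, and the sample line stands in for real --checkconf output.

```shell
#!/bin/sh
# Sketch: succeed only if the --checkconf output contains the sync message.
# Reads the command output on stdin.
conf_in_sync() {
    grep -q "All nodes in sync"
}

# Usage on a live cluster (not run here):
#   ccs -h 192.168.1.2 --checkconf | conf_in_sync || echo "resync needed"

# Demonstration with the output shown in this article:
echo "All nodes in sync" | conf_in_sync && echo "configuration is consistent"
```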
Disable NetworkManager on all nodes because it prevents the cman service from starting
service NetworkManager stop
chkconfig NetworkManager off
start the cman service on all nodes
service cman start
use clustat to check status of all cluster nodes
clustat
OUTPUT
Cluster Status for mycluster @ Fri Apr 21 14:00:22
Member Status: Quorate
Member Name    ID   Status
------ ----    ---- ------
192.168.1.2       1 Online, Local
192.168.1.3       2 Online
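For monitoring scripts, the member table can be reduced to a list of online nodes. The sketch below parses the output shown above with awk; online_members is a hypothetical helper, and the sample text is copied from this article rather than read from a live cluster.

```shell
#!/bin/sh
# Sketch: list the members that clustat reports as Online.
# Reads clustat output on stdin and prints one member name per line.
online_members() {
    awk '/^ *-+/ { in_table = 1; next }               # dashed line starts a table
         in_table && $3 ~ /^Online/ { print $1 }'     # keep rows whose status is Online
}

# Usage on a live node (not run here):
#   clustat | online_members

# Demonstration with the output shown in this article:
sample='Cluster Status for mycluster @ Fri Apr 21 14:00:22
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
192.168.1.2 1 Online, Local
192.168.1.3 2 Online'

printf '%s\n' "$sample" | online_members
```

For this sample it prints both addresses, one per line. A member whose status is Offline is left out.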
TESTING CLUSTER NODES
bring down a node
ccs -h 192.168.1.3 --stop
run clustat to check the status of all nodes
clustat
OUTPUT
Cluster Status for mycluster @ Fri Apr 21 14:00:22
Member Status: Inquorate
Member Name    ID   Status
------ ----    ---- ------
192.168.1.2       1 Online, Local
192.168.1.3       2 Offline
On the node that is down, clustat gives an error saying: "Could not connect to CMAN: No such file or directory"
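The Inquorate status in the output above follows from simple vote counting. As a rough sketch, assuming each node carries one vote and no special two-node mode is enabled, a cluster needs more than half of the expected votes to be quorate. The function names here are illustrative.

```shell
#!/bin/sh
# Sketch of majority quorum arithmetic (assumes one vote per node).
# quorum_needed <expected_votes>: smallest vote count that is a majority.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# is_quorate <votes_present> <expected_votes>
is_quorate() {
    [ "$1" -ge "$(quorum_needed "$2")" ]
}

# Two-node cluster from this article: 2 expected votes, so 2 are needed.
quorum_needed 2                                                   # prints 2
if is_quorate 2 2; then echo Quorate; else echo Inquorate; fi     # prints Quorate
if is_quorate 1 2; then echo Quorate; else echo Inquorate; fi     # prints Inquorate
```

With only two nodes, losing either one drops the cluster below the majority, which is why the surviving node reports Inquorate.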
start the offline node
ccs -h 192.168.1.3 --start
THE END