Cheedge Lee

Posted on • Originally published at notes-renovation.hashnode.dev

Configuring Your Home Cluster Network: A Complete Guide

It’s a renovated note.

Nowadays many of us prefer a cloud cluster over a self-managed one: less management, high availability, better security, pay-as-you-go, and all the other advantages you can think of in cloud computing. However, if you happen to own several old computers that you don’t want to sell or give away and don’t know what to do with, a home-managed cluster is a good choice. There is a lot of fun in running a self-managed cluster.

Architecture

Let’s define our architecture here: 4 nodes in total, 1 master (which also acts as a worker) and 3 workers.

[Architecture diagram: the master node bridges the institute network (eth0) and the private cluster network (eth1, 192.168.10.0/24) with the three worker nodes]

1. master node

  • eth0: connected to the institute network (cable)

    • public interface, using DHCP
    • internet downloads and updates
    • users can access it via ssh/scp
  • eth1: connected to the worker nodes

    • private interface, using static IP
    • communicates with the other nodes:

      • ssh/scp
      • data transfer
      • parallel communication

Configure the network interfaces eth0 and eth1

# Configure public interface (assumes DHCP from institute network)
cat > /etc/sysconfig/network-scripts/ifcfg-eth0 << EOF
TYPE=Ethernet
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
# Request static IP from our institute's DHCP server if possible
# This makes routing more reliable
DHCP_CLIENT_ID=cluster-master
EOF

# Configure private cluster network
cat > /etc/sysconfig/network-scripts/ifcfg-eth1 << EOF
TYPE=Ethernet
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
EOF

# Apply new network configuration
systemctl restart network
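
To confirm both interfaces came up as expected, here is a quick sanity check (assuming the usual iproute2 tools on CentOS/RHEL):

# Check that eth0 got a DHCP lease and eth1 has the static address
ip addr show eth0
ip addr show eth1

# Check the resulting routing table
ip route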

NAT

# Enable IP forwarding persistently
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf

# Enable connection tracking timeout optimization for HPC workloads
echo "net.netfilter.nf_conntrack_tcp_timeout_established = 86400" >> /etc/sysctl.conf
echo "net.netfilter.nf_conntrack_max = 131072" >> /etc/sysctl.conf

# Apply sysctl changes
sysctl -p

# Set up NAT (masquerade the private cluster network behind eth0)
iptables -t nat -A POSTROUTING -o eth0 -s 192.168.10.0/24 -j MASQUERADE

The last command is important: the MASQUERADE rule is what allows the return traffic to find its way back to the worker nodes. We will explain this later.
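
A quick way to confirm that forwarding and masquerading are actually in effect (the packet counters on the MASQUERADE rule increase once worker traffic starts flowing):

# Confirm IP forwarding is enabled and the MASQUERADE rule is installed
sysctl net.ipv4.ip_forward
iptables -t nat -L POSTROUTING -n -v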

Set up the firewall

# Clear existing rules
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X

# Set default policies
iptables -P INPUT DROP
iptables -P FORWARD DROP  # We'll explain this choice
iptables -P OUTPUT ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow established and related connections
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow SSH from institute network
iptables -A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT

# Allow all traffic from the cluster's private network
iptables -A INPUT -i eth1 -s 192.168.10.0/24 -j ACCEPT

# Allow forwarding from cluster to internet
iptables -A FORWARD -i eth1 -s 192.168.10.0/24 -o eth0 -j ACCEPT

# Allow inbound HTTP/HTTPS on the public interface
# (outbound package downloads are already covered by OUTPUT ACCEPT and the
#  ESTABLISHED rule; keep these only if the master should serve web traffic)
iptables -A INPUT -i eth0 -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -i eth0 -p tcp --dport 443 -j ACCEPT

# Set up local package caching repository later (optional)
iptables -A INPUT -i eth1 -p tcp --dport 80 -j ACCEPT

# Save iptables rules
iptables-save > /etc/sysconfig/iptables

I set the default FORWARD policy to DROP for security reasons:

  • It prevents unauthorized traffic from traversing the master node

  • It creates a default-deny stance, where only explicitly allowed traffic passes

  • It prevents potential lateral movement if one node is compromised
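
A quick way to verify the default-deny stance after applying the rules above (both INPUT and FORWARD should report policy DROP):

# Show each chain with its default policy and packet counters
iptables -L -n -v | grep '^Chain'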

2. worker nodes

config

# Worker node 1 (192.168.10.2)
cat > /etc/sysconfig/network-scripts/ifcfg-eth0 << EOF
TYPE=Ethernet
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.10.2
NETMASK=255.255.255.0
GATEWAY=192.168.10.1
DNS1=8.8.8.8
ONBOOT=yes
EOF

# here the GATEWAY setting automatically creates the default route in the routing table

# Worker node 2 (192.168.10.3)
# Change IPADDR=192.168.10.3 on the second worker node

# Worker node 3 (192.168.10.4)
# Change IPADDR=192.168.10.4 on the third worker node

# Restart network service on each worker node
systemctl restart network

Routing Configuration for Worker Nodes

As we use the master node’s eth1 (192.168.10.1) as the gateway for the worker nodes (GATEWAY=192.168.10.1), the setting above creates a default route on each worker node that sends all traffic not destined for the local network (192.168.10.0/24) to the master node (192.168.10.1).

$ route -n
# see the result: (send all traffic (0.0.0.0/0) to gateway 192.168.10.1)
0.0.0.0         192.168.10.1      0.0.0.0         UG    0      0        0 eth0

Manual Command

Running the command manually can achieve the same result.

route add -net 0.0.0.0 gw 192.168.10.1

This manually adds a default route to the current routing table. It has the same immediate effect as the configuration file setting, but it's temporary and will be lost after a reboot or network service restart.

The difference is primarily in persistence and when the configuration happens. Using the network configuration file is the standard way to set up permanent routes in CentOS/RHEL systems.
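
For reference, the iproute2 equivalent of the same temporary default route (also non-persistent) would be:

# Same effect as the route command above, lost on reboot as well
ip route add default via 192.168.10.1 dev eth0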

Firewall

# Clear existing rules
iptables -F
iptables -X

# Set default policies
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow established connections
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow SSH from cluster nodes only
iptables -A INPUT -p tcp --dport 22 -s 192.168.10.0/24 -j ACCEPT

# HPC/MPI Communication - comprehensive approach
# Allow all TCP/UDP between cluster nodes for parallel computing
# this is optional; skip it if you don't plan to do parallel computing
iptables -A INPUT -s 192.168.10.0/24 -p tcp -j ACCEPT
iptables -A INPUT -s 192.168.10.0/24 -p udp -j ACCEPT

# Save iptables rules
iptables-save > /etc/sysconfig/iptables
systemctl enable iptables

3. Package management

3.1 Local yum Repo

  1. The worker nodes do not have direct access to the internet, since we keep them inside the private network; therefore we need a way to handle package installation and updates for them.

  2. We want to control which packages end up in our yum repo.

So here we build a local yum repository on the master node.

# On master node
# Install required packages
yum install -y createrepo nginx

# Create repository directory
mkdir -p /var/www/html/centos-repo

# Configure Nginx
cat > /etc/nginx/conf.d/repo.conf << EOF
server {
    listen 80;
    server_name _;
    root /var/www/html;

    location / {
        autoindex on;
    }
}
EOF

# Start and enable Nginx
systemctl enable nginx
systemctl start nginx

# Download packages to repository
yum install -y yum-utils
repotrack -p /var/www/html/centos-repo <package-name> 
# Repeat for packages we need

# Create repository metadata
createrepo /var/www/html/centos-repo

# On each worker node: configure it to use this repository
cat > /etc/yum.repos.d/cluster-local.repo << EOF
[cluster-local]
name=Cluster Local Repository
baseurl=http://192.168.10.1/centos-repo
enabled=1
gpgcheck=0
EOF
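
On a worker node, a quick way to confirm the local repository is reachable and usable (cluster-local is the repo id defined in the file above):

# Refresh metadata and confirm the repo shows up
yum clean all
yum repolist enabled

# List only what the local repository serves
yum --disablerepo='*' --enablerepo=cluster-local list available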

3.2 Optional for package management

In order to support

  1. worker node package updates

  2. self-managed packages

we can also use scp to transfer packages from the master node.

# On master node, download and transfer RPM
yum install -y yum-utils
yumdownloader <package-name>
scp <package-name>.rpm 192.168.10.2:/tmp/   # copy to the target worker node (adjust the IP per node)

# On worker node
sudo rpm -ivh /tmp/<package-name>.rpm

3.3 directly route worker node to internet

If we don’t need high security, we can also open the private cluster to the public internet. That involves configuring the routing table, which we don’t discuss here.

4. Other Installations

ssh configuration

# On master node, generate SSH key
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""

# Copy the key to all nodes (including itself)
for i in {1..4}; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.10.$i
done

# Do the same on each worker node to allow any-to-any communication
# (Run similar commands on each worker node)
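
After the keys are distributed, a quick loop confirms passwordless login works from the master to every node (BatchMode makes ssh fail instead of prompting if a key is missing):

# Should print each node's hostname without asking for a password
for i in {1..4}; do
  ssh -o BatchMode=yes 192.168.10.$i hostname
done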

Network Monitor & iptables Log

# Install tools
yum install -y tcpdump nmap iftop

# Set up automatic monitoring with fail2ban to prevent brute force attacks
yum install -y fail2ban
cat > /etc/fail2ban/jail.local << EOF
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/secure
maxretry = 5
bantime = 3600
EOF

# Start and enable fail2ban
systemctl enable fail2ban
systemctl start fail2ban

# Add logging rules before the final DROP rules
iptables -A INPUT -j LOG --log-prefix "IPTables-Input-Dropped: " --log-level 4
iptables -A FORWARD -j LOG --log-prefix "IPTables-Forward-Dropped: " --log-level 4

# Save iptables rules
iptables-save > /etc/sysconfig/iptables
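
Dropped packets logged by the rules above end up in the kernel log, and fail2ban can report its own state; two quick checks (assuming the default rsyslog paths on CentOS/RHEL):

# Show recently dropped and logged packets
grep 'IPTables-Input-Dropped' /var/log/messages | tail

# Show the SSH jail status and any currently banned IPs
fail2ban-client status sshd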

Optional Parallel Computing Configuration

# On all nodes (master and workers)
# Install OpenMPI
yum install -y openmpi openmpi-devel

# Configure environment in /etc/profile.d/
cat > /etc/profile.d/mpi.sh << EOF
export PATH=\$PATH:/usr/lib64/openmpi/bin
export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/usr/lib64/openmpi/lib
EOF

# Source the new environment
source /etc/profile.d/mpi.sh

# Test MPI communication
# Create a hostfile
cat > /home/username/hostfile << EOF
192.168.10.1 slots=128
192.168.10.2 slots=128
192.168.10.3 slots=128
192.168.10.4 slots=128
EOF

# Run a simple MPI test
mpirun -np 4 --hostfile /home/username/hostfile hostname

Torque/PBS

# Install Torque on master node
yum install -y torque-server torque-scheduler torque-client

# Configure server nodes file
cat > /var/torque/server_priv/nodes << EOF
192.168.10.1 np=128
192.168.10.2 np=128
192.168.10.3 np=128
192.168.10.4 np=128
EOF

# Start Torque server
systemctl enable pbs_server
systemctl start pbs_server

# Install Torque on worker nodes
for i in {2..4}; do
  ssh 192.168.10.$i "yum install -y torque-mom torque-client; systemctl enable pbs_mom; systemctl start pbs_mom"
done
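
Torque also needs at least one queue before jobs can be submitted; a minimal sketch (the queue name batch is an arbitrary choice here):

# On the master node: create a default execution queue and enable scheduling
qmgr -c "create queue batch queue_type=execution"
qmgr -c "set queue batch enabled=true"
qmgr -c "set queue batch started=true"
qmgr -c "set server default_queue=batch"
qmgr -c "set server scheduling=true"

# Submit a trivial test job and check its state
echo "sleep 30" | qsub
qstat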

5. Traffic Flow:

Forward Chain Traffic Flow in Both Directions

When we create the forward rule iptables -A FORWARD -i eth1 -s 192.168.10.0/24 -o eth0 -j ACCEPT, traffic coming in on eth1 can be forwarded out through eth0. In other words, traffic from the worker nodes reaches the master node, and the master forwards it to the institute network. Here comes the question: where is the backward flow?

iptables -A FORWARD -i eth1 -s 192.168.10.0/24 -o eth0 -j ACCEPT

Let’s see how the traffic flows first.

Outbound Traffic (Worker → Internet)

The rule above allows packets to travel from the worker nodes (coming in on eth1) to be forwarded out to the institute network (through eth0). This handles the first half of any connection, which is the outbound request.

Return Traffic (Internet → Worker)

For the return traffic, we might expect to need a rule like:

iptables -A FORWARD -i eth0 -d 192.168.10.0/24 -o eth1 -j ACCEPT

However, if we look closely at the original configuration, here is the command:

iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT

This rule is handling the return traffic, because:

  1. When a worker node initiates a connection, the outbound packet creates an entry in the connection tracking table (an ESTABLISHED connection)

  2. Any returning packets associated with that connection are marked as ESTABLISHED

  3. The rule above allows all ESTABLISHED connections through, regardless of interface

This is more secure than explicitly allowing all traffic from eth0 to eth1, because it only permits return traffic for connections that were initiated from inside our cluster.

If this state tracking rule weren't present, we would absolutely need the explicit backward-traffic rule above. Without either approach, connections would work one way only: worker nodes could send requests but never receive responses.
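
The connection tracking table that makes the ESTABLISHED match work can be inspected directly (assuming the conntrack tool from conntrack-tools is installed):

# Show tracked connections involving the private cluster network
conntrack -L | grep 192.168.10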

Connection Flow

Let's trace a web request from a worker node:

  1. Worker (192.168.10.2) tries to access google.com
  2. Packet travels: Worker → Master's eth1
  3. Master checks FORWARD chain, matches -i eth1 -s 192.168.10.0/24 -o eth0 rule
  4. Master performs NAT, changing source IP to its own public IP
  5. Packet leaves through eth0 to institute network
  6. Google responds to master's public IP
  7. Packet arrives at master's eth0
  8. Master checks connection tracking table, sees this is a response
  9. Packet is marked as ESTABLISHED
  10. Master checks FORWARD chain, matches the ESTABLISHED rule
  11. Master performs reverse NAT, changing destination to worker's IP
  12. Packet leaves through eth1 to worker
  13. Worker receives response
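
This translation can be watched live with tcpdump (installed earlier) on the master; a minimal sketch, using worker 192.168.10.2 and HTTPS traffic as an example (run the two captures in separate terminals):

# Request as it arrives from the worker on the private interface
tcpdump -ni eth1 host 192.168.10.2 and tcp port 443

# Same traffic leaving eth0 with the master's public source address
tcpdump -ni eth0 tcp port 443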

More details

For more details, check the earlier post here.
