PostgreSQL replication is a powerful mechanism that ensures data consistency and availability across multiple databases. It enables data from one database (the primary) to be synchronized with one or more other databases (the replicas). This process is crucial for applications that require high availability, scalability, or distributed data sharing.
This guide explores the types of PostgreSQL replication, their benefits, and real-world use cases, followed by a step-by-step setup process for logical replication.
What is PostgreSQL Replication?
PostgreSQL replication is a process where data from one database (primary) is copied and kept in sync with one or more other databases (replicas).
There are two main types of replication in PostgreSQL:
- Physical Replication: Entire data is replicated at the block level. It’s commonly used for high availability (e.g., streaming replication).
- Logical Replication: Replicates changes at the logical level (row-based). It’s more flexible and allows selective replication of data, meaning you can choose specific tables or parts of a database to replicate.
Benefits of Logical Replication
- Selective Replication: Replicate only specific tables or rows, improving performance and storage.
- Cross-Version Replication: Allows replication between different PostgreSQL versions.
- Custom Transformations: Data can be modified during replication, making it customizable.
- Replication to Other Systems: Replicate data to other databases or applications, enabling cross-system data sharing.
Real-World Use Case 1: Toll Collection Management Software
Our toll management software, Scott, is used at the Kalna Bridge toll plaza, where we operate 8 lanes. Each lane has its own local database, and every booth operates independently. Toll transaction data from each booth is pushed to the central server in real-time using a message queue system.
Each booth maintains its own records, including:
- Users: User management.
- Lane Details: Information specific to each lane.
- Shift Management: Tracking which operator is assigned to which lane during specific shifts.
- Calendar System: Managing operator assignments and schedules.
- ETC Mapping: Indicating whether a booth or lane is ETC (Electronic Toll Collection) enabled.
These tables are shared across all lanes, and the admin can update entries from the central server. The challenge arises when some entries are modified, and it’s necessary to keep the data in sync between all 8 booth databases and the central server database.
Why Not Use a Central Database for All Booths?
Each booth needs to operate independently in case the central server goes down or the connection becomes unstable due to natural disasters or other issues. If the central server or connection fails, the booths would become unusable, but the booths must continue to operate to pass vehicles.
What is Logical Replication in This Use Case?
In this context, PostgreSQL logical replication can be used to replicate relevant tables from the central database to each booth’s database. By doing so, updates made in the central database will automatically synchronize to the corresponding tables in all booth databases. This allows the booths to function autonomously while ensuring the data remains consistent across all systems.
Real-World Use Case 2: E-commerce Platforms
Use Case: An e-commerce platform uses logical replication to keep the data in sync between the main database (handling customer transactions) and a separate reporting database (used for analytics). This ensures that the system can generate real-time reports and insights without affecting the performance of the website or app.
Why Use Logical Replication:
- Better Performance: Offloading reporting tasks to a replica keeps the primary database fast for customer-facing transactions.
- Real-time Insights: Business teams receive up-to-date data for decision-making, such as adjusting inventory or running promotions.
- Scalability: Easily add more reporting systems or tools without affecting the main database.
- Data Accuracy: Replication ensures that all systems have consistent and accurate data.
- High Availability: If the main database goes down, the replica can still provide read-only data, minimizing downtime.
- Customizable Data Sync: Replicate only the data that’s needed, making reporting more efficient.
Setting Up PostgreSQL Logical Replication
In this guide, we’ll walk through the steps to set up PostgreSQL logical replication between two servers: a primary (publisher) and a replica (subscriber). This setup allows the replica server to receive real-time data changes from the primary server.
1. Install PostgreSQL
First, install PostgreSQL on both the primary and replica servers. On Ubuntu, you can install it by running the following commands:
sudo apt update
sudo apt install postgresql postgresql-contrib
Repeat this process on both the primary and replica machines.
2. Configure the Primary Database (Publisher)
Edit postgresql.conf
On the primary server, you need to enable logical replication. Open the postgresql.conf
file and update the following settings:
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
These settings ensure that PostgreSQL can send changes to the replica server.
Edit pg_hba.conf
Next, configure pg_hba.conf
to allow the replica server to connect to the primary for replication. Add the following line:
host replication postgres <replica_db_ip>/32 md5
Replace <replica_ip>
with the actual IP address of the replica server.
Restart PostgreSQL
After making the changes, restart PostgreSQL on the primary server:
sudo systemctl restart postgresql
3. Configure the Replica Database (Subscriber)
Edit postgresql.conf
On the replica server, you need to enable hot standby mode. Open the postgresql.conf
file and add:
hot_standby = on
This setting allows the replica server to accept read queries while receiving replication data.
Edit pg_hba.conf
In the pg_hba.conf
file on the replica, allow the primary server to connect for replication:
host replication postgres <primary_ip>/32 md5
Replace <primary_ip>
with the actual IP address of the primary server.
Restart PostgreSQL
After updating the configuration, restart PostgreSQL on the replica:
sudo systemctl restart postgresql
4. Set Up Logical Replication with Docker (Optional)
If you are using Docker, you can configure both the primary and replica databases using docker-compose
.
Here's a simple docker-compose.yml
configuration:
services:
primary-db:
image: postgres:16-bullseye
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: 12345
POSTGRES_DB: primary_db
ports:
- "5442:5432"
volumes:
- ./primary-db-data:/var/lib/postgresql/data
restart: always
command: [
"postgres",
"-c", "wal_level=logical",
"-c", "max_replication_slots=10",
"-c", "max_wal_senders=10"
]
replica-db:
image: postgres:16-bullseye
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: 12345
POSTGRES_DB: replica_db
ports:
- "5443:5432"
volumes:
- ./replica-db-data:/var/lib/postgresql/data
restart: always
5. Create a Publication on the Primary Database
On the primary database, you need to create a publication to define which tables will be replicated. For example, if you want to replicate the users
table, use this SQL command:
-- Connect to the primary database
\c primary_db
-- Create a publication for the `users` table or multiple table vai comma sepa.
CREATE PUBLICATION central_users_table_pub FOR TABLE users;
6. Create a Subscription on the Replica Database
On the replica database, you need to create a subscription that connects to the primary database and starts receiving changes. Use this SQL command:
-- Connect to the replica database
\c replica_db
-- Create a subscription to the primary database
CREATE SUBSCRIPTION central_users_table_sub
CONNECTION 'host=<primary_ip> port=5442 dbname=primary_db user=postgres password=12345'
PUBLICATION central_users_table_pub;
Replace <primary_ip>
with the IP address of your primary database.
Once the subscription is created, the replica will start receiving real-time updates from the primary.
Summary
PostgreSQL logical replication is an essential tool for modern data-driven applications, offering flexibility, performance, and high availability. By selectively replicating tables and rows, logical replication addresses unique challenges in use cases like toll booth management and e-commerce analytics. This guide has provided a practical walkthrough for setting up logical replication, ensuring reliable and consistent data synchronization across systems.
Acknowledgment: This content from my personal experience and ai.
Top comments (0)