DEV Community

Cover image for How to Set Up PostgreSQL Logical Replication: Use Cases and Step-by-Step Guide.
Rasel Mahmud
Rasel Mahmud

Posted on • Edited on

How to Set Up PostgreSQL Logical Replication: Use Cases and Step-by-Step Guide.

PostgreSQL replication is a powerful mechanism that ensures data consistency and availability across multiple databases. It enables data from one database (the primary) to be synchronized with one or more other databases (the replicas). This process is crucial for applications that require high availability, scalability, or distributed data sharing.

This guide explores the types of PostgreSQL replication, their benefits, and real-world use cases, followed by a step-by-step setup process for logical replication.


What is PostgreSQL Replication?

PostgreSQL replication is a process where data from one database (primary) is copied and kept in sync with one or more other databases (replicas).

There are two main types of replication in PostgreSQL:

  1. Physical Replication: Entire data is replicated at the block level. It’s commonly used for high availability (e.g., streaming replication).
  2. Logical Replication: Replicates changes at the logical level (row-based). It’s more flexible and allows selective replication of data, meaning you can choose specific tables or parts of a database to replicate.

Benefits of Logical Replication

  1. Selective Replication: Replicate only specific tables or rows, improving performance and storage.
  2. Cross-Version Replication: Allows replication between different PostgreSQL versions.
  3. Custom Transformations: Data can be modified during replication, making it customizable.
  4. Replication to Other Systems: Replicate data to other databases or applications, enabling cross-system data sharing.

Real-World Use Case 1: Toll Collection Management Software

Our toll management software, Scott, is used at the Kalna Bridge toll plaza, where we operate 8 lanes. Each lane has its own local database, and every booth operates independently. Toll transaction data from each booth is pushed to the central server in real-time using a message queue system.

Each booth maintains its own records, including:

  • Users: User management.
  • Lane Details: Information specific to each lane.
  • Shift Management: Tracking which operator is assigned to which lane during specific shifts.
  • Calendar System: Managing operator assignments and schedules.
  • ETC Mapping: Indicating whether a booth or lane is ETC (Electronic Toll Collection) enabled.

These tables are shared across all lanes, and the admin can update entries from the central server. The challenge arises when some entries are modified, and it’s necessary to keep the data in sync between all 8 booth databases and the central server database.


Why Not Use a Central Database for All Booths?

Each booth needs to operate independently in case the central server goes down or the connection becomes unstable due to natural disasters or other issues. If the central server or connection fails, the booths would become unusable, but the booths must continue to operate to pass vehicles.


What is Logical Replication in This Use Case?

In this context, PostgreSQL logical replication can be used to replicate relevant tables from the central database to each booth’s database. By doing so, updates made in the central database will automatically synchronize to the corresponding tables in all booth databases. This allows the booths to function autonomously while ensuring the data remains consistent across all systems.


Real-World Use Case 2: E-commerce Platforms

Use Case: An e-commerce platform uses logical replication to keep the data in sync between the main database (handling customer transactions) and a separate reporting database (used for analytics). This ensures that the system can generate real-time reports and insights without affecting the performance of the website or app.

Why Use Logical Replication:

  • Better Performance: Offloading reporting tasks to a replica keeps the primary database fast for customer-facing transactions.
  • Real-time Insights: Business teams receive up-to-date data for decision-making, such as adjusting inventory or running promotions.
  • Scalability: Easily add more reporting systems or tools without affecting the main database.
  • Data Accuracy: Replication ensures that all systems have consistent and accurate data.
  • High Availability: If the main database goes down, the replica can still provide read-only data, minimizing downtime.
  • Customizable Data Sync: Replicate only the data that’s needed, making reporting more efficient.

Setting Up PostgreSQL Logical Replication

In this guide, we’ll walk through the steps to set up PostgreSQL logical replication between two servers: a primary (publisher) and a replica (subscriber). This setup allows the replica server to receive real-time data changes from the primary server.

1. Install PostgreSQL

First, install PostgreSQL on both the primary and replica servers. On Ubuntu, you can install it by running the following commands:

sudo apt update
sudo apt install postgresql postgresql-contrib
Enter fullscreen mode Exit fullscreen mode

Repeat this process on both the primary and replica machines.

2. Configure the Primary Database (Publisher)

Edit postgresql.conf

On the primary server, you need to enable logical replication. Open the postgresql.conf file and update the following settings:

wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
Enter fullscreen mode Exit fullscreen mode

These settings ensure that PostgreSQL can send changes to the replica server.

Edit pg_hba.conf

Next, configure pg_hba.conf to allow the replica server to connect to the primary for replication. Add the following line:

host    replication     postgres     <replica_db_ip>/32        md5
Enter fullscreen mode Exit fullscreen mode

Replace <replica_ip> with the actual IP address of the replica server.

Restart PostgreSQL

After making the changes, restart PostgreSQL on the primary server:

sudo systemctl restart postgresql
Enter fullscreen mode Exit fullscreen mode

3. Configure the Replica Database (Subscriber)

Edit postgresql.conf

On the replica server, you need to enable hot standby mode. Open the postgresql.conf file and add:

hot_standby = on
Enter fullscreen mode Exit fullscreen mode

This setting allows the replica server to accept read queries while receiving replication data.

Edit pg_hba.conf

In the pg_hba.conf file on the replica, allow the primary server to connect for replication:

host    replication     postgres     <primary_ip>/32       md5
Enter fullscreen mode Exit fullscreen mode

Replace <primary_ip> with the actual IP address of the primary server.

Restart PostgreSQL

After updating the configuration, restart PostgreSQL on the replica:

sudo systemctl restart postgresql
Enter fullscreen mode Exit fullscreen mode

4. Set Up Logical Replication with Docker (Optional)

If you are using Docker, you can configure both the primary and replica databases using docker-compose.

Here's a simple docker-compose.yml configuration:

services:
  central_primary_db:
    image: postgres:16-bullseye
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: 12345
      POSTGRES_DB: primary_db
    ports:
      - "5442:5432"
    volumes:
      - ./central_primary_db_data:/var/lib/postgresql/data
    restart: always
    command: [
      "postgres",
      "-c", "wal_level=logical",
      "-c", "max_replication_slots=10",
      "-c", "max_wal_senders=10"
    ]

  kamalapur_station_db:
    image: postgres:16-bullseye
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: 12345
      POSTGRES_DB: kamalapur_station_db
    ports:
      - "5443:5432"
    volumes:
      - ./kamalapur_station_db_data:/var/lib/postgresql/data
    restart: always

    airport_station_db:
     image: postgres:16-bullseye
     environment:
        POSTGRES_USER: postgres
        POSTGRES_PASSWORD: 12345
        POSTGRES_DB: airport_station_db
     ports:
        - "5444:5432"
     volumes:
        - ./airport_station_db_data:/var/lib/postgresql/data
     restart: always  
Enter fullscreen mode Exit fullscreen mode

5. Create a Publication on the Primary Database

On the primary database, you need to create a publication to define which tables will be replicated. For example, if you want to replicate the users table, use this SQL command:

-- Connect to the primary database
\c central_primary_db

-- Create a publication for the `users` table or multiple table vai comma sepa.
CREATE PUBLICATION central_railway_system_pub
   FOR TABLE
      users,                      
      train_schedules,            
      tickets,                    
      payments,                   
      seat_reservations           
   WITH (publish = 'insert, update, delete');
Enter fullscreen mode Exit fullscreen mode

6. Create a Subscription on the Replica Database

On the replica database, you need to create a subscription that connects to the primary database and starts receiving changes. Use this SQL command:

-- Connect to the replica database
\c replica_db

-- Create a subscription to the primary database
CREATE SUBSCRIPTION central_railway_system_sub
   CONNECTION 'host=<primary_ip> port=5432 dbname=primary_db user=replicator_user password=<secure_password> sslmode=require'
   PUBLICATION central_railway_system_pub
   WITH (
      copy_data = true,   -- Copy existing data to replica db.
      create_slot = true,  
      enabled = true       
    );

Enter fullscreen mode Exit fullscreen mode

Replace <primary_ip> with the IP address of your primary database.

Once the subscription is created, the replica will start receiving real-time updates from the primary.

Summary

PostgreSQL logical replication is an essential tool for modern data-driven applications, offering flexibility, performance, and high availability. By selectively replicating tables and rows, logical replication addresses unique challenges in use cases like toll booth management and e-commerce analytics. This guide has provided a practical walkthrough for setting up logical replication, ensuring reliable and consistent data synchronization across systems.

Acknowledgment: This content from my personal experience and ai.

Top comments (0)