DEV Community

Apache SeaTunnel
Apache SeaTunnel

Posted on

Full Data Synchronization from PostgreSQL 14.6 to Apache Doris 3.0.3 Using Apache SeaTunnel 2.3.9: A Step-by-Step Guide

This article demonstrates in detail how to fully synchronize data from*PostgreSQL 14.6toApache Doris 3.0.3usingApache SeaTunnel 2.3.9*. It covers the complete end-to-end process — from environment setup to production validation. Your feedback is welcome!

Environment Requirements

  • PostgreSQL:Server 14.6
  • Apache SeaTunnel:Apache-SeaTunnel-2.3.9
  • Apache Doris:Apache-Doris-3.0.3

Deploying Apache Doris

For a basic Doris deployment, set up one Master and two Node machines. Ensure each node has synchronized time and increased file descriptor limits:

vi /etc/security/limits.conf 
* soft nofile 1000000
* hard nofile 1000000
Enter fullscreen mode Exit fullscreen mode

JAVA Version Selection

  • For versions before 2.1 (inclusive), use Java 8 (recommended: jdk-8u352 or later).
  • For versions 3.0 and later, use Java 17 (recommended: jdk-17.0.10 or later).

Disable Swap

swapoff -a    \# Temporarily disable swap
\# To disable swap permanently, comment out swap entries in /etc/fstab
Enter fullscreen mode Exit fullscreen mode

Installation Package and Steps

Download and extract the Doris package, then deploy the Frontend (FE) cluster:

tar -xf apache-doris-3.0.3.bin.tar.gz
Enter fullscreen mode Exit fullscreen mode

Each FE node can use the same configuration if the JAVA_HOME paths are identical. For example, inapache-doris-3.0.3/fe/conf/fe.conf:

cat apache-doris-3.0.3/fe/conf/fe.conf
...
\## modify case sensitivity
lower\_case\_table_names = 1
\## modify Java Home
JAVA_HOME = <your-java-home-path>
...
Enter fullscreen mode Exit fullscreen mode

Start the Master Node

bin/start_fe.sh --daemon
Enter fullscreen mode Exit fullscreen mode

Log in using a MySQL client to add a FE follower node:

mysql -uroot -P<fe\_query\_port> -h<fe\_ip\_address>
ALTER SYSTEM ADD FOLLOWER "<fe\_ip\_address>:<fe\_edit\_log_port>"
Enter fullscreen mode Exit fullscreen mode

Then, start the FE Follower node with the master IP and port specified

bin/start\_fe.sh --helper <helper\_fe_ip>:<fe\_edit\_log_port> --daemon
Enter fullscreen mode Exit fullscreen mode

Verify FE status:

SHOW FRONTENDS\\G;
\# "IsMaster: true" indicates the master node.
Enter fullscreen mode Exit fullscreen mode

Deploying the BE Cluster

On all nodes, use a common BE configuration. For example, inbe/conf/be.conf:

cat be/conf/be.conf
...
storage\_root\_path=/home/data1;/home/data2;/home/data3
JAVA_HOME = <your-java-home-path>
mem_limit = 4G
...

mkdir /home/data1
mkdir /home/data2
mkdir /home/data3
Enter fullscreen mode Exit fullscreen mode

Register the BE node in Doris:

\-\- Connect to an active FE node:
mysql -uroot -P<fe\_query\_port> -h<fe\_ip\_address>

\-\- Register BE node:
ALTER SYSTEM ADD BACKEND "<be\_ip\_address>:<be\_heartbeat\_service_port>"
Enter fullscreen mode Exit fullscreen mode

Start the BE process

bin/start_be.sh --daemon
Enter fullscreen mode Exit fullscreen mode

Check BE status:

 #\# connect a alive FE node
mysql -uroot -P<fe\_query\_port> -h<fe\_ip\_address>

#\# check BE node status
show backends\\G;
Enter fullscreen mode Exit fullscreen mode

Verify Cluster Functionality

Connect to an active FE node:

#\# connect a alive fe node
mysql -uroot -P<fe\_query\_port> -h<fe\_ip\_address>
Enter fullscreen mode Exit fullscreen mode

Change the Doris cluster password:

\-\- check the current user
select user();  
+\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\+ 
| user()                 |  
+\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\+ 
| 'root'@'192.168.88.30' |  
+\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\+ 

\-\- modify the password for current user
SET PASSWORD = PASSWORD('doris\_new\_passwd');
Enter fullscreen mode Exit fullscreen mode

Create a test database and table, then insert data:

\-\- create a test database
create database testdb;

\-\- create a test table
CREATE TABLE testdb.table_hash
(
    k1 TINYINT,
    k2 DECIMAL(10, 2) DEFAULT "10.5",
    k3 VARCHAR(10) COMMENT "string column",
    k4 INT NOT NULL DEFAULT "1" COMMENT "int column"
)
COMMENT "my first table"
DISTRIBUTED BY HASH(k1) BUCKETS 32;
Enter fullscreen mode Exit fullscreen mode

Doris is compatible with the MySQL protocol and can insert data using the INSERT statement.

\-\- insert data
INSERT INTO testdb.table_hash VALUES
(1, 10.1, 'AAA', 10),
(2, 10.2, 'BBB', 20),
(3, 10.3, 'CCC', 30),
(4, 10.4, 'DDD', 40),
(5, 10.5, 'EEE', 50);

\-\- check the data
SELECT * from testdb.table_hash;
+------+-------+------+------+
| k1   | k2    | k3   | k4   |
+------+-------+------+------+
|    3 | 10.30 | CCC  |   30 |
|    4 | 10.40 | DDD  |   40 |
|    5 | 10.50 | EEE  |   50 |
|    1 | 10.10 | AAA  |   10 |
|    2 | 10.20 | BBB  |   20 |
+------+-------+------+------+
Enter fullscreen mode Exit fullscreen mode

Deploying PostgreSQL

For PostgreSQL deployment details, please refer to the official documentation.

Deploying SeaTunnel

Community Website:seatunnel.apache.org

Download Package:Download Apache SeaTunnel

Extract the packag

tar -xzvf "apache-seatunnel-${version}-bin.tar.gz"
Enter fullscreen mode Exit fullscreen mode

Manually download the connectors and place them in theconnectors/directory (for versions before 2.3.5, place them inconnectors/seatunnel/).

Connector Download URL:Maven Repository

Ensure that the JDBC driver JAR is placed in the${SEATUNNEL_HOME}/lib/directory. If you encounter any issues during deployment, please join our community discussions for help.

Quick Start with the SeaTunnel Engine

Creating a PostgreSQL Table

CREATE TABLE alarm\_receive\_record (
    id BIGINT PRIMARY KEY,
    device_id VARCHAR(50),
    alarm_time TIMESTAMP,
    content TEXT
);
Enter fullscreen mode Exit fullscreen mode

Inserting Data into PostgreSQL

INSERT INTO alarm\_receive\_record (id, device\_id, alarm\_time, content)
VALUES 
    (3, 'D003', '2025-02-26 16:15:00', 'Communication Disruption'),
    (4, 'D004', '2025-02-26 17:20:00', 'Humidity Exceeded'),
    (5, 'D005', '2025-02-26 17:20:00', 'Humidity Exceeded Again'),
    (6, 'D006', '2025-02-26 17:20:00', 'Humidity Exceeded'),
    (7, 'D007', '2025-02-26 17:20:00', 'Humidity Exceeded'),
    (8, 'D008', '2025-02-26 17:20:00', 'Humidity Exceeded'),
    (9, 'D009', '2025-02-26 17:20:00', 'Humidity Exceeded'),
    (10, 'D0010', '2025-02-26 17:20:00', 'Humidity Exceeded');
Enter fullscreen mode Exit fullscreen mode

Manually Creating the Table in Doris

CREATE TABLE testdb.alarm\_receive\_record (
    id BIGINT COMMENT "Unique ID",
    device_id VARCHAR(50) COMMENT "Device ID",
    alarm_time DATETIME COMMENT "Alarm Time",
    content STRING COMMENT "Alarm Content"
) ENGINE=OLAP
DUPLICATE KEY(id, device_id)  \-\- Use DUPLICATE model (detailed storage)
DISTRIBUTED BY HASH(device_id) BUCKETS 10  \-\- Hash partition by device_id
PROPERTIES (
    "replication_num" = "3",  \-\- Number of replicas (consistent with cluster config)
    "storage_format" = "V2"    \-\- Storage format
);
Enter fullscreen mode Exit fullscreen mode

SeaTunnel Configuration for Data Sync

Create a configuration file (pg-doris.yaml) with the following content:

env {
  parallelism = 4
  job.mode = "BATCH"
}
source{
    jdbc{
        url = "jdbc:postgresql://192.168.10.17:5432/aaa"
        driver = "org.postgresql.Driver"
        user = "test"
        password = "123123"
        query = "select * from alarm\_receive\_record"
    }
}

sink {
  Doris {
    fenodes = "192.168.20.174:8030"
    username = "root"
    password = "123123"
    database = "testdb"
    table = "alarm\_receive\_record"
    schema\_save\_mode = "CREATE\_SCHEMA\_WHEN\_NOT\_EXIST"
    sink.label-prefix = "pg\_to\_doris"
    sink.enable-2pc = "true"
    column = \["id", "device_id", "alarm_time", "content"\]
    doris.config {
      format = "json"
      read\_json\_by_line = "true"
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

Run the synchronization task:

cd "apache-seatunnel-${version}"
./bin/seatunnel.sh --config ./config/pg-doris.yaml  -m local
Enter fullscreen mode Exit fullscreen mode

Note: Running the task multiple times will cause duplicate data.

Verify the synchronization in Doris

Conclusion

Choosing the right tool is more important than working hard in the big data era. Let SeaTunnel be your superhighway for data flow, and let Doris serve as the intelligent eye for real-time insights. Together, build a future-ready, data-driven ecosystem!

Top comments (0)