This article demonstrates in detail how to fully synchronize data from*PostgreSQL 14.6toApache Doris 3.0.3usingApache SeaTunnel 2.3.9*. It covers the complete end-to-end process — from environment setup to production validation. Your feedback is welcome!
Environment Requirements
- PostgreSQL:Server 14.6
- Apache SeaTunnel:Apache-SeaTunnel-2.3.9
- Apache Doris:Apache-Doris-3.0.3
Deploying Apache Doris
For a basic Doris deployment, set up one Master and two Node machines. Ensure each node has synchronized time and increased file descriptor limits:
vi /etc/security/limits.conf
* soft nofile 1000000
* hard nofile 1000000
JAVA Version Selection
- For versions before 2.1 (inclusive), use Java 8 (recommended: jdk-8u352 or later).
- For versions 3.0 and later, use Java 17 (recommended: jdk-17.0.10 or later).
Disable Swap
swapoff -a \# Temporarily disable swap
\# To disable swap permanently, comment out swap entries in /etc/fstab
Installation Package and Steps
Download and extract the Doris package, then deploy the Frontend (FE) cluster:
tar -xf apache-doris-3.0.3.bin.tar.gz
Each FE node can use the same configuration if the JAVA_HOME paths are identical. For example, inapache-doris-3.0.3/fe/conf/fe.conf
:
cat apache-doris-3.0.3/fe/conf/fe.conf
...
\## modify case sensitivity
lower\_case\_table_names = 1
\## modify Java Home
JAVA_HOME = <your-java-home-path>
...
Start the Master Node
bin/start_fe.sh --daemon
Log in using a MySQL client to add a FE follower node:
mysql -uroot -P<fe\_query\_port> -h<fe\_ip\_address>
ALTER SYSTEM ADD FOLLOWER "<fe\_ip\_address>:<fe\_edit\_log_port>"
Then, start the FE Follower node with the master IP and port specified
bin/start\_fe.sh --helper <helper\_fe_ip>:<fe\_edit\_log_port> --daemon
Verify FE status:
SHOW FRONTENDS\\G;
\# "IsMaster: true" indicates the master node.
Deploying the BE Cluster
On all nodes, use a common BE configuration. For example, inbe/conf/be.conf
:
cat be/conf/be.conf
...
storage\_root\_path=/home/data1;/home/data2;/home/data3
JAVA_HOME = <your-java-home-path>
mem_limit = 4G
...
mkdir /home/data1
mkdir /home/data2
mkdir /home/data3
Register the BE node in Doris:
\-\- Connect to an active FE node:
mysql -uroot -P<fe\_query\_port> -h<fe\_ip\_address>
\-\- Register BE node:
ALTER SYSTEM ADD BACKEND "<be\_ip\_address>:<be\_heartbeat\_service_port>"
Start the BE process
bin/start_be.sh --daemon
Check BE status:
#\# connect a alive FE node
mysql -uroot -P<fe\_query\_port> -h<fe\_ip\_address>
#\# check BE node status
show backends\\G;
Verify Cluster Functionality
Connect to an active FE node:
#\# connect a alive fe node
mysql -uroot -P<fe\_query\_port> -h<fe\_ip\_address>
Change the Doris cluster password:
\-\- check the current user
select user();
+\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\+
| user() |
+\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\+
| 'root'@'192.168.88.30' |
+\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\+
\-\- modify the password for current user
SET PASSWORD = PASSWORD('doris\_new\_passwd');
Create a test database and table, then insert data:
\-\- create a test database
create database testdb;
\-\- create a test table
CREATE TABLE testdb.table_hash
(
k1 TINYINT,
k2 DECIMAL(10, 2) DEFAULT "10.5",
k3 VARCHAR(10) COMMENT "string column",
k4 INT NOT NULL DEFAULT "1" COMMENT "int column"
)
COMMENT "my first table"
DISTRIBUTED BY HASH(k1) BUCKETS 32;
Doris is compatible with the MySQL protocol and can insert data using the INSERT statement.
\-\- insert data
INSERT INTO testdb.table_hash VALUES
(1, 10.1, 'AAA', 10),
(2, 10.2, 'BBB', 20),
(3, 10.3, 'CCC', 30),
(4, 10.4, 'DDD', 40),
(5, 10.5, 'EEE', 50);
\-\- check the data
SELECT * from testdb.table_hash;
+------+-------+------+------+
| k1 | k2 | k3 | k4 |
+------+-------+------+------+
| 3 | 10.30 | CCC | 30 |
| 4 | 10.40 | DDD | 40 |
| 5 | 10.50 | EEE | 50 |
| 1 | 10.10 | AAA | 10 |
| 2 | 10.20 | BBB | 20 |
+------+-------+------+------+
Deploying PostgreSQL
For PostgreSQL deployment details, please refer to the official documentation.
Deploying SeaTunnel
Community Website:seatunnel.apache.org
Download Package:Download Apache SeaTunnel
Extract the packag
tar -xzvf "apache-seatunnel-${version}-bin.tar.gz"
Manually download the connectors and place them in theconnectors/
directory (for versions before 2.3.5, place them inconnectors/seatunnel/
).
Connector Download URL:Maven Repository
Ensure that the JDBC driver JAR is placed in the${SEATUNNEL_HOME}/lib/
directory. If you encounter any issues during deployment, please join our community discussions for help.
Quick Start with the SeaTunnel Engine
Creating a PostgreSQL Table
CREATE TABLE alarm\_receive\_record (
id BIGINT PRIMARY KEY,
device_id VARCHAR(50),
alarm_time TIMESTAMP,
content TEXT
);
Inserting Data into PostgreSQL
INSERT INTO alarm\_receive\_record (id, device\_id, alarm\_time, content)
VALUES
(3, 'D003', '2025-02-26 16:15:00', 'Communication Disruption'),
(4, 'D004', '2025-02-26 17:20:00', 'Humidity Exceeded'),
(5, 'D005', '2025-02-26 17:20:00', 'Humidity Exceeded Again'),
(6, 'D006', '2025-02-26 17:20:00', 'Humidity Exceeded'),
(7, 'D007', '2025-02-26 17:20:00', 'Humidity Exceeded'),
(8, 'D008', '2025-02-26 17:20:00', 'Humidity Exceeded'),
(9, 'D009', '2025-02-26 17:20:00', 'Humidity Exceeded'),
(10, 'D0010', '2025-02-26 17:20:00', 'Humidity Exceeded');
Manually Creating the Table in Doris
CREATE TABLE testdb.alarm\_receive\_record (
id BIGINT COMMENT "Unique ID",
device_id VARCHAR(50) COMMENT "Device ID",
alarm_time DATETIME COMMENT "Alarm Time",
content STRING COMMENT "Alarm Content"
) ENGINE=OLAP
DUPLICATE KEY(id, device_id) \-\- Use DUPLICATE model (detailed storage)
DISTRIBUTED BY HASH(device_id) BUCKETS 10 \-\- Hash partition by device_id
PROPERTIES (
"replication_num" = "3", \-\- Number of replicas (consistent with cluster config)
"storage_format" = "V2" \-\- Storage format
);
SeaTunnel Configuration for Data Sync
Create a configuration file (pg-doris.yaml
) with the following content:
env {
parallelism = 4
job.mode = "BATCH"
}
source{
jdbc{
url = "jdbc:postgresql://192.168.10.17:5432/aaa"
driver = "org.postgresql.Driver"
user = "test"
password = "123123"
query = "select * from alarm\_receive\_record"
}
}
sink {
Doris {
fenodes = "192.168.20.174:8030"
username = "root"
password = "123123"
database = "testdb"
table = "alarm\_receive\_record"
schema\_save\_mode = "CREATE\_SCHEMA\_WHEN\_NOT\_EXIST"
sink.label-prefix = "pg\_to\_doris"
sink.enable-2pc = "true"
column = \["id", "device_id", "alarm_time", "content"\]
doris.config {
format = "json"
read\_json\_by_line = "true"
}
}
}
Run the synchronization task:
cd "apache-seatunnel-${version}"
./bin/seatunnel.sh --config ./config/pg-doris.yaml -m local
Note: Running the task multiple times will cause duplicate data.
Verify the synchronization in Doris
Conclusion
Choosing the right tool is more important than working hard in the big data era. Let SeaTunnel be your superhighway for data flow, and let Doris serve as the intelligent eye for real-time insights. Together, build a future-ready, data-driven ecosystem!
Top comments (0)