This article demonstrates in detail how to fully synchronize data from*PostgreSQL 14.6toApache Doris 3.0.3usingApache SeaTunnel 2.3.9*. It covers the complete end-to-end process — from environment setup to production validation. Your feedback is welcome!

Environment Requirements

  • PostgreSQL:Server 14.6
  • Apache SeaTunnel:Apache-SeaTunnel-2.3.9
  • Apache Doris:Apache-Doris-3.0.3

Deploying Apache Doris

For a basic Doris deployment, set up one Master and two Node machines. Ensure each node has synchronized time and increased file descriptor limits:

vi /etc/security/limits.conf 
* soft nofile 1000000
* hard nofile 1000000

JAVA Version Selection

  • For versions before 2.1 (inclusive), use Java 8 (recommended: jdk-8u352 or later).
  • For versions 3.0 and later, use Java 17 (recommended: jdk-17.0.10 or later).

Disable Swap

swapoff -a    \# Temporarily disable swap
\# To disable swap permanently, comment out swap entries in /etc/fstab

Installation Package and Steps

Download and extract the Doris package, then deploy the Frontend (FE) cluster:

tar -xf apache-doris-3.0.3.bin.tar.gz

Each FE node can use the same configuration if the JAVA_HOME paths are identical. For example, inapache-doris-3.0.3/fe/conf/fe.conf:

cat apache-doris-3.0.3/fe/conf/fe.conf
...
\## modify case sensitivity
lower\_case\_table_names = 1
\## modify Java Home
JAVA_HOME = 
...

Start the Master Node

bin/start_fe.sh --daemon

Log in using a MySQL client to add a FE follower node:

mysql -uroot -P -h
ALTER SYSTEM ADD FOLLOWER ":"

Then, start the FE Follower node with the master IP and port specified

bin/start\_fe.sh --helper : --daemon

Verify FE status:

SHOW FRONTENDS\\G;
\# "IsMaster: true" indicates the master node.

Deploying the BE Cluster

On all nodes, use a common BE configuration. For example, inbe/conf/be.conf:

cat be/conf/be.conf
...
storage\_root\_path=/home/data1;/home/data2;/home/data3
JAVA_HOME = 
mem_limit = 4G
...

mkdir /home/data1
mkdir /home/data2
mkdir /home/data3

Register the BE node in Doris:

\-\- Connect to an active FE node:
mysql -uroot -P -h

\-\- Register BE node:
ALTER SYSTEM ADD BACKEND ":"

Start the BE process

bin/start_be.sh --daemon

Check BE status:

#\# connect a alive FE node
mysql -uroot -P -h

#\# check BE node status
show backends\\G;

Verify Cluster Functionality

Connect to an active FE node:

#\# connect a alive fe node
mysql -uroot -P -h

Change the Doris cluster password:

\-\- check the current user
select user();  
+\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\+ 
| user()                 |  
+\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\+ 
| 'root'@'192.168.88.30' |  
+\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\+ 

\-\- modify the password for current user
SET PASSWORD = PASSWORD('doris\_new\_passwd');

Create a test database and table, then insert data:

\-\- create a test database
create database testdb;

\-\- create a test table
CREATE TABLE testdb.table_hash
(
    k1 TINYINT,
    k2 DECIMAL(10, 2) DEFAULT "10.5",
    k3 VARCHAR(10) COMMENT "string column",
    k4 INT NOT NULL DEFAULT "1" COMMENT "int column"
)
COMMENT "my first table"
DISTRIBUTED BY HASH(k1) BUCKETS 32;

Doris is compatible with the MySQL protocol and can insert data using the INSERT statement.

\-\- insert data
INSERT INTO testdb.table_hash VALUES
(1, 10.1, 'AAA', 10),
(2, 10.2, 'BBB', 20),
(3, 10.3, 'CCC', 30),
(4, 10.4, 'DDD', 40),
(5, 10.5, 'EEE', 50);

\-\- check the data
SELECT * from testdb.table_hash;
+------+-------+------+------+
| k1   | k2    | k3   | k4   |
+------+-------+------+------+
|    3 | 10.30 | CCC  |   30 |
|    4 | 10.40 | DDD  |   40 |
|    5 | 10.50 | EEE  |   50 |
|    1 | 10.10 | AAA  |   10 |
|    2 | 10.20 | BBB  |   20 |
+------+-------+------+------+

Deploying PostgreSQL

For PostgreSQL deployment details, please refer to the official documentation.

Deploying SeaTunnel

Community Website:seatunnel.apache.org

Download Package:Download Apache SeaTunnel

Extract the packag

tar -xzvf "apache-seatunnel-${version}-bin.tar.gz"

Manually download the connectors and place them in theconnectors/directory (for versions before 2.3.5, place them inconnectors/seatunnel/).

Connector Download URL:Maven Repository

Ensure that the JDBC driver JAR is placed in the${SEATUNNEL_HOME}/lib/directory. If you encounter any issues during deployment, please join our community discussions for help.

Quick Start with the SeaTunnel Engine

Creating a PostgreSQL Table

CREATE TABLE alarm\_receive\_record (
    id BIGINT PRIMARY KEY,
    device_id VARCHAR(50),
    alarm_time TIMESTAMP,
    content TEXT
);

Inserting Data into PostgreSQL

INSERT INTO alarm\_receive\_record (id, device\_id, alarm\_time, content)
VALUES 
    (3, 'D003', '2025-02-26 16:15:00', 'Communication Disruption'),
    (4, 'D004', '2025-02-26 17:20:00', 'Humidity Exceeded'),
    (5, 'D005', '2025-02-26 17:20:00', 'Humidity Exceeded Again'),
    (6, 'D006', '2025-02-26 17:20:00', 'Humidity Exceeded'),
    (7, 'D007', '2025-02-26 17:20:00', 'Humidity Exceeded'),
    (8, 'D008', '2025-02-26 17:20:00', 'Humidity Exceeded'),
    (9, 'D009', '2025-02-26 17:20:00', 'Humidity Exceeded'),
    (10, 'D0010', '2025-02-26 17:20:00', 'Humidity Exceeded');

Manually Creating the Table in Doris

CREATE TABLE testdb.alarm\_receive\_record (
    id BIGINT COMMENT "Unique ID",
    device_id VARCHAR(50) COMMENT "Device ID",
    alarm_time DATETIME COMMENT "Alarm Time",
    content STRING COMMENT "Alarm Content"
) ENGINE=OLAP
DUPLICATE KEY(id, device_id)  \-\- Use DUPLICATE model (detailed storage)
DISTRIBUTED BY HASH(device_id) BUCKETS 10  \-\- Hash partition by device_id
PROPERTIES (
    "replication_num" = "3",  \-\- Number of replicas (consistent with cluster config)
    "storage_format" = "V2"    \-\- Storage format
);

SeaTunnel Configuration for Data Sync

Create a configuration file (pg-doris.yaml) with the following content:

env {
  parallelism = 4
  job.mode = "BATCH"
}
source{
    jdbc{
        url = "jdbc:postgresql://192.168.10.17:5432/aaa"
        driver = "org.postgresql.Driver"
        user = "test"
        password = "123123"
        query = "select * from alarm\_receive\_record"
    }
}

sink {
  Doris {
    fenodes = "192.168.20.174:8030"
    username = "root"
    password = "123123"
    database = "testdb"
    table = "alarm\_receive\_record"
    schema\_save\_mode = "CREATE\_SCHEMA\_WHEN\_NOT\_EXIST"
    sink.label-prefix = "pg\_to\_doris"
    sink.enable-2pc = "true"
    column = \["id", "device_id", "alarm_time", "content"\]
    doris.config {
      format = "json"
      read\_json\_by_line = "true"
    }
  }
}

Run the synchronization task:

cd "apache-seatunnel-${version}"
./bin/seatunnel.sh --config ./config/pg-doris.yaml  -m local

Note: Running the task multiple times will cause duplicate data.

Verify the synchronization in Doris

Conclusion

Choosing the right tool is more important than working hard in the big data era. Let SeaTunnel be your superhighway for data flow, and let Doris serve as the intelligent eye for real-time insights. Together, build a future-ready, data-driven ecosystem!