Official Apache SeaTunnel website: https://seatunnel.apache.org/
Apache SeaTunnel is a next-generation, high-performance, distributed data integration and synchronization tool that is gaining increasing attention across the industry. SeaTunnel supports three deployment modes: Local Mode, Hybrid Cluster Mode, and Separated Cluster Mode.
This article focuses on how to deploy SeaTunnel in Separated Cluster Mode on Kubernetes, providing a complete deployment guide and configuration reference for those with related needs.
Prerequisites
Before getting started with the deployment, ensure the following components are prepared and ready:
- A Kubernetes cluster environment
- The `kubectl` command-line tool
- Docker
- (Optional) Helm
If you’re already familiar with Helm and prefer using it, SeaTunnel provides official Helm deployment guides. In this tutorial, we’ll focus on using a pure Kubernetes + `kubectl` approach.
Building the SeaTunnel Docker Image
SeaTunnel offers pre-built Docker images for each release version. You can pull them directly using Docker:
docker pull apache/seatunnel:<version_tag>   # e.g. 2.3.10, the version used throughout this article
For detailed instructions, refer to the Set Up With Docker documentation.
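Note that the Deployment manifests later in this article reference the image simply as seatunnel:2.3.10. If, as in this example, you host the image in your own registry, one possible workflow is to retag and push the official image (the registry address below is a placeholder, not part of the official setup):
docker pull apache/seatunnel:2.3.10
docker tag apache/seatunnel:2.3.10 <your-registry>/seatunnel:2.3.10
docker push <your-registry>/seatunnel:2.3.10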
Since we are deploying in cluster mode, we need to configure network communication between cluster nodes. SeaTunnel uses Hazelcast to enable inter-node discovery and communication.
Configuring Hazelcast for the Cluster
Headless Service Setup
Hazelcast clusters rely on discovery mechanisms that help member nodes find each other. These include:
- Auto-discovery (AWS, Azure, GCP, Kubernetes)
- TCP
- Multicast
- Eureka
- Zookeeper
For our Kubernetes deployment, we’ll use Hazelcast’s Kubernetes auto-discovery mechanism, specifically the DNS Lookup Mode, which depends on Kubernetes Headless Services.
Headless Services return the IP addresses of all matching Pods instead of a single cluster IP. This allows Hazelcast nodes to find and connect to each other dynamically.
Here’s the YAML configuration to define a Headless Service:
# For Hazelcast cluster join
apiVersion: v1
kind: Service
metadata:
  name: seatunnel-cluster
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/instance: seatunnel-cluster-app
    app.kubernetes.io/version: 2.3.10
  ports:
  - port: 5801
    name: hazelcast
Key Configurations:
- `metadata.name`: The service name used by Hazelcast for DNS-based discovery.
- `clusterIP: None`: Defines this as a Headless Service.
- `selector`: Matches Pods with specific labels.
- `port`: The exposed Hazelcast communication port.
We also define another service for external access to the SeaTunnel master node via REST API:
# For REST API access to SeaTunnel master
apiVersion: v1
kind: Service
metadata:
  name: seatunnel-cluster-master
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/instance: seatunnel-cluster-app
    app.kubernetes.io/version: 2.3.10
    app.kubernetes.io/name: seatunnel-cluster-master
    app.kubernetes.io/component: master
  ports:
  - port: 8080
    name: "master-port"
    targetPort: 8080
    protocol: TCP
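Assuming both Service manifests are saved to a file such as seatunnel-services.yaml (the file name is an illustrative assumption) and deployed to the bigdata namespace used by the DNS names later in this article, you can apply them and later verify that the headless service resolves to Pod endpoints:
kubectl apply -f seatunnel-services.yaml -n bigdata
# Once the SeaTunnel Pods are running, the headless service should list one endpoint per Pod
kubectl get endpoints seatunnel-cluster -n bigdata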
Once the services are defined, we configure Hazelcast’s discovery logic in two files: `hazelcast-master.yaml` and `hazelcast-worker.yaml`.
Hazelcast Master and Worker Configuration
These YAML files define cluster-level networking and discovery for SeaTunnel.
hazelcast-master.yaml
hazelcast:
  cluster-name: seatunnel-cluster
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      kubernetes:
        enabled: true
        service-dns: seatunnel-cluster.bigdata.svc.cluster.local
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 30
    hazelcast.max.no.heartbeat.seconds: 300
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 15
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 200
Highlights:
- `cluster-name`: Ensures only nodes with the same cluster name can form a cluster.
- `rest-api.enabled`: Required to enable REST services in SeaTunnel 2.3.10.
- `service-dns`: The full DNS name of the headless service (`<service-name>.<namespace>.svc.cluster.local`).
- `port`: TCP communication port for Hazelcast.
With this Kubernetes-based join mechanism, each Hazelcast Pod resolves `service-dns` at startup, obtains the IP addresses of all member Pods through the Headless Service, and then attempts to establish TCP connections to the other members on port 5801.
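A quick way to check the DNS side of this mechanism is to resolve the headless service name from inside the cluster; the busybox image and the bigdata namespace below are assumptions based on the configuration above:
# Should return one A record per SeaTunnel Pod selected by the headless service
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -n bigdata \
  -- nslookup seatunnel-cluster.bigdata.svc.cluster.local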
hazelcast-worker.yaml
hazelcast:
  cluster-name: seatunnel-cluster
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      kubernetes:
        enabled: true
        service-dns: seatunnel-cluster.bigdata.svc.cluster.local
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 30
    hazelcast.max.no.heartbeat.seconds: 300
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 15
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 200
  member-attributes:
    rule:
      type: string
      value: worker
These configurations enable seamless auto-discovery and communication between SeaTunnel master and worker nodes using Hazelcast over Kubernetes.
Configuring the SeaTunnel Engine
The core engine configuration for SeaTunnel resides in the `seatunnel.yaml` file. Here’s a sample setup:
seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    classloader-cache-mode: true
    http:
      enable-http: true
      port: 8080
      enable-dynamic-port: false
      port-range: 100
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 300000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: hdfs://xxx:8020 # Ensure that the directory has write permission
    telemetry:
      metric:
        enabled: true
Main Configurations Include:
- history-job-expire-minutes: The task history is retained for 24 hours (1440 minutes) and automatically cleared when it expires.
- backup-count: 1: The number of backup copies of the task status is 1.
- queue-type: blockingqueue: Tasks are managed with a blocking queue to avoid resource exhaustion.
- print-execution-info-interval: 60: Print the task execution status once a minute.
- print-job-metrics-info-interval: 60: Output task metrics (such as throughput, latency) once a minute.
- classloader-cache-mode: true: Enable class loading cache to reduce repeated loading overhead and improve performance.
- dynamic-slot: true: Allows the number of task slots to be dynamically adjusted according to the load to optimize resource utilization.
- checkpoint.interval: 300000: Trigger a checkpoint every 5 minutes.
- checkpoint.timeout: 60000: The checkpoint timeout is 1 minute.
- telemetry.metric.enabled: true: Enables the collection of task running metrics (such as latency and throughput) for easy monitoring.
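Since the checkpoint storage points to HDFS, it can help to pre-create the checkpoint directory and confirm it is writable before starting the cluster. A rough sketch, keeping the placeholder NameNode address from the configuration above (tighten the permissions to match your security requirements):
# Pre-create the checkpoint namespace on HDFS and grant write permission
hdfs dfs -mkdir -p hdfs://xxx:8020/tmp/seatunnel/checkpoint_snapshot
hdfs dfs -chmod -R 777 hdfs://xxx:8020/tmp/seatunnel/checkpoint_snapshot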
Creating Kubernetes YAML Files to Deploy the Application
After completing the previous setup steps, we now proceed to the final phase: defining and deploying the `master` and `worker` nodes using Kubernetes YAML files.
To decouple configuration files from the application logic, we combine the previously mentioned configuration files into a single ConfigMap. This ConfigMap is mounted into the container’s configuration directory, making it easier to centrally manage and update the configurations.
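One way to build this ConfigMap from the local configuration files is shown below; the file set mirrors the volume mounts in the Deployments that follow, and the bigdata namespace is the one assumed throughout this article:
kubectl create configmap seatunnel-cluster-configs -n bigdata \
  --from-file=hazelcast-master.yaml \
  --from-file=hazelcast-worker.yaml \
  --from-file=seatunnel.yaml \
  --from-file=hazelcast-client.yaml \
  --from-file=log4j2.properties \
  --from-file=log4j2_client.properties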
Below are examples for the `seatunnel-cluster-master.yaml` and `seatunnel-cluster-worker.yaml` files, covering details like mounting the `ConfigMap`, container startup commands, and deployment specifications.
seatunnel-cluster-master.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: seatunnel-cluster-master
spec:
  replicas: 2  # Modify based on your setup
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 50%
  selector:
    matchLabels:
      app.kubernetes.io/instance: seatunnel-cluster-app
      app.kubernetes.io/version: 2.3.10
      app.kubernetes.io/name: seatunnel-cluster-master
      app.kubernetes.io/component: master
  template:
    metadata:
      annotations:
        prometheus.io/path: /hazelcast/rest/instance/metrics
        prometheus.io/port: "5801"
        prometheus.io/scrape: "true"
        prometheus.io/role: "seatunnel-master"
      labels:
        app.kubernetes.io/instance: seatunnel-cluster-app
        app.kubernetes.io/version: 2.3.10
        app.kubernetes.io/name: seatunnel-cluster-master
        app.kubernetes.io/component: master
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeAffinity-key
                operator: Exists
      containers:
      - name: seatunnel-master
        image: seatunnel:2.3.10
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5801
          name: hazelcast
        - containerPort: 8080
          name: "master-port"
        command:
        - /opt/seatunnel/bin/seatunnel-cluster.sh
        - -r
        - master
        resources:
          requests:
            cpu: "1"
            memory: 4G
        volumeMounts:
        - mountPath: "/opt/seatunnel/config/hazelcast-master.yaml"
          name: seatunnel-configs
          subPath: hazelcast-master.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-worker.yaml"
          name: seatunnel-configs
          subPath: hazelcast-worker.yaml
        - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
          name: seatunnel-configs
          subPath: seatunnel.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
          name: seatunnel-configs
          subPath: hazelcast-client.yaml
        - mountPath: "/opt/seatunnel/config/log4j2_client.properties"
          name: seatunnel-configs
          subPath: log4j2_client.properties
        - mountPath: "/opt/seatunnel/config/log4j2.properties"
          name: seatunnel-configs
          subPath: log4j2.properties
      volumes:
      - name: seatunnel-configs
        configMap:
          name: seatunnel-cluster-configs
Deployment Strategy
- Use multiple replicas (`replicas: 2`) to ensure high availability.
- Apply a RollingUpdate strategy for zero-downtime deployments:
  - `maxUnavailable: 25%`: At least 75% of Pods remain available during updates.
  - `maxSurge: 50%`: Allows temporary over-provisioning for a smoother update.
Label Selectors
- Follow Kubernetes’ recommended label conventions.
- `spec.selector.matchLabels`: Controls which Pods are managed by the Deployment.
- `spec.template.labels`: Assigns metadata to the created Pods.
Node Affinity
- Use the `affinity` property to control which nodes the Pods are scheduled on. Replace the placeholder values according to your actual cluster node labels (see the labeling example below).
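For example, if you keep the placeholder key nodeAffinity-key with the Exists operator, the target nodes only need to carry that label; the value is arbitrary because the affinity rule merely checks that the key exists:
# Label the nodes that should host SeaTunnel Pods (node name and value are placeholders)
kubectl label node <your-node-name> nodeAffinity-key=seatunnel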
Config File Mounts
- Centralize configuration management using a `ConfigMap`.
- Mount individual configuration files using the `subPath` option.
seatunnel-cluster-worker.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: seatunnel-cluster-worker
spec:
  replicas: 3  # Modify based on your setup
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 50%
  selector:
    matchLabels:
      app.kubernetes.io/instance: seatunnel-cluster-app
      app.kubernetes.io/version: 2.3.10
      app.kubernetes.io/name: seatunnel-cluster-worker
      app.kubernetes.io/component: worker
  template:
    metadata:
      annotations:
        prometheus.io/path: /hazelcast/rest/instance/metrics
        prometheus.io/port: "5801"
        prometheus.io/scrape: "true"
        prometheus.io/role: "seatunnel-worker"
      labels:
        app.kubernetes.io/instance: seatunnel-cluster-app
        app.kubernetes.io/version: 2.3.10
        app.kubernetes.io/name: seatunnel-cluster-worker
        app.kubernetes.io/component: worker
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeAffinity-key
                operator: Exists
      containers:
      - name: seatunnel-worker
        image: seatunnel:2.3.10
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5801
          name: hazelcast
        command:
        - /opt/seatunnel/bin/seatunnel-cluster.sh
        - -r
        - worker
        resources:
          requests:
            cpu: "1"
            memory: 10G
        volumeMounts:
        - mountPath: "/opt/seatunnel/config/hazelcast-master.yaml"
          name: seatunnel-configs
          subPath: hazelcast-master.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-worker.yaml"
          name: seatunnel-configs
          subPath: hazelcast-worker.yaml
        - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
          name: seatunnel-configs
          subPath: seatunnel.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
          name: seatunnel-configs
          subPath: hazelcast-client.yaml
        - mountPath: "/opt/seatunnel/config/log4j2_client.properties"
          name: seatunnel-configs
          subPath: log4j2_client.properties
        - mountPath: "/opt/seatunnel/config/log4j2.properties"
          name: seatunnel-configs
          subPath: log4j2.properties
      volumes:
      - name: seatunnel-configs
        configMap:
          name: seatunnel-cluster-configs
Once the `master` and `worker` YAML files are ready, you can deploy them to your Kubernetes cluster using the following commands:
kubectl apply -f seatunnel-cluster-master.yaml
kubectl apply -f seatunnel-cluster-worker.yaml
If everything is set up correctly, you should see 2 `master` Pods and 3 `worker` Pods running in your SeaTunnel cluster:
$ kubectl get pods | grep seatunnel-cluster
seatunnel-cluster-master-6989898f66-6fjz8 1/1 Running 0 156m
seatunnel-cluster-master-6989898f66-hbtdn 1/1 Running 0 155m
seatunnel-cluster-worker-87fb469f7-5c96x 1/1 Running 0 156m
seatunnel-cluster-worker-87fb469f7-7kt2h 1/1 Running 0 155m
seatunnel-cluster-worker-87fb469f7-drm9r 1/1 Running 0 156m
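Two optional sanity checks, using one of the example Pod names above and assuming the bigdata namespace: confirm that the ConfigMap files are mounted where SeaTunnel expects them, and confirm that all five members joined the same Hazelcast cluster:
# The mounted configuration files should appear under /opt/seatunnel/config
kubectl exec -n bigdata seatunnel-cluster-master-6989898f66-6fjz8 -- ls /opt/seatunnel/config
# Hazelcast logs its membership view on join; look for a "Members {size:5, ...}" block
kubectl logs -n bigdata seatunnel-cluster-master-6989898f66-6fjz8 | grep -A 7 "Members {"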
At this point, we have successfully deployed a SeaTunnel cluster in Separated Cluster Mode on Kubernetes.
Now that the cluster is ready, how do we submit jobs from the client?
Submitting Jobs to the Cluster from the Client
Submit Jobs via Command-Line Tool
All SeaTunnel client configurations are defined in the `hazelcast-client.yaml` file.
First, download the binary package to the client machine (it includes the `bin` and `config` directories), and make sure that the `SEATUNNEL_HOME` path on the client is the same as that on the server. This is explicitly stated in the documentation: “Setting the `SEATUNNEL_HOME` the same as the server.”
If the paths don’t match, issues may arise, such as being unable to locate plugin paths on the server side due to mismatched directories.
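As a rough sketch of the client setup (the tarball and directory names follow the usual Apache release naming and should be checked against the official download page; /opt/seatunnel matches the path used inside the cluster image in this article):
export SEATUNNEL_HOME=/opt/seatunnel              # must match SEATUNNEL_HOME on the server
tar -xzf apache-seatunnel-2.3.10-bin.tar.gz       # release tarball from the SeaTunnel download page
mkdir -p "$SEATUNNEL_HOME"
cp -r apache-seatunnel-2.3.10/* "$SEATUNNEL_HOME"/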
Once inside the installation directory, simply edit the `config/hazelcast-client.yaml` file to point to the Headless Service we created earlier:
hazelcast-client:
  cluster-name: seatunnel-cluster
  properties:
    hazelcast.logging.type: log4j2
  connection-strategy:
    connection-retry:
      cluster-connect-timeout-millis: 3000
  network:
    cluster-members:
      - seatunnel-cluster.bigdata.svc.cluster.local:5801
Once the client is configured, tasks can be submitted to the cluster for execution. There are two primary ways to configure JVM parameters during task submission:
- Configure JVM parameters in the `config/jvm_client_options` file.
  This method applies the configured JVM parameters to all tasks submitted via `seatunnel.sh`, whether running in local or cluster mode. All submitted tasks share the same JVM configuration.
- Specify JVM parameters on the task submission command line.
  When using `seatunnel.sh` to submit tasks, you can specify JVM parameters directly in the command line, for example:
  sh bin/seatunnel.sh --config $SEATUNNEL_HOME/config/v2.batch.config.template -DJvmOption="-Xms2G -Xmx2G"
  This approach allows setting custom JVM parameters for each task submission individually.
Example: Submitting a Task to the Cluster
The following example demonstrates the complete process of submitting a task from the client to the cluster:
env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}
source {
  FakeSource {
    parallelism = 2
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}
sink {
  Console {
  }
}
Submit the task using the command:
sh bin/seatunnel.sh --config config/v2.streaming.example.template -m cluster -n st.example.template -DJvmOption="-Xms2G -Xmx2G"
On the Master node, use the following command to list the running jobs:
$ sh bin/seatunnel.sh -l
JobID               JobName              JobStatus  SubmitTime               FinishedTime
---------------------------------------------------------------------------------------------
964354250769432580  st.example.template  RUNNING    2025-04-15 10:39:30.588
As shown above, the `st.example.template` task we just submitted is already in the RUNNING state. You can view the logs on the Worker node, which should look like this:
2025-04-15 10:34:41,998 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0 rowIndex=1: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : bdaUB, 110348049
2025-04-15 10:34:41,998 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=1 rowIndex=1: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : mOifY, 1974539087
2025-04-15 10:34:41,999 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0 rowIndex=2: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : jKFrR, 1828047742
2025-04-15 10:34:41,999 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=1 rowIndex=2: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : gDiqR, 1177544796
2025-04-15 10:34:41,999 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0 rowIndex=3: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : bCVxc, 909343602
...
This confirms that the task has been successfully submitted to the SeaTunnel cluster and is running properly.
Submitting Jobs Using the REST API
SeaTunnel provides a REST API interface for querying job status and statistics, as well as submitting and stopping jobs.
Previously, we configured a Headless Service for the Master node and exposed port `8080`. This allows us to submit jobs from the client using the REST API.
The SeaTunnel REST API supports job submission via file upload. Example command:
$ curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/submit-job/upload' \
--form 'config_file=@"/opt/seatunnel/config/v2.streaming.example.template"' \
--form 'jobName=st.example.template'
{"jobId":"964553575034257409","jobName":"st.example.template"}
If the submission is successful, it returns a `jobId` and `jobName` as shown.
Next, retrieve the list of currently running jobs in the cluster via the REST API:
curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/running-jobs'
[{"jobId":"964553575034257409","jobName":"st.example.template","jobStatus":"RUNNING","envOptions":{"job.mode":"STREAMING","checkpoint.interval":"2000","parallelism":"2"}, ...]
As shown in the response, the job is running and additional metadata is also returned. This confirms that job submission via the REST API works as expected. For more details, refer to the official documentation: RESTful API V2
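The REST API can also stop jobs. Assuming your version exposes the stop-job endpoint described in the RESTful API V2 documentation (verify the exact path and request fields for your release), a request could look roughly like this, using the jobId returned earlier:
curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/stop-job' \
  -H 'Content-Type: application/json' \
  -d '{"jobId": 964553575034257409, "isStopWithSavePoint": false}'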
Summary
This article focuses on the recommended Separated Cluster Mode for deploying a SeaTunnel cluster on Kubernetes. The deployment process mainly includes the following steps:
Prepare the Kubernetes Environment
Ensure a fully functional Kubernetes cluster is set up and all required components are installed.
Build the SeaTunnel Docker Image
If no secondary development is needed, use the official image. Otherwise, compile and package locally, write a Dockerfile, and build your custom SeaTunnel image.
Configure Headless Service and Hazelcast Cluster
Hazelcast uses Kubernetes’ Headless Service for DNS-based discovery. First, create a Headless Service, then specify its DNS name in the Hazelcast YAML configuration using `service-dns`. The Headless Service resolves to the IP addresses of the associated Pods, enabling inter-member discovery within the Hazelcast cluster.
Configure the SeaTunnel Engine
Modify the `seatunnel.yaml` file to set the SeaTunnel engine parameters.
Create Kubernetes YAML Deployment Files
Create separate YAML files for the Master and Worker, specifying node selectors, startup commands, resource requests/limits, and volume mounts. Deploy them to the Kubernetes cluster.
Configure the SeaTunnel Client
Install SeaTunnel on the client machine and ensure the installation path (`SEATUNNEL_HOME`) matches the cluster’s. Modify `hazelcast-client.yaml` to point to the Service address of the cluster.
Submit and Run Tasks
After completing the above steps, tasks can be submitted from the client and executed by the SeaTunnel cluster.
The configuration examples provided in this article are for reference only. There may be additional configuration options and best practices not covered here. Contributions and discussion are welcome. We hope this guide proves helpful!