Official Apache SeaTunnel website: https://seatunnel.apache.org/
Apache SeaTunnel is a next-generation, high-performance, distributed data integration and synchronization tool that is gaining increasing attention across the industry. SeaTunnel supports three deployment modes: Local Mode, Hybrid Cluster Mode, and Separated Cluster Mode.
This article focuses on how to deploy SeaTunnel in Separated Cluster Mode on Kubernetes, providing a complete deployment guide and configuration reference for those with related needs.
Prerequisites
Before getting started with the deployment, ensure the following components are prepared and ready:
- A Kubernetes cluster environment
- The `kubectl` command-line tool
- Docker
- (Optional) Helm
If you’re already familiar with Helm and prefer using it, SeaTunnel provides official Helm deployment guides. In this tutorial, we’ll focus on using a pure Kubernetes + `kubectl` approach.
Building the SeaTunnel Docker Image
SeaTunnel offers pre-built Docker images for each release version. You can pull them directly using Docker:
docker pull apache/seatunnel:<version_tag>   # e.g. 2.3.10, the version used throughout this article
For detailed instructions, refer to the Set Up With Docker documentation.
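Note that the Deployment manifests later in this article reference the image simply as seatunnel:2.3.10. If, as in this example, you host the image in your own registry, one possible workflow is to retag and push the official image (the registry address below is a placeholder, not part of the official setup):
docker pull apache/seatunnel:2.3.10
docker tag apache/seatunnel:2.3.10 <your-registry>/seatunnel:2.3.10
docker push <your-registry>/seatunnel:2.3.10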
Since we are deploying in cluster mode, we need to configure network communication between cluster nodes. SeaTunnel uses Hazelcast to enable inter-node discovery and communication.
Configuring Hazelcast for the Cluster
Headless Service Setup
Hazelcast clusters rely on discovery mechanisms that help member nodes find each other. These include:
- Auto-discovery (AWS, Azure, GCP, Kubernetes)
- TCP
- Multicast
- Eureka
- Zookeeper
For our Kubernetes deployment, we’ll use Hazelcast’s Kubernetes auto-discovery mechanism, specifically the DNS Lookup Mode, which depends on Kubernetes Headless Services.
Headless Services return the IP addresses of all matching Pods instead of a single cluster IP. This allows Hazelcast nodes to find and connect to each other dynamically.
Here’s the YAML configuration to define a Headless Service:
# For Hazelcast cluster join
apiVersion: v1
kind: Service
metadata:
  name: seatunnel-cluster
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/instance: seatunnel-cluster-app
    app.kubernetes.io/version: 2.3.10
  ports:
  - port: 5801
    name: hazelcast
Key Configurations:
- `metadata.name`: The service name used by Hazelcast for DNS-based discovery.
- `clusterIP: None`: Defines this as a Headless Service.
- `selector`: Matches Pods with specific labels.
- `port`: The exposed Hazelcast communication port.
We also define another service for external access to the SeaTunnel master node via REST API:
# For REST API access to SeaTunnel master
apiVersion: v1
kind: Service
metadata:
  name: seatunnel-cluster-master
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/instance: seatunnel-cluster-app
    app.kubernetes.io/version: 2.3.10
    app.kubernetes.io/name: seatunnel-cluster-master
    app.kubernetes.io/component: master
  ports:
  - port: 8080
    name: "master-port"
    targetPort: 8080
    protocol: TCP
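Assuming both Service manifests are saved to a file such as seatunnel-services.yaml (the file name is an illustrative assumption) and deployed to the bigdata namespace used by the DNS names later in this article, you can apply them and later verify that the headless service resolves to Pod endpoints:
kubectl apply -f seatunnel-services.yaml -n bigdata
# Once the SeaTunnel Pods are running, the headless service should list one endpoint per Pod
kubectl get endpoints seatunnel-cluster -n bigdata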
Once the services are defined, we configure Hazelcast’s discovery logic in two files: `hazelcast-master.yaml` and `hazelcast-worker.yaml`.
Hazelcast Master and Worker Configuration
These YAML files define cluster-level networking and discovery for SeaTunnel.
hazelcast-master.yaml
hazelcast:
  cluster-name: seatunnel-cluster
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      kubernetes:
        enabled: true
        service-dns: seatunnel-cluster.bigdata.svc.cluster.local
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 30
    hazelcast.max.no.heartbeat.seconds: 300
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 15
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 200
Highlights:
- `cluster-name`: Ensures only nodes with the same cluster name can form a cluster.
- `rest-api.enabled`: Required to enable REST services in SeaTunnel 2.3.10.
- `service-dns`: The full DNS name of the headless service (`<service-name>.<namespace>.svc.cluster.local`).
- `port`: TCP communication port for Hazelcast.
With this Kubernetes-based join mechanism, each Hazelcast Pod resolves `service-dns` at startup, obtains the IP addresses of all member Pods through the Headless Service, and then attempts to establish TCP connections to the other members on port 5801.
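A quick way to check the DNS side of this mechanism is to resolve the headless service name from inside the cluster; the busybox image and the bigdata namespace below are assumptions based on the configuration above:
# Should return one A record per SeaTunnel Pod selected by the headless service
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -n bigdata \
  -- nslookup seatunnel-cluster.bigdata.svc.cluster.local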
hazelcast-worker.yaml
hazelcast:
  cluster-name: seatunnel-cluster
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      kubernetes:
        enabled: true
        service-dns: seatunnel-cluster.bigdata.svc.cluster.local
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 30
    hazelcast.max.no.heartbeat.seconds: 300
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 15
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 200
  member-attributes:
    rule:
      type: string
      value: worker
These configurations enable seamless auto-discovery and communication between SeaTunnel master and worker nodes using Hazelcast over Kubernetes.
Configuring the SeaTunnel Engine
The core engine configuration for SeaTunnel resides in the `seatunnel.yaml` file. Here’s a sample setup:
seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    classloader-cache-mode: true
    http:
      enable-http: true
      port: 8080
      enable-dynamic-port: false
      port-range: 100
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 300000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: hdfs://xxx:8020 # Ensure that the directory has write permission
    telemetry:
      metric:
        enabled: true
Main Configurations Include:
- history-job-expire-minutes: The task history is retained for 24 hours (1440 minutes) and automatically cleared when it expires.
- backup-count: 1: The number of backup copies of the task status is 1.
- queue-type: blockingqueue: Tasks are managed with a blocking queue to avoid resource exhaustion.
- print-execution-info-interval: 60: Print the task execution status once a minute.
- print-job-metrics-info-interval: 60: Output task metrics (such as throughput, latency) once a minute.
- classloader-cache-mode: true: Enable class loading cache to reduce repeated loading overhead and improve performance.
- dynamic-slot: true: Allows the number of task slots to be dynamically adjusted according to the load to optimize resource utilization.
- checkpoint.interval: 300000: Trigger a checkpoint every 5 minutes.
- checkpoint.timeout: 60000: The checkpoint timeout is 1 minute.
- telemetry.metric.enabled: true: Enables the collection of task running metrics (such as latency and throughput) for easy monitoring.
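Since the checkpoint storage points to HDFS, it can help to pre-create the checkpoint directory and confirm it is writable before starting the cluster. A rough sketch, keeping the placeholder NameNode address from the configuration above (tighten the permissions to match your security requirements):
# Pre-create the checkpoint namespace on HDFS and grant write permission
hdfs dfs -mkdir -p hdfs://xxx:8020/tmp/seatunnel/checkpoint_snapshot
hdfs dfs -chmod -R 777 hdfs://xxx:8020/tmp/seatunnel/checkpoint_snapshot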
Creating Kubernetes YAML Files to Deploy the Application
After completing the previous setup steps, we now proceed to the final phase: defining and deploying the `master` and `worker` nodes using Kubernetes YAML files.
To decouple configuration files from the application logic, we combine the previously mentioned configuration files into a single ConfigMap. This ConfigMap is mounted into the container’s configuration directory, making it easier to centrally manage and update the configurations.
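One way to build this ConfigMap from the local configuration files is shown below; the file set mirrors the volume mounts in the Deployments that follow, and the bigdata namespace is the one assumed throughout this article:
kubectl create configmap seatunnel-cluster-configs -n bigdata \
  --from-file=hazelcast-master.yaml \
  --from-file=hazelcast-worker.yaml \
  --from-file=seatunnel.yaml \
  --from-file=hazelcast-client.yaml \
  --from-file=log4j2.properties \
  --from-file=log4j2_client.properties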
Below are examples for the `seatunnel-cluster-master.yaml` and `seatunnel-cluster-worker.yaml` files, covering details like mounting the `ConfigMap`, container startup commands, and deployment specifications.
seatunnel-cluster-master.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: seatunnel-cluster-master
spec:
  replicas: 2  # Modify based on your setup
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 50%
  selector:
    matchLabels:
      app.kubernetes.io/instance: seatunnel-cluster-app
      app.kubernetes.io/version: 2.3.10
      app.kubernetes.io/name: seatunnel-cluster-master
      app.kubernetes.io/component: master
  template:
    metadata:
      annotations:
        prometheus.io/path: /hazelcast/rest/instance/metrics
        prometheus.io/port: "5801"
        prometheus.io/scrape: "true"
        prometheus.io/role: "seatunnel-master"
      labels:
        app.kubernetes.io/instance: seatunnel-cluster-app
        app.kubernetes.io/version: 2.3.10
        app.kubernetes.io/name: seatunnel-cluster-master
        app.kubernetes.io/component: master
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeAffinity-key
                operator: Exists
      containers:
      - name: seatunnel-master
        image: seatunnel:2.3.10
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5801
          name: hazelcast
        - containerPort: 8080
          name: "master-port"
        command:
        - /opt/seatunnel/bin/seatunnel-cluster.sh
        - -r
        - master
        resources:
          requests:
            cpu: "1"
            memory: 4G
        volumeMounts:
        - mountPath: "/opt/seatunnel/config/hazelcast-master.yaml"
          name: seatunnel-configs
          subPath: hazelcast-master.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-worker.yaml"
          name: seatunnel-configs
          subPath: hazelcast-worker.yaml
        - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
          name: seatunnel-configs
          subPath: seatunnel.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
          name: seatunnel-configs
          subPath: hazelcast-client.yaml
        - mountPath: "/opt/seatunnel/config/log4j2_client.properties"
          name: seatunnel-configs
          subPath: log4j2_client.properties
        - mountPath: "/opt/seatunnel/config/log4j2.properties"
          name: seatunnel-configs
          subPath: log4j2.properties
      volumes:
      - name: seatunnel-configs
        configMap:
          name: seatunnel-cluster-configs
Deployment Strategy
- Use multiple replicas (`replicas: 2`) to ensure high availability.
- Apply a RollingUpdate strategy for zero-downtime deployments:
  - `maxUnavailable: 25%`: At least 75% of Pods remain available during updates.
  - `maxSurge: 50%`: Allows temporary over-provisioning for a smoother update.
Label Selectors
- Follow Kubernetes’ recommended label conventions.
- `spec.selector.matchLabels`: Controls which Pods are managed by the Deployment.
- `spec.template.labels`: Assigns metadata to the created Pods.
Node Affinity
- Use the `affinity` property to control which nodes the Pods are scheduled on. Replace the placeholder values according to your actual cluster node labels (see the labeling example below).
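For example, if you keep the placeholder key nodeAffinity-key with the Exists operator, the target nodes only need to carry that label; the value is arbitrary because the affinity rule merely checks that the key exists:
# Label the nodes that should host SeaTunnel Pods (node name and value are placeholders)
kubectl label node <your-node-name> nodeAffinity-key=seatunnel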
Config File Mounts
- Centralize configuration management using a `ConfigMap`.
- Mount individual configuration files using the `subPath` option.
seatunnel-cluster-worker.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: seatunnel-cluster-worker
spec:
  replicas: 3  # Modify based on your setup
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 50%
  selector:
    matchLabels:
      app.kubernetes.io/instance: seatunnel-cluster-app
      app.kubernetes.io/version: 2.3.10
      app.kubernetes.io/name: seatunnel-cluster-worker
      app.kubernetes.io/component: worker
  template:
    metadata:
      annotations:
        prometheus.io/path: /hazelcast/rest/instance/metrics
        prometheus.io/port: "5801"
        prometheus.io/scrape: "true"
        prometheus.io/role: "seatunnel-worker"
      labels:
        app.kubernetes.io/instance: seatunnel-cluster-app
        app.kubernetes.io/version: 2.3.10
        app.kubernetes.io/name: seatunnel-cluster-worker
        app.kubernetes.io/component: worker
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeAffinity-key
                operator: Exists
      containers:
      - name: seatunnel-worker
        image: seatunnel:2.3.10
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5801
          name: hazelcast
        command:
        - /opt/seatunnel/bin/seatunnel-cluster.sh
        - -r
        - worker
        resources:
          requests:
            cpu: "1"
            memory: 10G
        volumeMounts:
        - mountPath: "/opt/seatunnel/config/hazelcast-master.yaml"
          name: seatunnel-configs
          subPath: hazelcast-master.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-worker.yaml"
          name: seatunnel-configs
          subPath: hazelcast-worker.yaml
        - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
          name: seatunnel-configs
          subPath: seatunnel.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
          name: seatunnel-configs
          subPath: hazelcast-client.yaml
        - mountPath: "/opt/seatunnel/config/log4j2_client.properties"
          name: seatunnel-configs
          subPath: log4j2_client.properties
        - mountPath: "/opt/seatunnel/config/log4j2.properties"
          name: seatunnel-configs
          subPath: log4j2.properties
      volumes:
      - name: seatunnel-configs
        configMap:
          name: seatunnel-cluster-configs
Once the `master` and `worker` YAML files are ready, you can deploy them to your Kubernetes cluster using the following commands:
kubectl apply -f seatunnel-cluster-master.yaml
kubectl apply -f seatunnel-cluster-worker.yaml
If everything is set up correctly, you should see 2 `master` Pods and 3 `worker` Pods running in your SeaTunnel cluster:
$ kubectl get pods | grep seatunnel-cluster
seatunnel-cluster-master-6989898f66-6fjz8 1/1 Running 0 156m
seatunnel-cluster-master-6989898f66-hbtdn 1/1 Running 0 155m
seatunnel-cluster-worker-87fb469f7-5c96x 1/1 Running 0 156m
seatunnel-cluster-worker-87fb469f7-7kt2h 1/1 Running 0 155m
seatunnel-cluster-worker-87fb469f7-drm9r 1/1 Running 0 156m
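Two optional sanity checks, using one of the example Pod names above and assuming the bigdata namespace: confirm that the ConfigMap files are mounted where SeaTunnel expects them, and confirm that all five members joined the same Hazelcast cluster:
# The mounted configuration files should appear under /opt/seatunnel/config
kubectl exec -n bigdata seatunnel-cluster-master-6989898f66-6fjz8 -- ls /opt/seatunnel/config
# Hazelcast logs its membership view on join; look for a "Members {size:5, ...}" block
kubectl logs -n bigdata seatunnel-cluster-master-6989898f66-6fjz8 | grep -A 7 "Members {"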
At this point, we have successfully deployed a SeaTunnel cluster in Separated Cluster Mode on Kubernetes.
Now that the cluster is ready, how do we submit jobs from the client?
Submitting Jobs to the Cluster from the Client
Submit Jobs via Command-Line Tool
All SeaTunnel client configurations are defined in the `hazelcast-client.yaml` file.
First, download the binary package to the client machine (it includes the `bin` and `config` directories), and make sure that the `SEATUNNEL_HOME` path on the client is the same as that on the server. This is explicitly stated in the documentation: “Setting the `SEATUNNEL_HOME` the same as the server.”
If the paths don’t match, issues may arise, such as being unable to locate plugin paths on the server side due to mismatched directories.
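As a rough sketch of the client setup (the tarball and directory names follow the usual Apache release naming and should be checked against the official download page; /opt/seatunnel matches the path used inside the cluster image in this article):
export SEATUNNEL_HOME=/opt/seatunnel              # must match SEATUNNEL_HOME on the server
tar -xzf apache-seatunnel-2.3.10-bin.tar.gz       # release tarball from the SeaTunnel download page
mkdir -p "$SEATUNNEL_HOME"
cp -r apache-seatunnel-2.3.10/* "$SEATUNNEL_HOME"/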
Once inside the installation directory, simply edit the `config/hazelcast-client.yaml` file to point to the Headless Service we created earlier:
hazelcast-client:
  cluster-name: seatunnel-cluster
  properties:
    hazelcast.logging.type: log4j2
  connection-strategy:
    connection-retry:
      cluster-connect-timeout-millis: 3000
  network:
    cluster-members:
      - seatunnel-cluster.bigdata.svc.cluster.local:5801
Once the client is configured, tasks can be submitted to the cluster for execution. There are two primary ways to configure JVM parameters during task submission:
- Configure JVM parameters in the `config/jvm_client_options` file.
  This method applies the configured JVM parameters to all tasks submitted via `seatunnel.sh`, whether running in local or cluster mode. All submitted tasks share the same JVM configuration.
- Specify JVM parameters on the task submission command line.
  When using `seatunnel.sh` to submit tasks, you can specify JVM parameters directly in the command line, for example:
  sh bin/seatunnel.sh --config $SEATUNNEL_HOME/config/v2.batch.config.template -DJvmOption="-Xms2G -Xmx2G"
  This approach allows setting custom JVM parameters for each task submission individually.
Example: Submitting a Task to the Cluster
The following example demonstrates the complete process of submitting a task from the client to the cluster:
env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}
source {
  FakeSource {
    parallelism = 2
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}
sink {
  Console {
  }
}
Submit the task using the command:
sh bin/seatunnel.sh --config config/v2.streaming.example.template -m cluster -n st.example.template -DJvmOption="-Xms2G -Xmx2G"
On the Master node, use the following command to list the running jobs:
$ sh bin/seatunnel.sh -l
JobID               JobName              JobStatus  SubmitTime               FinishedTime
---------------------------------------------------------------------------------------------
964354250769432580  st.example.template  RUNNING    2025-04-15 10:39:30.588
As shown above, the `st.example.template` task we just submitted is already in the RUNNING state. You can view the logs on the Worker node, which should look like this:
2025-04-15 10:34:41,998 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0 rowIndex=1: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : bdaUB, 110348049
2025-04-15 10:34:41,998 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=1 rowIndex=1: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : mOifY, 1974539087
2025-04-15 10:34:41,999 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0 rowIndex=2: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : jKFrR, 1828047742
2025-04-15 10:34:41,999 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=1 rowIndex=2: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : gDiqR, 1177544796
2025-04-15 10:34:41,999 INFO [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0 rowIndex=3: SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : bCVxc, 909343602
...
This confirms that the task has been successfully submitted to the SeaTunnel cluster and is running properly.
Submitting Jobs Using the REST API
SeaTunnel provides a REST API interface for querying job status and statistics, as well as submitting and stopping jobs.
Previously, we configured a Headless Service for the Master node and exposed port `8080`. This allows us to submit jobs from the client using the REST API.
The SeaTunnel REST API supports job submission via file upload. Example command:
$ curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/submit-job/upload' \
--form 'config_file=@"/opt/seatunnel/config/v2.streaming.example.template"' \
--form 'jobName=st.example.template'
{"jobId":"964553575034257409","jobName":"st.example.template"}
If the submission is successful, it returns a `jobId` and `jobName` as shown.
Next, retrieve the list of currently running jobs in the cluster via the REST API:
curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/running-jobs'
[{"jobId":"964553575034257409","jobName":"st.example.template","jobStatus":"RUNNING","envOptions":{"job.mode":"STREAMING","checkpoint.interval":"2000","parallelism":"2"}, ...]
As shown in the response, the job is running and additional metadata is also returned. This confirms that job submission via the REST API works as expected. For more details, refer to the official documentation: RESTful API V2
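The REST API can also stop jobs. Assuming your version exposes the stop-job endpoint described in the RESTful API V2 documentation (verify the exact path and request fields for your release), a request could look roughly like this, using the jobId returned earlier:
curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/stop-job' \
  -H 'Content-Type: application/json' \
  -d '{"jobId": 964553575034257409, "isStopWithSavePoint": false}'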
Summary
This article focuses on the recommended Separated Cluster Mode for deploying a SeaTunnel cluster on Kubernetes. The deployment process mainly includes the following steps:
Prepare the Kubernetes Environment
Ensure a fully functional Kubernetes cluster is set up and all required components are installed.
Build the SeaTunnel Docker Image
If no secondary development is needed, use the official image. Otherwise, compile and package locally, write a Dockerfile, and build your custom SeaTunnel image.
Configure Headless Service and Hazelcast Cluster
Hazelcast uses Kubernetes’ Headless Service for DNS-based discovery. First, create a Headless Service, then specify its DNS name in the Hazelcast YAML configuration using `service-dns`. The Headless Service resolves to the IP addresses of the associated Pods, enabling inter-member discovery within the Hazelcast cluster.
Configure the SeaTunnel Engine
Modify the `seatunnel.yaml` file to set the SeaTunnel engine parameters.
Create Kubernetes YAML Deployment Files
Create separate YAML files for the Master and Worker, specifying node selectors, startup commands, resource requests/limits, and volume mounts. Deploy them to the Kubernetes cluster.
Configure the SeaTunnel Client
Install SeaTunnel on the client machine and ensure the installation path (`SEATUNNEL_HOME`) matches the cluster’s. Modify `hazelcast-client.yaml` to point to the Service address of the cluster.
Submit and Run Tasks
After completing the above steps, tasks can be submitted from the client and executed by the SeaTunnel cluster.
The configuration examples provided in this article are for reference only. There may be additional configuration options and best practices not covered here. Contributions and discussion are welcome. We hope this guide proves helpful!