Looking for a simpler solution? If you're reading this tutorial, you want to reliably stream changes out of your Postgres database. You've turned to Debezium, only to discover that the official Debezium documentation leaves much to be desired. I hit the same snag, and decided to save future developers similar toil by writing this guide. If you are looking for an easier, faster alternative to Debezium, we built Sequin to provide the same guarantees as Debezium while eliminating all this complexity.
Introduction
In this tutorial, you'll set up a complete CDC pipeline using Debezium (version 3.1) with Kafka Connect to capture changes from a PostgreSQL database. By the end, you'll have a working system that streams every insert, update, and delete operation from your database into Apache Kafka topics.
What you'll learn:
- How to set up a local development environment for Debezium with Docker Compose
- How to configure PostgreSQL for logical replication
- How to set up a Debezium connector to capture changes
- How to observe and work with CDC events from your database
Let's get started!
Prerequisites
Before diving in, make sure you have:
- Docker Engine and Docker Compose v2 installed on your system
- curl command-line tool for making HTTP requests
- Basic familiarity with PostgreSQL and command-line operations
You'll be doing everything in Docker and your CLI - Debezium doesn't come with a UI or additional tooling.
The Architecture
Here's what you're setting up:
- A PostgreSQL database with logical replication enabled
- Apache Kafka (which requires ZooKeeper in this version) for message streaming
- Kafka Connect with the Debezium PostgreSQL connector to provide CDC on the Postgres database and stream those changes to the aforementioned Kafka topics
- A simple "customers" table that you'll monitor for changes
When you're done, any change to the customers table will be captured by Debezium and sent as an event to a Kafka topic, which you can then consume from any application.
Step 1: Setting Up the Infrastructure with Docker Compose
First, create your environment using Docker Compose. This will spin up all the necessary services in isolated containers.
Create a docker-compose.yml file
Create a new directory and then create a new file called docker-compose.yml:
mkdir debezium-example
cd debezium-example
touch docker-compose.yml
Copy and paste the following into the docker-compose.yml file:
services:
  zookeeper:
    image: quay.io/debezium/zookeeper:3.1
    ports: ["2181:2181"]
  kafka:
    image: quay.io/debezium/kafka:3.1
    depends_on: [zookeeper]
    ports: ["29092:29092"]
    environment:
      ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:29092
      KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka:9092,EXTERNAL://localhost:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  connect:
    image: quay.io/debezium/connect:3.1
    depends_on: [kafka]
    ports: ["8083:8083"]
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_statuses
      KEY_CONVERTER_SCHEMAS_ENABLE: "false"
      VALUE_CONVERTER_SCHEMAS_ENABLE: "false"
  postgres:
    image: debezium/postgres:15
    ports: ["5432:5432"]
    command: postgres -c wal_level=logical -c max_wal_senders=10 -c max_replication_slots=10
    environment:
      POSTGRES_USER: dbz
      POSTGRES_PASSWORD: dbz
      POSTGRES_DB: example
Understanding the docker-compose.yml
Briefly, this file defines the following services - all of which are needed to run Debezium:
- ZooKeeper: Provides distributed configuration and synchronization for Kafka
- Kafka: The message broker that will store your change data events
- Kafka Connect: The framework that runs your Debezium connector
- PostgreSQL: Your database with logical replication enabled
Notice the PostgreSQL configuration:
- You're using the debezium/postgres:15 image, which comes with the necessary logical decoding plugins
- You set wal_level=logical to enable logical replication
- You configure max_wal_senders and max_replication_slots to allow multiple replication connections
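If you're running Postgres yourself rather than using this image, the equivalent settings live in postgresql.conf (changing wal_level requires a restart). Here's a minimal sketch of just the flags used above:
wal_level = logical
max_wal_senders = 10
max_replication_slots = 10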
Start the containers
Back in your terminal, ensure you are in the debezium-example directory containing your docker-compose.yml file, and run:
docker compose up -d
This command starts all the services in detached mode. To verify that all containers are running:
docker compose ps
You should see all four containers (zookeeper, kafka, connect, and postgres) with a status of "Up". Running docker ps gives similar detail:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c218e8a9fc67 quay.io/debezium/connect:3.1 "/docker-entrypoint.…" 59 minutes ago Up 59 minutes 8778/tcp, 0.0.0.0:8083->8083/tcp, 9092/tcp debezium-connect-1
5ffe2fc31745 quay.io/debezium/kafka:3.1 "/docker-entrypoint.…" 59 minutes ago Up 59 minutes 9092/tcp, 0.0.0.0:29092->29092/tcp debezium-kafka-1
1b1de4194458 debezium/postgres:15 "docker-entrypoint.s…" 59 minutes ago Up 59 minutes 0.0.0.0:5432->5432/tcp debezium-postgres-1
d5d87a35b013 quay.io/debezium/zookeeper:3.1 "/docker-entrypoint.…" 59 minutes ago Up 59 minutes 2888/tcp, 0.0.0.0:2181->2181/tcp, 3888/tcp debezium-zookeeper-1
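Before moving on, you can optionally confirm that logical replication is enabled and that Kafka Connect's REST API is up:
docker compose exec postgres psql -U dbz -d example -c "SHOW wal_level;"
curl -s http://localhost:8083/
The first command should print logical; the second returns a small JSON object with the Kafka Connect version.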
Step 2: Preparing PostgreSQL for CDC
Now that the required infrastructure is running, you need to configure PostgreSQL for change data capture. You'll connect to the Postgres instance you just created, confirm the replication user Debezium will use, set up a demo table, and configure the table for full row captures.
Create a replication user
For this tutorial you can skip creating one: the debezium/postgres image already created the dbz superuser (via the POSTGRES_USER environment variable in docker-compose.yml), and a superuser has all the replication privileges Debezium needs. In production, you'd typically create a dedicated, least-privileged user instead.
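If you'd rather follow the production pattern here too, a minimal sketch looks like this (the role name debezium and the grants are illustrative; the exact privileges you need depend on your decoding plugin and how publications are managed):
docker compose exec postgres \
  psql -U dbz -d example \
  -c "CREATE ROLE debezium WITH LOGIN REPLICATION PASSWORD 'dbz'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO debezium;"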
Create the demo table
Create a simple customers table that you'll monitor for changes:
docker compose exec postgres \
psql -U dbz -d example \
-c "CREATE TABLE customers (id SERIAL PRIMARY KEY, name VARCHAR(255), email VARCHAR(255));"
This table has three columns:
- id: An auto-incrementing primary key
- name: A customer's name
- email: A customer's email address
If you'd like to explore the database more directly in a SQL client like TablePlus, you can connect to it using any PostgreSQL client with these connection details:
- Host: localhost
- Port: 5432
- Database: example
- User: dbz
- Password: dbz
- SSL Mode: disable
Or using a connection string:
postgresql://dbz:dbz@localhost:5432/example?sslmode=disable
This allows you to explore the database schema, run queries, and make changes that will be captured by Debezium.
Configure full row images
By default, PostgreSQL's logical replication only includes the primary key and changed columns in update events. To get the full "before" and "after" state of rows, you need to set the REPLICA IDENTITY to FULL:
docker compose exec postgres \
psql -U dbz -d example \
-c "ALTER TABLE customers REPLICA IDENTITY FULL;"
This setting ensures that when a row is updated or deleted, the entire row's data (before the change) is included in the WAL (Write-Ahead Log) entry, giving you complete information about the change in the resulting Kafka message produced by Debezium.
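To double-check, you can inspect the relreplident column in pg_class, which should now read f (for "full"):
docker compose exec postgres \
  psql -U dbz -d example \
  -c "SELECT relname, relreplident FROM pg_class WHERE relname = 'customers';"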
Step 3: Setting Up the Debezium Connector
Now, configure and register the Debezium PostgreSQL connector with Kafka Connect.
Create the connector configuration
Create a file named register-postgres.json
with the following content:
{
  "name": "example-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "dbz",
    "database.password": "dbz",
    "database.dbname": "example",
    "topic.prefix": "example",
    "slot.name": "example_slot",
    "publication.autocreate.mode": "filtered",
    "table.include.list": "public.customers"
  }
}
This configuration:
- Names the connector example-connector
- Specifies the PostgreSQL connection details
- Sets a topic prefix of example (resulting in a topic named example.public.customers)
- Creates a replication slot named example_slot
- Configures the publication to only include the tables you specify
- Limits change capture to only the public.customers table
Register the connector
Register the connector with Kafka Connect's REST API running on port 8083:
curl -X POST -H "Content-Type: application/json" \
--data @register-postgres.json \
http://localhost:8083/connectors
If successful, you should receive a JSON object that looks similar to the configuration in your register-postgres.json file. This means that Kafka Connect has started the Debezium connector, which is now listening for changes to your customers table.
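You can also confirm on the Postgres side that the connector created its replication slot and (because publication.autocreate.mode is filtered) a publication covering the customers table:
docker compose exec postgres \
  psql -U dbz -d example \
  -c "SELECT slot_name, plugin, active FROM pg_replication_slots; SELECT pubname FROM pg_publication;"
You should see a slot named example_slot marked as active.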
Verify the connector is running
You can check the status of your connector:
curl -s http://localhost:8083/connectors/example-connector/status | jq
{
  "name": "example-connector",
  "connector": {
    "state": "RUNNING",
    "worker_id": "172.20.0.5:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "172.20.0.5:8083"
    }
  ],
  "type": "source"
}
You should see that the connector is in the RUNNING state. If you encounter any issues, you can check the Kafka Connect logs:
docker compose logs -f connect
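If the connector lands in a FAILED state while you're iterating on the configuration, you can remove it through the same REST API and register it again:
curl -X DELETE http://localhost:8083/connectors/example-connector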
Step 4: Generating and Observing Change Events
Now for the exciting part! Make some changes to your database and watch the CDC events flow into Kafka.
Start a Kafka consumer
First, open a terminal window and start a console consumer to watch the Kafka topic:
docker compose exec kafka /kafka/bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic example.public.customers --from-beginning
This command connects to Kafka and subscribes to the example.public.customers topic, displaying all messages as they arrive.
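If nothing shows up, you can check that the topic exists. Kafka typically creates it when the connector produces its first change event, so it may not appear until after the initial snapshot or your first insert:
docker compose exec kafka /kafka/bin/kafka-topics.sh --bootstrap-server kafka:9092 --list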
Insert a new customer
In a new terminal window, insert a record into the customers table:
docker compose exec postgres \
psql -U dbz -d example \
-c "INSERT INTO customers(name,email) VALUES ('Alice','[email protected]');"
In your consumer terminal, you should see a JSON message appear that looks something like this:
{
  "before": null,
  "after": {
    "id": 1,
    "name": "Alice",
    "email": "alice@example.com"
  },
  "source": {
    "version": "3.1.0.Final",
    "connector": "postgresql",
    "name": "example",
    "ts_ms": 1620000000000,
    "snapshot": "false",
    "db": "example",
    "sequence": "[\"12345678\",\"12345678\"]",
    "schema": "public",
    "table": "customers",
    "txId": 123,
    "lsn": 12345678
  },
  "op": "c",
  "ts_ms": 1620000000001
}
This is the Debezium change event format, also known as the "envelope." It contains:
- before: The previous state of the row (null for inserts)
- after: The new state of the row
- source: Metadata about the event source
- op: The operation type ("c" for create/insert)
- ts_ms: Timestamp of when the event was processed
Update the customer
Now let's update the customer record:
docker compose exec postgres \
psql -U dbz -d example \
-c "UPDATE customers SET name='Alice Updated' WHERE id=1;"
In your consumer terminal, you should see another JSON message, this time with both before and after fields populated:
{
  "before": {
    "id": 1,
    "name": "Alice",
    "email": "alice@example.com"
  },
  "after": {
    "id": 1,
    "name": "Alice Updated",
    "email": "alice@example.com"
  },
  "source": { /* ... */ },
  "op": "u",
  "ts_ms": 1620000000002
}
Note that op now has a value of "u" for update, and both before and after states are included.
Delete the customer
Finally, let's delete the customer:
docker compose exec postgres \
psql -U dbz -d example \
-c "DELETE FROM customers WHERE id=1;"
You'll see a final message with op set to "d" (delete):
{
  "before": {
    "id": 1,
    "name": "Alice Updated",
    "email": "alice@example.com"
  },
  "after": null,
  "source": { /* ... */ },
  "op": "d",
  "ts_ms": 1620000000003
}
Notice that for deletes, after is null since the row no longer exists after the operation.
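Consume events from your own application
The console consumer is great for debugging, but in practice you'll read these events from application code. Here's a minimal sketch in Python using the kafka-python package (pip install kafka-python). It assumes the EXTERNAL listener mapped to localhost:29092 in the docker-compose.yml, and guards against the tombstone (null-value) messages Debezium emits after delete events by default:
import json
from kafka import KafkaConsumer

# Connect through the EXTERNAL listener exposed on the host in docker-compose.yml
consumer = KafkaConsumer(
    "example.public.customers",
    bootstrap_servers="localhost:29092",
    auto_offset_reset="earliest",
    # Delete events are followed by a tombstone whose value is null, so guard against None
    value_deserializer=lambda v: json.loads(v) if v else None,
)

for message in consumer:
    event = message.value
    if event is None:
        continue  # skip tombstones
    # "c" = create, "u" = update, "d" = delete, "r" = snapshot read
    print(event["op"], event["before"], event["after"])
Run it while you repeat the INSERT/UPDATE/DELETE commands above and you'll see the same envelopes printed, ready to route into your application logic.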
Step 5: Cleaning Up
When you're finished experimenting, clean up the environment:
docker compose down -v
This stops all containers and removes volumes created by Docker Compose.
Understanding the Architecture
Now that you've seen Debezium in action, it’s helpful to discuss what is happening behind the scenes.
- PostgreSQL Write-Ahead Log (WAL): When data changes in PostgreSQL, those changes are written to the WAL for durability. With logical replication enabled, these changes can be decoded into logical change events (we go into more detail in this post).
- Debezium PostgreSQL Connector: Debezium creates a replication slot in PostgreSQL and subscribes to changes. It reads from the WAL, transforms the binary changes into structured events, and sends them to Kafka.
- Kafka Connect: This framework manages the connector lifecycle and ensures reliable delivery of events to Kafka, handling failures and offsets.
- Kafka Topics: Each table's changes are published to a dedicated topic, allowing consumers to subscribe only to the tables they care about.
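You can watch this machinery at work: as you make changes, the connector's position in the WAL advances. A quick way to observe it (confirmed_flush_lsn is the last position Debezium has acknowledged; pg_current_wal_lsn() is the current end of the WAL):
docker compose exec postgres \
  psql -U dbz -d example \
  -c "SELECT slot_name, confirmed_flush_lsn, pg_current_wal_lsn() FROM pg_replication_slots;"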
Conclusion
You've successfully set up a change data capture pipeline using Debezium and Kafka Connect. You've seen how to capture inserts, updates, and deletes from PostgreSQL and stream them to Kafka in real-time.
In this Kafka Connect runtime, Debezium relies on Kafka (and the broader Kafka ecosystem) to provide capabilities like single message transforms, message retention, and efficient scaling. Now that the system is running, get familiar with these tools as you build on this setup.
From here, you'll want to tailor this template to your Postgres database. Ensure you enable logical replication in your Postgres instance, create a user for Debezium, and then buckle up for Java logs.