Introduction
Apache Kafka is an event streaming platform used to collect, process, store, and integrate data at scale. Its use cases include distributed streaming, stream processing, data integration, and pub/sub messaging. Data streaming is the continuous flow of high volumes of data from different sources for processing and analysis. An event is any action, incident, or change that is identified or recorded by software or applications.
Kafka consists of these key components:
- Producer: An application that writes data (events) to Kafka topics. A producer can send data to any broker in the Kafka cluster.
- Consumer: An application that reads data from Kafka topics.
- Brokers: Kafka servers that store and replicate messages.
- Topic: A named stream of records into which Kafka organizes data.
- ZooKeeper: A distributed coordination service that manages metadata, leader election, and other critical tasks in a Kafka cluster.
- Cluster: A group of servers working together to provide durability, low latency, and scalability.
- Partitions: Divisions of a topic that enable scalability and parallelism; see the sketch after this list.
- Connect: A framework for streaming data between Kafka and external systems, such as databases, using reusable source and sink connectors.
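Partitioning is easiest to see with keyed events: a producer hashes each event's key to pick a partition, so all events with the same key stay in order on the same partition. The following is a simplified sketch of that idea (an assumption for illustration: Kafka's real default partitioner hashes the serialized key bytes with murmur2, not String.hashCode()):

// PartitionSketch.java - simplified illustration of keyed partitioning.
public class PartitionSketch {
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is a valid partition index.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 3; // hypothetical topic with 3 partitions
        // The same key always maps to the same partition, which is what
        // gives Kafka its per-key ordering guarantee.
        System.out.println(partitionFor("order-42", partitions));
        System.out.println(partitionFor("order-42", partitions));
    }
}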
Installation
Kafka runs best on Linux. If you are on Windows, you can use the Windows Subsystem for Linux (WSL). Before installing Kafka, make sure you have Java (version 11 or 17) installed on your system.
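You can confirm which Java version is installed by running:
java -version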
Download Kafka from the official Apache archive and extract it using the following commands in a terminal:
wget https://archive.apache.org/dist/kafka/3.6.0/kafka_2.12-3.6.0.tgz
tar -xzf kafka_2.12-3.6.0.tgz
mv kafka_2.12-3.6.0 kafka
Start Kafka environment
Kafka traditionally requires ZooKeeper for coordination (recent releases can also run without it in KRaft mode, but this tutorial uses the ZooKeeper setup). Start ZooKeeper by running the following command from the directory that contains the kafka folder:
kafka/bin/zookeeper-server-start.sh kafka/config/zookeeper.properties
Once ZooKeeper is running, open another terminal window and start the Kafka broker service:
kafka/bin/kafka-server-start.sh kafka/config/server.properties
Once both services have started, the Kafka environment is running and ready to use.
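One way to confirm the broker is reachable is to query the API versions it supports, using a script that ships in the same bin directory:
kafka/bin/kafka-broker-api-versions.sh --bootstrap-server 127.0.0.1:9092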
Topics in Kafka
Topics are named streams of records into which Kafka organizes data. Producers publish messages to topics, and consumers subscribe to them.
In Kafka, before you can write an event, you need to create a topic using the following command:
kafka/bin/kafka-topics.sh --create --topic victor-topic --bootstrap-server 127.0.0.1:9092
By default, Kafka runs on localhost (127.0.0.1) on port 9092.
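When creating a topic you can also set its partition count and replication factor explicitly, for example (victor-topic-3p is a hypothetical second topic used here for illustration):
kafka/bin/kafka-topics.sh --create --topic victor-topic-3p --partitions 3 --replication-factor 1 --bootstrap-server 127.0.0.1:9092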
To list all the available topics, run:
kafka/bin/kafka-topics.sh --list --bootstrap-server 127.0.0.1:9092
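To inspect a topic's partition count, leader, and replica assignment, describe it:
kafka/bin/kafka-topics.sh --describe --topic victor-topic --bootstrap-server 127.0.0.1:9092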
Kafka Events
A Kafka client communicates with the Kafka brokers over the network to write (or read) events.
Once the brokers receive the events, they store them in the specified topic for as long as you need; retention is configurable per topic.
Run the console producer client to write events into your topic. Each line you type becomes a separate event:
kafka/bin/kafka-console-producer.sh --topic victor-topic --bootstrap-server 127.0.0.1:9092
My first event in victor-topic
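The console producer is a thin wrapper around the Kafka client library. As a rough equivalent, here is a minimal sketch using the official Java client (assumptions: the kafka-clients dependency is on your classpath, and the key my-key is arbitrary):

// ProducerSketch.java - minimal programmatic equivalent of the console producer.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092"); // same broker as the console examples
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // try-with-resources closes the producer, flushing any buffered events.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("victor-topic", "my-key",
                    "My first event in victor-topic"));
        }
    }
}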
Open another terminal window and run the console consumer client to read the events you just created:
kafka/bin/kafka-console-consumer.sh --topic victor-topic --from-beginning --bootstrap-server 127.0.0.1:9092
My first event in victor-topic
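The same read path can be expressed with the Java client. Below is a minimal sketch (assumptions: kafka-clients is on the classpath, and victor-group is a hypothetical consumer group name; setting auto.offset.reset to earliest gives a new group the same effect as --from-beginning):

// ConsumerSketch.java - minimal programmatic equivalent of the console consumer.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092");
        props.put("group.id", "victor-group"); // hypothetical group name
        props.put("auto.offset.reset", "earliest"); // read from the start for a new group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("victor-topic"));
            while (true) { // poll until interrupted with Ctrl+C
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}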
To stop the consumer client (or any of the other foreground processes, such as the producer, the broker, or ZooKeeper), press Ctrl+C.