Change Data Capture (CDC) and Postgres
Recently, I wrote this post, where I show you how to set up Change Data Capture (CDC) in Postgres. If you don't know how to set up CDC in Postgres yet, I recommend taking a quick look at that tutorial first.
GOCDC
GoCdc is an open-source API for data streaming, written (well, still in development) in Golang, that I would like to share with you. In short, the concept behind it is similar to Debezium, but in Go, which in my opinion makes it easier for anyone to hack on and adapt to their own needs. The focus of GoCdc is on simplicity: simple to set up, simple to modify, and so on.
Hands-on 🛠️
Required:
Docker -> https://www.docker.com/get-started
1 - Creating the Project
First of all, create a docker-compose.yml file in a directory of your preference.
version: "3"
services:
gocdc:
image: "133thiago/gocdc:latest"
ports:
- "8000:8000"
db:
image: "postgres:11"
container_name: "my_postgres"
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
- POSTGRES_DB=example_db
ports:
- "5432:5432"
command:
- "postgres"
- "-c"
- "wal_level=logical"
volumes:
- my_dbdata:/var/lib/postgresql/data
zookeeper:
image: wurstmeister/zookeeper
ports:
- "2181:2181"
kafka:
image: wurstmeister/kafka
ports:
- "9092:9092"
environment:
KAFKA_ADVERTISED_HOST_NAME: localhost
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
volumes:
- /var/run/docker.sock:/var/run/docker.sock
volumes:
my_dbdata:
2 - Running in Docker
Now run docker-compose up -d, and then run docker ps. You should see the containers that were created.
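From the directory where you saved the docker-compose.yml:
docker-compose up -d   # start all the containers in the background
docker ps              # check that they are running
The container IDs and names you see will differ from mine, but you should end up with four containers: the GoCdc one, my_postgres, Zookeeper and Kafka.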
3 - Postgres Setup
As I said at the beginning of this post, this step is covered by this tutorial here -> [How to use Change Data Capture (CDC) with Postgres](https://dev.to/thiagosilvaf/how-to-use-change-database-capture-cdc-in-postgres-37b8).
IMPORTANT: Just make sure you either create your database as example_db, as set in the POSTGRES_DB value in our docker-compose.yml, or change that value in the docker-compose.yml to the database that you created. It is up to you.
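If you just want something quick to play with, here is a minimal sketch of that step: it opens psql inside the my_postgres container and creates a small table. The users table and its columns are only an example for this post, not something GoCdc requires; any table in example_db will do.
docker exec -it my_postgres psql -U postgres -d example_db
example_db=# -- example table, adjust to your own schema
example_db=# CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT, email TEXT);
The replication slot itself is covered in the linked tutorial; just make sure its name matches the db_slot value we will send to the connector in Step 5.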
4 - Kafka config:
Now, it's time to create our Kafka topic, which is the topic that our connector is going to send the database changes to.
First, run docker ps again and get the CONTAINER ID of your Kafka container; it will be something like 4bed6164a8e9:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS
4bed6164a8e9 wurstmeister/kafka "start-kafka.sh" 5 hours ago Up About an hour 0.0.0.0:9092->9092/tcp
Then, run the following command:
docker exec -it 4bed6164a8e9 "bash" # Remember to replace the Container ID with yours
Finally, let's create the topic. I'm going to call it test, but it is up to you. Just bear in mind that you will need the topic name later!
bash-4.4# kafka-topics.sh --create --zookeeper zookeeper:2181 --replication-factor 1 --partitions 1 --topic test
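If you want a quick sanity check that the topic exists, you can list the topics from the same shell (optional):
bash-4.4# kafka-topics.sh --list --zookeeper zookeeper:2181   # optional: list existing topics
You should see test (or whatever name you chose) in the output.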
5 - Tinder match: Postgres ❤️ Kafka
GoCdc provides a REST interface for creating the connection between the database and Kafka. If you look again at our docker-compose.yml file, GoCdc is using port 8000. So let's do a POST to localhost:8000/connectors/postgres:
curl --location --request POST 'http://localhost:8000/connectors/postgres' \
--header 'Content-Type: application/json' \
--data-raw '{
    "connector_name": "Conn PG Test",
    "db_host": "localhost",
    "db_port": 5432,
    "db_user": "postgres",
    "db_pass": "postgres",
    "db_name": "example_db",
    "db_slot": "slot",
    "kafka_brokers": [ "localhost:9092" ],
    "kafka_topic": "test",
    "lookup_interval": 5000
}'
Let's dive into the JSON object sent in this request:
connector_name: No big deal here, just an identifier. It is unique though, so you can use it to edit the connector later via a PUT request.
db_host: The IP where your Postgres is running. In our example, it is localhost.
db_port: The Port to access our Postgres database.
db_user AND db_pass: User and Password of our Postgres database.
db_name: The name of the database.
db_slot: The name of the Replication Slot.
kafka_brokers: An array with our Kafka brokers. Well, we're running only one in our example, but you can set multiple.
kafka_topic: Here is the Topic that we created in Step 4.
lookup_interval: Here you tell the connector how often it should execute a CDC lookup. In our example, every 5 seconds (yes, the parameter value is in milliseconds).
6 - Kafka Consumer:
So, we now have our Postgres database up and running, our Kafka "cluster" also up and running, and the connector created!
Assuming that you have the database and a table created in your Postgres (Step 3), connect to your Kafka container once again (Step 4) and run the following command:
bash-4.4# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test
And now you will be able to see every Insert, Update and Delete from your database being sent to your Kafka consumer.
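To see it in action, open another terminal, connect to Postgres again, and change some data. The users table below is just the example table from Step 3; use whatever table you actually created.
docker exec -it my_postgres psql -U postgres -d example_db
example_db=# -- any change to a table in example_db will do
example_db=# INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');
example_db=# UPDATE users SET name = 'Alice Smith' WHERE id = 1;
A few seconds later (remember our lookup_interval of 5000 ms), the corresponding change events should show up in the Kafka consumer.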
I hope you enjoyed the tutorial. If you got stuck at some step, please leave a comment and I'll do my best to help!