Different ways to set up CDC

CDC is an important part of data processing. With CDC you can achieve many goals, from simple data replication to auditing and complex ETL jobs. But implementing CDC is still a tough task (especially when considering more than the happy path). In this article I want to show different ways of implementing CDC. I’ll try not only to show how to set up each variant, but also to compare them with each other and highlight the pros and cons of each option. ...

2025-12-31 · 10 min · Nikita Ryanov

PostgreSQL: Log-based CDC using debezium

In this short article I’ll show different ways to set up Debezium for log-based CDC. Before diving into the details of Debezium, I’ll briefly describe CDC and why it may be helpful for some tasks. CDC: Change Data Capture On the Internet, CDC is described as a design pattern that allows tracking data changes (deltas). Let’s consider this approach using the table user_balances. The initial state of the table is: ...

2024-05-10 · 10 min · Nikita Ryanov

Delivery and processing semantics: overview

In this article I want to give an overview of delivery semantics in messaging systems, describe delivery guarantees, and add my own thoughts about all of this. Delivery semantics: overview So, what exactly are delivery semantics, and why are they important? Delivery semantics are about the guarantees provided by a messaging system or delivery protocol. These guarantees cover message order (delivery and processing), delivery reliability, whether duplicates are allowed, and so on. In other words, delivery semantics determine how exactly a message will be handled in terms of delivery. ...

2024-05-01 · 13 min · Nikita Ryanov

Kafka-connect: overview

Kafka-connect: overview Imagine you have a task where you need to fetch some data from a database and incrementally store it in Kafka, or read data consumed from Kafka and store it in the database. You can solve both tasks using the plain Kafka consumer/producer API or even the Kafka Streams library, but if you don’t need comprehensive data transformations (e.g. enrichment, stream joining), you can use Kafka Connect for it. ...

2023-07-11 · 10 min · Nikita Ryanov

PostgreSQL: Log shipping Replication

Prerequisite All examples assume that PostgreSQL is already installed on your machine. Also, all examples were created using PostgreSQL 14.1 on aarch64-apple-darwin20.6.0, compiled by Apple clang version 13.0.0 (clang-1300.0.29.3), 64-bit. Log shipping replication Log shipping replication (I will use the short name LSR for it) is another method to physically replicate data between multiple database clusters. As the name says, this method replicates data through WAL files (segments) that are transferred between instances. This is probably the simplest and most straightforward method of data replication, but this simplicity comes with a price and compromises that should also be taken into account. ...

2023-01-04 · 5 min · Nikita Ryanov

PostgreSQL: Streaming Replication

Prerequisite All examples assume that PostgreSQL is already installed on your machine. Also, all examples were created using PostgreSQL 14.1 on aarch64-apple-darwin20.6.0, compiled by Apple clang version 13.0.0 (clang-1300.0.29.3), 64-bit. Streaming replication Streaming replication is a built-in mechanism in PostgreSQL to replicate data between multiple servers. It is a low-level replication mechanism, as it streams WAL data from the primary server to the replica through the physical replication slot, so it is highly recommended to replicate data between servers using the same PostgreSQL major version (minor versions may differ). Also, it is a good idea to have equivalent servers in terms of configuration such as CPU, RAM, and disks, especially if you plan to promote the replica to primary if the primary server goes down. ...

2022-02-10 · 7 min · Nikita Ryanov

PostgreSQL: Logical Replication

Prerequisite All examples assume that PostgreSQL is already installed on your machine. Also, all examples were created using PostgreSQL 14.1 on aarch64-apple-darwin20.6.0, compiled by Apple clang version 13.0.0 (clang-1300.0.29.3), 64-bit. Logical replication Logical replication is another method to replicate data between multiple nodes. This replication uses a publish-subscribe model. Each publisher may have multiple subscribers, and each subscriber can subscribe to multiple publishers. Also, each subscriber may be a publisher for another node, which makes it possible to create cascading replication. ...

2022-02-04 · 8 min · Nikita Ryanov

How to create a small json lib using antlr and shapeless

In this article I will show how antlr4 and shapeless can be used to create a small JSON library (not for production, of course ^_^) with the ability to decode arbitrary JSON strings into case classes and encode them back with some Scala magic. Project setup Let’s begin with the project setup. Generally speaking, it doesn’t really matter which IDE you use, but I’ll use IntelliJ IDEA. The Community edition is more than enough for it. Also, I recommend installing the antlr4 plugin for IntelliJ – it’s not necessary, but it really helps to create and debug antlr grammars. ...

2021-05-08 · 16 min · Nikita Ryanov