How Data Travels and Syncs in Database Replication

vipul pachauri
5 min readAug 16, 2023

Hello again 🖐, everyone! Remember, I mentioned in my last post that we’d dive into Database replication formats? Well, today’s the day, But before we jump into that, let’s first talk about the various phases of database replication. Replication has multiple angles, and sometimes we’re not sure if they’re methods or just different approaches. There’s synchronous and asynchronous replication, one-way and two-way replication, statement-based replication, row based replication (streaming replication/logical replication) and the task of managing replication — it can get a bit confusing. What do all these terms mean?

Let’s not stress too much about organizing everything perfectly. Instead, let’s piece things together and understand the bigger picture. It’s okay if things aren’t neatly categorized — that’s just how life works sometimes, and that’s perfectly fine (Gyaan 😉).

Why Replication ?

Imagine you’re looking for your favourite book, but you’re searching in every corner of the entire house. That’s not smart, right? Well, some people do the same with database too. They ask for a tiny piece of information by searching through a huge amount of data. So, even if you add more power (like more RAM), (Vertical Scaling) it won’t help much. It’s like trying to fit a huge elephant in a small room!

Now, let’s say you’re smart and ask for things in a better way. But still, if a lot of people are asking for things at the same time (High Throughput), the database might get tired. So, what do we do? We use a trick called “separation.” We make one part of the database just for writing new stuff, and another part for reading. This way, they don’t get in each other’s way. That is Read-Write Segregation OR Command Query Responsibility Segregation (CQRS).

What is Replication ?

In simple words, replication is like making a backup copy of your database’s reading part so that when it’s too busy, people can read from the copy. It’s like having a copy of the reading part of the database in another place. So, when you write new stuff in the main database, you also copy it to the replica. This copying process is called Replication. Please go through my previous blog to deep dive into replication.

Now, the point comes, how we make this copy, there can be done in different ways, like telling every step or sharing just the important parts of data. It’s like finding the best way to share your cool story with a friend. , right? Let’s understand this with story. (Kyun ki kahani hamesha yaad reh jaati hai 😁)

How do we replicate or copy data ?

As we know ,Replication is like sharing a story between two friends. Let’s say there are lot of events in story and the events in the story are written down and read by the first friend, and then the second friend listens to those events and acts them out. Now, The story events can be written in different ways (based on what’s happening: use cases).

  1. When the story is written as sentences, the first friend writes down the sentences and the second friend reads them aloud.It is like telling a story step by step. This is called sentence-based sharing or Statement based replication (SBR) — it’s like speaking the sentences as they were written.
  1. When the story is about changes to things, the first friend writes down how each thing changes. Then, the second friend copies those changes and acts them out. It is like telling your friend only the important parts of the story. This is called change-based sharing or Row based replication (RBR) — it’s like showing how things change.

Choosing Between Statement-Based and Row-Based Approaches

Okay, so why are we bothering to send the actual statement (SBR) all the way to the database, right? There’s a downside to that — you’ve already spent time making the database run that statement, and now the replica is doing the same thing. It’s like making the database do double the work. Re-doing the same task that we just did is costly, and making the replica do it again is just not smart. We definitely want to avoid that.

So, Instead of sending the actual statement to the other side, we send the result of that statement, This is known as row based replication (RBR), also referred to as streaming replication. It’s like we’re syncing the result. The cool thing is, as things happen in our primary database — like more updates — it’s immediately sent to the replica. We’re streaming these changes in real-time, so as soon as something is done, we see it in the replica (Pros : Good for Low Latency). Plus, when you commit something in your primary database, you don’t need to wait or transfer all the changes. But but but… there’s a catch — the issue lies in the bandwidth in. We end up sending more stuff through the network, and that can be a problem. There’s a trade-off in everything.

And that’s a wrap! I trust you found the blog engaging and gained insights from it. Stay connected for my upcoming posts. Keep up the enthusiasm for learning and keep pushing forward. Looking forward to meeting you in the next blog. Happy Learning !! Yes Learning should be beautiful and happy 😊

--

--