Database Partitioning ? What is it ? Why is it ? [ Interview Questions ]
Hello friends !! Hope you are doing well. This blog is all about Database Partitioning. I have gone through lot of blogs and videos to understand what is database partitioning actually. Here, I will share my learning with you. Let’s start.
Let’s suppose we have a paper, now I am slicing it in the middle in a horizontal manner and take that piece and throw it in a table and the other piece in another table, this is horizontally partitioning. Now, I take another paper and this time I am slicing it vertically and take columns along with all its rows and pull them in a table and you put it in another table, this is vertical partitioning.
Why Partitioning?
As your data grows, it can become a big mess to manage. Partitioning helps you stay organized and makes things faster to find. Imagine you have a huge library. Instead of searching the entire library for a book, you can go straight to the right section. That’s what partitioning does for databases — it helps you find things quickly.
Horizontal Partitioning
Let’s take the same library example. Imagine you have a long list of books, and you want to split it into groups based on the author of each book. You make separate lists for author names starting with ‘A,’ ‘B,’ ‘C,’ and so on. Each group goes into its own box or table. That’s like Horizontal Partitioning, where you split data into different boxes based on a range, like letters of the alphabet.
Vertical Partitioning
Let’s take the same library example. Think of a table with lots of information, like author’s name, type of book, and book name. Now, imagine you want to keep author names and book type in one table and book name in another table. You’re dividing the information by columns, like putting similar things in their own boxes. This is Vertical Partitioning — separating data by categories.
How do applications work with these partitioned tables?
Think of it like talking to a helpful librarian. When you ask for a book, you just tell the librarian what you’re looking for. You don’t need to know which shelf the book is on. Similarly, when an application wants data from a partitioned table, it just asks the database without worrying about where the data is stored. The database system takes care of finding it for you.
Okay ,Guys, now we have an idea about the what is partitioning, Why partitioning, and type of partitioning. But, believe me this is not enough. We have to drill down deeper. I will share real time interview question that I faced during interviews.
Interview Questions
Interviewer : Why do we need Sharding if we have horizontal portioning ?
Me : There is a small difference, difference of use case. Horizontal Partitioning divides a table’s rows within a single database instance.Here, data remains on the same server. Where as, Sharding is Distributed Horizontal Partitioning. It divides the entire dataset across multiple separate databases. Used for scalability and handling large datasets or high traffic. Each shard is independent and can be on different servers. He was happy with my answer. So he added one more question.
Interviewer : Which database would you choose MySQL or any NoSQL database (ACID compliance is not priority) for Sharding ?
Me : Both MongoDB and MySQL offer partitioning features, MongoDB’s sharding is more oriented toward horizontal scalability and automatic distribution of data across multiple servers, whereas MySQL’s partitioning is focused on optimizing performance and maintenance within a single database instance. The choice between the two depends on your specific scalability and performance requirements. After this answer we got into discussion on Partition key,HotSpot problem and consistent hashing.
In summary, my understanding is that concepts such as Partitioning (Data and Database), Sharding (Horizontal Scaling), Database Selection (CAP Theorem), consistent hashing, and Partitioning by Key or Range (addressing Hotspot issues) are interconnected. To respond effectively to questions related to these topics, it’s crucial to grasp the specific use case and engage in a conversation with the interviewer. Personally, I believe that during system design discussions, it’s wise to begin with a straightforward approach (single server) and gradually progress to distributed server scenarios. This approach enhances your responses.
Thank you for taking the time to read the blog. I trust you found it engaging, and please don’t hesitate to request blogs on various subjects — I’ll make an effort to accommodate your requests. Enjoy your learning journey!
Email : vipulpachauri12@gmail.com
LinkedIn : https://www.linkedin.com/in/vipul-pachauri/