How MySQL stores data in disk ?

vipul pachauri
3 min readJul 28, 2023

--

Hello everyone, in this article, I’ll be sharing my interview encounter, where I was asked some fundamental questions about Databases. I aim to provide a genuine account of my real-time interview experience.

Interviewer: Hey Vipul! do you have working experience with database ?

Me : Yes, I have experience with SQL and NOSQL both.

Interviewer : Great !! Have you ever thought about how databases like MySQL store table information on disk ?

Me: Yes ! Sure, In MySQL, data is stored in something called Data Pages. These are like small storage units, each holding 8 KB of data. When we put information into a MySQL table, it gets saved into these Data Pages. The cool thing is that MySQL arranges these pages in a special way, forming a tree-like structure called a B+ tree.

Interviewer: A tree-like structure, interesting! But what’s the purpose of arranging data like that?

Me: Organizing data this way helps the database to work more efficiently. Imagine we have an Employee table with a column called EmployeeId, which uniquely identifies each employee. Since it’s unique, MySQL automatically creates a clustered index on it. This means the data is physically sorted in the database based on EmployeeId. B+ tree has three level of nodes. Root Node,Intermediate Node and Leaf Node.

Interviewer: Okay great but, where is the data actually stored then in B+ tree ?

Me: The data is stored in the leaf nodes of the B+ tree, which are the bottom parts of the tree. Each leaf node contains information from the Employee table. Since each Data Page can hold a certain number of rows, depending on their size.

Interviewer: Okay, got it! You mentioned the root node and intermediate nodes. What are those?

Me: The root node is the top of the tree, and the nodes in between the root and leaf nodes are called intermediate nodes. They all together form the B+ tree. The root and intermediate nodes contain index information, and the leaf nodes contain the actual data rows.

Interviewer: How does this help with finding specific information quickly?

Me: Let’s say we want to find an employee with EmployeeId = 1120. The database engine starts from the root node, then goes through the intermediate nodes until it reaches the right leaf node. Because the data rows are sorted by EmployeeId, the engine can quickly find the employee with Id = 1120. This way, even with thousands or millions of records, the database can find the data we want quickly, as long as there’s an index to help with the search.

Interviewer: That sounds really efficient! But what happens if we want to search using another column, like the employee’s name?

Me: If we search using a column without an index, like the employee’s name, it becomes inefficient. The database has to read through every record in the table, which takes more time. That’s where non-clustered indexes come in handy. They allow quick searches based on other columns, and use functionality like Index Seek and Index Scan.

MySQL stores each database (also called a schema) as a subdirectory of its data directory in the underlying filesystem. When you create a table, MySQL stores the table definition in a .frm file with the same name as the table. Thus, when you create a table named MyTable, MySQL stores the table definition in MyTable.frm.

Thanks to Arpit Bhayani . I would highly recommend to watch the video. Thanks for reading. Happy Learning !!

--

--

vipul pachauri
vipul pachauri

Written by vipul pachauri

Senior Software Backend Engineer

Responses (1)