Introduction
The ability to restore individual files is one of the most basic functions of a data backup and protection tool. As easy as it may seem, performing this function on a file located in a VM is a resource-intensive and time-consuming process. Even to restore a small file, users need to restore the contents of the entire VM and then handpick the file from the data set.
To get around this problem, we planned to use File-Level Restore, or simply FLR. Part 1 of this blog series explains what FLR is and how we went about implementing it. Part 2 will discuss the improvements we made to the FLR process to achieve faster execution speeds.
What is FLR?
FLR stands for File-Level Restore. It refers to the ability to restore individual files or folders from a virtual disk file that is stored in the cloud. This eliminates the need to download the entire virtual disk and attach it to a virtual machine. FLR provides a more efficient and granular way to recover specific files or folders, saving time and resources compared to traditional full-disk restores.
Significance of FLR
File-Level Restore (FLR) plays a vital role in giving customers greater control over the data restoration process. Without FLR, even for a file as small as 16 MB, users would have to restore the entire virtual machine, potentially downloading multiple terabytes of data from the cloud. FLR offers a streamlined and efficient way to selectively restore specific files from a virtual disk backup.
The high adoption rate of FLR (accounting for approximately 50% of all restores) underscores the importance of this feature and its frequent utilization by customers.
Before diving deep into the solution, here’s the tech stack that we used and a few key terms that we will use frequently to explain the details of the solution.
Technology stack that we used
Language: Python
FUSE: User module implemented with Python
Loop Devices
Terminology
FLR: File-Level Restore.
Virtual disk: A virtual disk is a file that appears as a physical disk drive to the guest operating system. Virtual hard disk files store information such as the operating system, program files, and data files.
Disk Offset: An offset into a disk is simply the byte position within that disk, usually starting with 0; thus "offset 240" is the 241st byte on the disk.
File Offset: An offset into a file is simply the byte position within that file, usually starting with 0. The important thing to note is that a file offset must be converted to a disk offset before the data can be read, and this conversion is done by the underlying file system.
Target/Target VM: Used interchangeably to refer to a target virtual machine where data has to be restored.
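To make the Disk Offset definition above concrete, here is a minimal sketch in Python. It uses an in-memory buffer as a stand-in for a virtual disk; the toy disk contents and the helper name read_at are assumptions made for illustration and are not part of the actual FLR code.

```python
import io

# A toy 1 KiB "disk" held in memory: the byte at offset i stores i % 256,
# so the value read back tells us which offset we landed on.
disk = io.BytesIO(bytes(i % 256 for i in range(1024)))


def read_at(disk_file, disk_offset, length):
    """Read `length` bytes starting at the zero-based `disk_offset`."""
    disk_file.seek(disk_offset)
    return disk_file.read(length)


first_byte = read_at(disk, 0, 1)    # offset 0 is the 1st byte
byte_241 = read_at(disk, 240, 1)    # offset 240 is the 241st byte
```

Because offsets start at 0, reading at offset 240 skips exactly 240 bytes and returns the 241st.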
Original Solution
To access a file stored within a virtual disk, it is necessary to determine the disk offsets of the data blocks corresponding to that file. In the best-case scenario, the blocks of the file are stored sequentially. In the worst-case scenario, the data blocks may be scattered across the entire virtual disk.
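The difference between the two scenarios can be sketched as follows. This is an illustrative model only: the Extent type, the map_file_range helper, and the sample layouts are hypothetical, and a real file system exposes this mapping through its own metadata rather than through a list like this one.

```python
from typing import List, NamedTuple, Tuple


class Extent(NamedTuple):
    """Hypothetical record: a run of file bytes stored contiguously on disk."""
    file_offset: int   # where the run starts within the file
    disk_offset: int   # where the same run starts on the virtual disk
    length: int        # number of bytes in the run


def map_file_range(extents: List[Extent], file_offset: int,
                   length: int) -> List[Tuple[int, int]]:
    """Translate a file byte range into (disk_offset, length) reads."""
    end = file_offset + length
    reads = []
    for ext in sorted(extents):  # sorted by file_offset
        start = max(file_offset, ext.file_offset)
        stop = min(end, ext.file_offset + ext.length)
        if start < stop:  # the requested range overlaps this extent
            reads.append((ext.disk_offset + (start - ext.file_offset),
                          stop - start))
    return reads


# Best case: the whole file sits in one sequential run on disk.
contiguous = [Extent(0, 1_048_576, 12_288)]

# Worst case: each 4 KiB block lives somewhere different on disk.
scattered = [
    Extent(0, 4_096, 4_096),
    Extent(4_096, 903_168, 4_096),
    Extent(8_192, 20_480, 4_096),
]

best = map_file_range(contiguous, 4_000, 200)
worst = map_file_range(scattered, 4_000, 200)
```

In this sketch the same 200-byte request needs a single disk read in the sequential layout, but two reads at distant disk offsets in the scattered layout, because the requested range crosses a block boundary.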