How to Optimize Virtual Machine Backups by Excluding Transient Files

Uday Swami, Principal Engineer - Enterprise Workload

Virtualized environments offer numerous benefits, but managing storage costs can be a challenge. Traditional backups often include large, unnecessary files such as pagefile.sys and hiberfil.sys. These files are transient: pagefile.sys backs Windows virtual memory, and hiberfil.sys holds the memory image used for hibernation. Backing them up wastes storage space and increases the Total Cost of Ownership (TCO).

This blog post explores a solution to optimize virtual machine (VM) backups and storage management by excluding these transient files.

Identifying and Excluding Unnecessary Files

  1. Target Files: We focus on pagefile.sys and hiberfil.sys due to their large size and temporary nature.

  2. Skipping Unnecessary Backups: These files are irrelevant for VM consistency after a restore. Excluding them reduces storage requirements significantly.

Locating File Offsets for Zero-Block Optimization

Challenge: To exclude specific file ranges during backup, we need their precise byte offsets within the virtual disk (VMDK or raw disk).

Solution: We leverage the Master File Table (MFT) to locate these files.

MFT Query: By querying the MFT for the Logical Cluster Number (LCN) and Virtual Cluster Number (VCN) of these files, we can calculate their exact offsets.
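
As a rough illustration of the step above, the following Python sketch turns a file's extents into byte ranges relative to the start of the NTFS volume. The Extent type, the function name, and the sample values are hypothetical and exist only for this example; it also assumes that a negative LCN marks a sparse run with nothing allocated on disk.

```python
from typing import List, NamedTuple, Tuple


class Extent(NamedTuple):
    """One contiguous run of a file (hypothetical type for this sketch)."""
    vcn: int       # first virtual cluster number covered by this run
    lcn: int       # logical cluster number where the run starts on the volume
    clusters: int  # run length in clusters


def extents_to_byte_ranges(extents: List[Extent],
                           cluster_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs in bytes, relative to the volume start."""
    ranges = []
    for ext in sorted(extents, key=lambda e: e.vcn):
        if ext.lcn < 0:
            # Assumption: a negative LCN marks a sparse run with no disk space.
            continue
        ranges.append((ext.lcn * cluster_size, ext.clusters * cluster_size))
    return ranges


# Made-up sample: a file stored in two fragments on a 4 KiB-cluster volume.
sample = [Extent(vcn=0, lcn=1000, clusters=16),
          Extent(vcn=16, lcn=5000, clusters=8)]
print(extents_to_byte_ranges(sample, cluster_size=4096))
```

A fragmented file yields one range per run, so each range has to be handled separately later on.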

Zero-Block Optimization & Deduplication: Knowing the offsets allows us to zero out those specific ranges in the backup data. Because every all-zero block is identical, deduplication collapses them into a single stored block, which significantly reduces storage requirements.
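
The idea can be sketched in a few lines of Python. This is a minimal illustration rather than Druva's implementation: zero_ranges is a hypothetical helper that blanks the part of a backup chunk overlapping the excluded ranges, and SHA-256 stands in for whatever fingerprint a deduplicating store might use.

```python
import hashlib
from typing import Iterable, Tuple


def zero_ranges(chunk: bytes, chunk_offset: int,
                ranges: Iterable[Tuple[int, int]]) -> bytes:
    """Zero the parts of `chunk` that overlap any (offset, length) range.

    `chunk_offset` is the position of the chunk's first byte on the disk.
    """
    buf = bytearray(chunk)
    chunk_end = chunk_offset + len(buf)
    for start, length in ranges:
        lo = max(start, chunk_offset)
        hi = min(start + length, chunk_end)
        if lo < hi:
            # Same-length slice assignment, so the chunk size is unchanged.
            buf[lo - chunk_offset:hi - chunk_offset] = bytes(hi - lo)
    return bytes(buf)


# Two chunks with different content that both fall inside an excluded range.
excluded = [(0, 8192)]
a = zero_ranges(b"\x11" * 4096, 0, excluded)
b = zero_ranges(b"\x22" * 4096, 4096, excluded)

# After zeroing, both chunks share one fingerprint, so a deduplicating
# store would only need to keep a single copy.
print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())
```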

Benefits of this approach:

  • Reduced Storage Footprint: Backup size can shrink by up to roughly 2.5 times the size of the VM's RAM, depending on configuration.

  • Performance Efficiency: By targeting specific file ranges, we avoid full filesystem scans, minimizing performance impact.

Limitations:

  • NTFS Support Only: This approach currently works only with NTFS file systems. However, since pagefile.sys and hiberfil.sys typically reside on the system drive, which is usually NTFS by default, this limitation rarely comes into play.

Finding Offsets with NTFS Commands:

The NTFS-3G driver package (ntfsprogs) ships command-line tools to identify file cluster numbers within an NTFS partition:

  • ntfscluster: Retrieves the cluster number of a specific file.

  • ntfsinfo: Provides details about the MFT, including cluster size.
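
A hedged sketch of how these tools might be invoked from Python follows. The --filename option of ntfscluster and the --mft option of ntfsinfo come from the ntfsprogs manual pages; the device path, the helper names, and the "/pagefile.sys" path form are assumptions for illustration. Output formats differ between versions, so parsing is deliberately left out.

```python
import subprocess
from typing import List


def ntfsinfo_mft_cmd(device: str) -> List[str]:
    """Build the ntfsinfo call that dumps volume and MFT details."""
    return ["ntfsinfo", "--mft", device]


def ntfscluster_file_cmd(device: str, path: str) -> List[str]:
    """Build the ntfscluster call that reports the clusters used by a file."""
    return ["ntfscluster", "--filename", path, device]


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout (typically requires root)."""
    return subprocess.run(cmd, check=True, capture_output=True,
                          text=True).stdout


# Hypothetical partition device exposed by a loop device.
print(ntfsinfo_mft_cmd("/dev/loop0p2"))
print(ntfscluster_file_cmd("/dev/loop0p2", "/pagefile.sys"))
```

The commands are only built here, not executed; calling run() on them requires the ntfsprogs tools and read access to the device.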

Calculating Disk Offsets:

To determine the offset within the disk file, we need the starting offset of the partition that holds the NTFS volume. We can obtain this by running the partx command on the loop device attached to the disk, which reports the starting sector of each partition.
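
For illustration, the Python sketch below parses the output of a command such as `partx --show --raw --noheadings --output NR,START,SECTORS /dev/loop0`. The loop device name, the parse_partx helper, and the sample output are assumptions made up for this example.

```python
from typing import Dict, Tuple


def parse_partx(output: str) -> Dict[int, Tuple[int, int]]:
    """Map partition number -> (start sector, sector count).

    Expects three whitespace-separated integer columns per line:
    NR, START, SECTORS.
    """
    partitions = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        nr, start, sectors = (int(f) for f in fields)
        partitions[nr] = (start, sectors)
    return partitions


# Made-up sample output for a disk with a small boot partition and a
# large system partition.
sample_output = "1 2048 1024000\n2 1026048 208689152\n"
print(parse_partx(sample_output))
```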

Combining Information for Precise Offsets:

  • Pagefile offset on disk = (LCN × Cluster Size) + (Start Sector × Sector Size)

  • The sector size can be obtained using the fdisk -l command.
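
A worked example of the formula, written as a small Python function. The function name and all input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def disk_offset(lcn: int, cluster_size: int,
                start_sector: int, sector_size: int) -> int:
    """Byte offset on the disk of the cluster at `lcn` within a partition."""
    partition_start = start_sector * sector_size  # where the volume begins
    offset_in_volume = lcn * cluster_size         # position inside the volume
    return partition_start + offset_in_volume


# Hypothetical values: 4 KiB clusters, 512-byte sectors, a partition that
# starts at sector 1026048, and a file run that starts at LCN 786432.
offset = disk_offset(lcn=786432, cluster_size=4096,
                     start_sector=1026048, sector_size=512)
print(offset)  # 3221225472 + 525336576 = 3746562048
```

If the file is fragmented, the same calculation applies to the starting LCN of each run.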

By implementing this approach, you can significantly optimize VM storage management and usage, as well as reduce your TCO. The technique leverages readily available tools and focuses on commonly used file types, making it a practical solution for most virtualized environments.

Druva’s solution for securing data stored in VMs

With Druva, you can protect vSphere and VMware Cloud through a single console, eliminating backup infrastructure dependencies and overhead. As explained above, Druva backs up only essential files, and our global deduplication automatically tiers unique blocks to cold storage rather than copying duplicate data sets, which substantially reduces storage costs. To learn more about Druva’s VMware solution, check out Cloud Backup and DR for vSphere and VMware Cloud.