For many years, customers have purchased and maintained backup servers, backup software, tape libraries, deduplication appliances, hyper-converged backup appliances, and VM versions of all of the above. So why did Druva choose to reject these traditional designs and instead offer its product only as-a-service? This is the first blog post in a series that should answer that question. It will start by discussing the challenges of purchasing and maintaining the servers and storage required for a traditional backup system.
Sizing a backup system
Assuming your data center is of any significant size, you're going to need to properly size and design your backup system. To do this, you need to start with how large a full backup is and how often it will run. You will also need to estimate the size of each incremental backup. These figures determine the required throughput of the backup servers and storage that will perform the job.
Consider, for example, a 700TB data center that wants to do a weekly full backup – a pretty common design. If they choose to reduce the load on the network during the week by only performing full backups on the weekend, they will need a backup system capable of backing up 233TB per night (i.e., on Friday, Saturday, and Sunday nights). If they spread the full backups out across the week, however, they will only need to back up 100TB each night. Evenly spreading the full backups across the week creates additional complexity in the backup system, but it significantly reduces the required size of the backup servers.
In addition to the weekly full backup, a nightly incremental backup also needs to be performed. Some research will be needed to determine how big the incremental backup will be, but in a typical traditional backup system an incremental might be 10% to 20% of the size of the data being fully backed up each night. In the spread-out scenario described above, that is an additional 10 to 20TB per night.
The numbers above give us a range of 110TB to 253TB per night, depending on how the full backups are scheduled and how large the incremental backups actually are. Assuming a backup window of eight to nine hours, some knowledge of the requisite hardware will then be needed to spec out a system capable of backing up roughly 12TB to 30TB per hour.
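The arithmetic above can be sketched in a few lines of Python. This is a toy calculator, not a real sizing tool: the 700TB data center, the 10–20TB incrementals, and the 8.5-hour backup window are illustrative assumptions drawn from this post.

```python
# Toy backup-sizing sketch. All figures here are illustrative
# assumptions from the example above, not vendor guidance.

TOTAL_TB = 700       # size of one full backup of the data center
WINDOW_HOURS = 8.5   # assumed nightly backup window

def nightly_load_tb(full_nights, incremental_tb):
    """Data moved on the busiest night: the portion of the weekly
    full scheduled for that night, plus the nightly incremental."""
    return TOTAL_TB / full_nights + incremental_tb

def required_throughput(load_tb):
    """Throughput (TB/hour) needed to finish inside the window."""
    return load_tb / WINDOW_HOURS

# Fulls spread across all 7 nights, small (10TB) incrementals
light = nightly_load_tb(7, 10)   # 110 TB/night
# Weekend-only fulls (3 nights), larger (20TB) incrementals
heavy = nightly_load_tb(3, 20)   # ~253 TB/night

print(f"{light:.0f} TB/night -> {required_throughput(light):.0f} TB/hr")
print(f"{heavy:.0f} TB/night -> {required_throughput(heavy):.0f} TB/hr")
```

Note how sensitive the answer is to the scheduling decision alone: the same data center needs more than twice the nightly throughput if fulls are confined to the weekend.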
Properly sizing a large backup system requires a lot of experience and knowledge of how each part of the system works. You need one or more backup servers capable of moving that amount of data, a backup database fast enough to record every backup without slowing it down, and storage that can ingest the data quickly enough. You also need to design and purchase whatever system will get the backups offsite, which can include replication or a tape system – and those parts of the infrastructure must be properly designed as well.
Suffice it to say that I built an entire career off of properly designing the physical hardware behind backup systems. It is complicated and full of unknowns. It's very easy to either underpower the system and have backups miss the window, or overpower the system to avoid missing the window – only to waste money on excess compute and I/O capacity.
Purchasing the hardware and software
After sizing the backup system, the customer must purchase each piece from the appropriate vendor. This is typically done via large capital purchases that take a long time to get approved. Hardware and software tend to be bought well in advance, with enough headroom to support growth for the next three to five years, depending on the depreciation schedule a given company uses for capital purchases. This, of course, means most of the hardware will sit unused for most of those three to five years.
The various pieces of infrastructure need to be procured, brought on-site, physically installed, and configured. In many large environments, the complexity of this typically means you are also purchasing professional services to assist you along the way – an additional cost at a time when you really don't feel like spending any more.
Maintaining the OS and backup software
Gone are the days when you can ignore the updates from your operating system and backup software vendor; many of them are critical security patches that must be installed immediately. Such updates often require downtime and need a well-documented backout plan for when they go wrong. The last thing you need when loading the latest patches from your operating system or backup vendor is to have things suddenly stop working before it’s time to take today’s backups.
In this respect, the backup system is no different from any other system – patches should be tested somewhere other than production before they are rolled out. That, of course, requires a duplicate system – so no one does it. Instead, you roll out patches to the production backup system during the day and hope that nothing goes wrong. If something does go wrong, hopefully you have time to back the patch out before the backups are supposed to run.
Managing multiple vendors
Most backup systems are made up of products from many vendors: the server hardware vendor, the OS vendor, the backup software vendor, the deduplication appliance vendor, the tape library vendor, the tape drive vendor, and the tape cartridge vendor. Other components that have caused me heartache in the past include Fibre Channel HBAs, Fibre Channel switches, and Ethernet switches – the list goes on and on.
I can remember one particularly egregious incompatibility between a server hardware vendor, an HBA vendor, and a switch vendor. The problem reared its ugly head only when all three were used together, making it nearly impossible to troubleshoot. That's why it took us two months to figure out why backups were regularly failing.
Finger-pointing continues to be a problem in multi-vendor environments. This is why converged and hyper-converged virtualization products continue to be very popular; they remove most of these problems. Hyper-converged backup appliances have started to address this problem as well, but for now we are focusing on traditional backup solutions, where most people purchase each piece of infrastructure from a different vendor.
Properly sizing and designing a backup system is hard enough. When things go wrong and the entire system's throughput is significantly lower than you believe it should be, figuring out the cause in a multi-vendor world can be quite difficult.
In the next post in this series, I will look at the role that tape has played (and continues to play) in the traditional backup world. What started as an incredible advancement in backup technology ended up becoming backup's worst enemy. Make sure to check it out!
In the meantime, learn how Druva can unlock the true value of cloud backup, and read Druva's Definitive Guide to Enterprise Data Backup and Recovery Architectures, comparing on-premises, hybrid, hosted, and cloud-native solutions.