What Worries Backup Administrators?

W. Curtis Preston, Chief Technology Evangelist

Druva's tagline is "Don't worry about your backups. Ever." Since Stephen Manley, our CTO, and W. Curtis Preston, Mr. Backup, have decades of experience with non-Druva backup environments, we asked them to talk about the average non-Druva backup administrator. Stephen talks about things from his experience on the vendor side, and Curtis talks about trying to make backup systems run in the wild. It's a fun episode that reminds you of all the things you don't have to worry about as a Druva customer.

W. Curtis Preston: Hi, and welcome to the Druva No Hardware Required podcast. I'm your host, W. Curtis Preston, AKA Mr. Backup, and I have with me a guy that I assume has no worries at all: Stephen Manley, our CTO. How's it going, Stephen? No worries for the rest of your days, according to Disney.

[00:00:33] Stephen Manley: Exactly. And really, if I think about how I live my life, it's based on Disney songs. So, you know, let it go.

[00:00:41] W. Curtis Preston: Yeah, let it go. I have two children that are now grown, and they were raised on Disney, so I know most of those songs by heart myself, and I know exactly what you're talking about.

And it's a good phrase to start out with, because we've been talking a lot about Druva lately, saying that a Druva customer doesn't have to worry about their backups, ever. And I thought it might be interesting, you know, both you and I have spent a lot of time at other companies where customers have a lot of worries about their backups, right?

And our customers really don't, right? There's just things that they don't have to worry about. So I thought it'd be a good moment to just talk about a typical backup admin of an on-premises backup system. What are the kinds of things that they worry about?

And I'll let you start. What do you think's the first one?

[00:01:47] Stephen Manley: So I'm not gonna go with the biggest, scariest one, but I'm gonna go with the one that just bugs you the most on a daily basis, and that's troubleshooting failed backups. Because you wake up in the morning and you say, well, I know the first couple hours of my day are gonna be spent troubleshooting failed backups.

And whether it's because some sort of media filled up, or because there was a version mismatch somewhere, or just something funky happened, I know I'm gonna spend the first couple of hours of my day triaging and hopefully restarting the failed backup, so I get a good one in the can.

Annoying as all get out. Now, again, not the biggest, scariest thing, but just that little thing, every day. It's like a paper cut every day.

[00:02:31] W. Curtis Preston: Yeah, death of a thousand cuts. I do remember that as well. Basically, you did your backups at night, you woke up in the morning, and that was what you did over your morning coffee: look at backups. Or in my case, because it was before we could do it over the internet, I had my morning coffee, then I went into work, but the first thing that I did was open up that screen that would give me some idea of what failed last night.

And it is true, I do remember media failure being the primary cause. The other causes were just basically random things going on with the network.

And I agree that that is definitely probably the number one thing every morning that every backup admin typically has to worry about. I thought, you know, we'll save the big bad wolf for the end. There's two, and we'll talk about both of them.

The one is performance management, right? Performance design, performance management, performance capacity. And when I say capacity, I mean the ability of the system to handle today's backup load and tomorrow's backup load. That's what I remember spending a huge amount of time on, partly, especially when we talk about tape, but even when we talk about the dedupe world. The thing is that tape performance is incredibly, incredibly variable, right?

Well, the tape's performance was constant. It was constantly bad, right? But mainly because of that tape speed mismatch that we've talked about before. And so there was that problem. But then also with dedupe, especially the target dedupe systems, you throw a big bunch of data at it, and what it does with it varies depending on the day and depending on the workload, and therefore so does the ability of that system to handle that workload.

And then, God forbid, we decide we want to bring something back, right? Because then what we're talking about is a restore. And the restore performance wasn't even "your mileage may vary." It was like, you know, you might not get any mileage. Right.

[00:04:58] Stephen Manley: I mean, to your point, and certainly for those of you who are curious about tape performance, just look up shoe-shining. If you couldn't stream the tape device, it gave you terrible performance. But even in the disk world, and this was very common, people would spend a lot of time architecting their environment, and they'd balance, right?

So let's say you bought multiple dedupe storage systems. You'd try to balance in terms of capacity and in terms of performance. And you'd say, well, this NAS share is fairly static, so that's not gonna generate much data. So I'll combine this with this high-churn-rate Oracle database, or whatever it was gonna be. You'd really try to balance that out.

And it would work for the first week, and then the world would change. Like, all of a sudden that NAS share that was idle becomes the most important project and it's a huge churn rate. And now you've got three big churn rate things going after one machine. But you can't just move it, because you've got this dedupe base here.

So if you just retarget the workload to a different system, now you're doing another level-zero backup. That doesn't work. And so, as a backup team, you were always reacting. There was always something that the production team was doing that turned this beautiful, masterfully constructed environment into something that was both overutilized and underutilized, depending on what those workloads looked like on a daily basis. It was just as frustrating as you could get.
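A rough sketch of why retargeting a workload turns into a level-zero backup. This is a toy model under stated assumptions, not how any particular product works: the `DedupeIsland` class, the fixed 4 KiB chunk size, and the sample data are all hypothetical, and real systems typically use variable-size chunking and persistent indexes. Each island only knows the chunk fingerprints it has already stored, so a second island sees a moved workload as entirely new data.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


class DedupeIsland:
    """Hypothetical dedupe target that only knows about its own chunks."""

    def __init__(self, name):
        self.name = name
        self.fingerprints = set()  # fingerprints of chunks already stored here

    def backup(self, data: bytes) -> int:
        """Store data and return the number of bytes actually written."""
        written = 0
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            if digest not in self.fingerprints:
                # Only chunks this island has never seen cost capacity.
                self.fingerprints.add(digest)
                written += len(chunk)
        return written


# Build a 100-chunk workload in which every chunk is distinct.
workload = b"".join(i.to_bytes(4, "big") * 1024 for i in range(100))

island_a = DedupeIsland("A")
island_b = DedupeIsland("B")

first_night = island_a.backup(workload)  # everything is new to island A

# The next day a single chunk changes.
changed = workload[:CHUNK_SIZE * 99] + (999).to_bytes(4, "big") * 1024
second_night = island_a.backup(changed)  # only the changed chunk is written

# Retarget the same workload to island B: B has no dedupe base for it.
after_move = island_b.backup(changed)

print(first_night, second_night, after_move)
```

In this toy model the move to island B writes as much as the very first backup did, which is the "another level-zero backup" cost described above.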

[00:06:22] W. Curtis Preston: Yeah, and that dedupe issue, especially if you had dedupe islands, and if you had dedupe, generally speaking, you had dedupe islands. And still to this day, many of our competitors, if they've got on-prem dedupe systems, might sell one that's a petabyte in capacity, and you're saying, well, I don't need a petabyte.

I need, you know, I need this. But then you realize, oh, you did need more than you thought you needed. And so you end up buying multiple storage systems. And they are dedupe islands. They don't talk to each other. They don't dedupe across. So you were talking about, you know, you were backing up to this one, and then for performance reasons, you decide to move this workload over here.

But then that creates, as you said, a level-zero backup at that point, because it doesn't know anything about that workload. Right. So that was that problem, and that also created this other problem of capacity. You were constantly trying to figure it out. It would be a lot easier if you could move those workloads all into one global dedupe pool, and you can, depending on how large your environment is and how big a dedupe box you're willing to buy.

But it kind of means you have to buy the top of the line, even though you only need the bottom of the line in the beginning, right? So this is that problem of scale up versus scale out.

[00:07:47] Stephen Manley: And even then you could potentially get into network segmentation issues of, you know, I'd like this all to be in one domain, but because of the way my network's set up, I can't, or because it's remote offices, I can't. And so you still find yourself with lots of little boxes and, again, those dedupe islands. And again, you wake up one day and you say, half of the systems are 12% utilized, and the other half of the systems are 97% utilized, and I'm unhappy both ways.

This is great.

[00:08:18] W. Curtis Preston: Right. Right. And so then again, now you're like, well, maybe I should move this one over here. And you have to constantly think: what will that do to the performance of my backups? You're constantly moving things around to try to solve either a performance issue or a capacity issue.

And every time you move something, you then significantly impact the capacity issue, until it eventually expires off of the system that you moved it off of. But each time you move it to another system, you're forcing another full. And you're just constantly doing that in order to keep the system happy.

Happy is a, it's a...

[00:09:04] Stephen Manley: your backup windows at least with a prayer. Yeah,

[00:09:07] W. Curtis Preston: Well, you know what you brought up? You brought up the W word, the backup window. Wow. I hadn't thought about that in so long, because that was the thing, right? You had this agreed-upon time. Mine, I remember back in the day, was 10:00 PM to 6:00 AM.

That was the time I was given, basically an eight-hour window. And if I did backups anytime outside of that window, I got in trouble, because doing backups outside of that window impacts production and so on. So that was the window I was given. And again, this is why we talked about the performance aspect, right?

You must manage the performance in order to fit the backups inside that backup window. When I think about the way we do things, we just sort of pick a little bit at a time, right? Everything is block-level incremental forever, and then even that we try to pare down further with source-side deduplication, so that the impact of any one backup is much, much less than what even a typical incremental would be.

And definitely much less than what a full would be. And so you're free to basically do backups whenever you want. But yeah, the backup window, I'm sure you ran into that a lot on the vendor side.

[00:10:27] Stephen Manley: Oh, well, I mean, heck, you had the nightly one and then the weekly one. You would live for that period between like Friday at 5:00 PM

[00:10:37] W. Curtis Preston: Yeah.

[00:10:39] Stephen Manley: at like 6:00 AM, and you're like, all right, can I get all my fulls done in this one weekend? Because if I get all the fulls done in a weekend, then again, I get my incrementals or differentials, whichever method you chose, during the week.

And you'd watch these customers and you'd go, oh my gosh, in about two more weeks, they're not gonna be able to fit this in, and now what are we going to tell them? So it was just a losing proposition, right? This was the problem that we worried about for 20 years: the data's growing so fast, I can't pull it out fast enough.

And to your point, there's no way I can put it back in fast enough on a restore.
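The "in about two more weeks, they're not gonna be able to fit this in" calculation can be put into rough numbers. The sketch below is only an illustration: the function name, the 61-hour weekend window, the data size, the throughput, and the growth rate are all assumed values, not figures from the episode. It estimates how many weeks remain before a weekly full no longer fits the window, assuming data grows by a fixed percentage each week while backup throughput stays flat.

```python
def weeks_until_window_overflows(data_tb, weekly_growth, throughput_tb_per_hour,
                                 window_hours, max_weeks=520):
    """Return how many weeks until a full backup no longer fits the window.

    Returns 0 if the full already does not fit, and None if it still fits
    after max_weeks of growth.
    """
    for week in range(max_weeks + 1):
        hours_needed = data_tb / throughput_tb_per_hour
        if hours_needed > window_hours:
            return week
        data_tb *= 1 + weekly_growth  # compound growth, one week at a time
    return None


# Assumed example: 100 TB today, growing 2% per week, backed up at 2 TB/hour,
# with a hypothetical weekend window of 61 hours.
weeks_left = weeks_until_window_overflows(
    data_tb=100, weekly_growth=0.02, throughput_tb_per_hour=2, window_hours=61
)
print(weeks_left)
```

With these assumed inputs the full takes 50 hours today and stops fitting once the data passes 122 TB. The same arithmetic applies to restores, only with a restore throughput that is usually lower.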

[00:11:17] W. Curtis Preston: Yeah. And dedupe really didn't help, because dedupe is this magical thing that reduces capacity in a very variable way, depending on your workload, depending on how you've constructed the backups. And so dedupe definitely made that more complicated, right? Trying to figure out your capacity over time.

Before we get to the big, bad one, can you think of anything else that, you know, your customers were

[00:11:53] Stephen Manley: So there's one that I think leads a little bit into the big bad one, which is patching, upgrade management, all of that sort of thing. The number of times I dealt with customers who said, well, to either fix this bug or support this new version of this workload, you're telling me I have to upgrade this agent, but then when I upgrade the agent, it only works with this version of the servers.

Now I have to upgrade the server. Ah, but that version of the server only works with this version of my dedupe appliance, and I have to upgrade the dedupe appliance. Oh, but that means that now that's cascaded back, so my other servers have to upgrade, which means now my agents have to upgrade. And just to get support for the newest version of Oracle, I've suddenly had to upgrade my entire backup environment.

Let me make it very clear that I hate you very much, and if you make me do this again, I'm going to find where you live. And then the next quarter we'd make them do it again.

[00:12:52] W. Curtis Preston: Yeah. Exactly. Well, you mentioned that it leads into the big bad one, because there are some patches that come out that are tied to vulnerabilities, right? That you really have to be on. I can think of one really big hack that happened last year.

I won't say the company name, because I think we already trashed them enough on the podcast, but basically the entire hack, which for the record took out an entire business line, could have been stopped by simply applying the appropriate patch at the appropriate time. That patch, for the record, had been out for well over a month when it happened.

And when you think about that, when a vulnerability is released and a patch is released at the same time, you never know how much time you have between that announcement and you potentially being hacked. It could be 30 minutes, it could be weeks.

But you have to have this constant worry that you've applied the appropriate patches to keep your system safe, right?

[00:14:11] Stephen Manley: And again, that patching applies to the OS, it applies to software, it applies all the way up and down the stack, because if you're vulnerable anywhere, they will eventually find it. And that's the scary part.

[00:14:27] W. Curtis Preston: Yeah. And why do we talk about that? Of course, our customers just simply don't have to worry about that, right? There isn't a backup server for them to upgrade. There isn't a backup server application for them to upgrade. The only thing that does need to be upgraded would be the client, if there's a client involved, and we can upgrade that for them automatically.

So, you know, we hinted at it at the beginning, and that is, I think if the average person today said what they're worried about, it's ransomware. Right? When you look at the number of hacks that have happened out there, there's always that phrase of "and the backups were also encrypted."

If I'm administering an on-prem backup system, that's what I would be worried about, right?

[00:15:21] Stephen Manley: Right.

[00:15:22] W. Curtis Preston: And you look at our competitors, and that's clearly what they're worried about, because they're issuing a lot of guidance on what you should do. Right? I can think of one vendor that has taken the 3-2-1 rule and added some other numbers on the end, where it's like, well, you need to make sure one of the backups is immutable.

[00:15:46] Stephen Manley: Right.

[00:15:47] W. Curtis Preston: And I just reply to that, like, well, shouldn't all the backups be immutable? Right? I mean, what's wrong with that? Well, it's really hard to do when the backups are sitting on a Windows box, right? It's really hard to do when the backups are sitting on essentially an NFS-mounted, um,

[00:16:05] Stephen Manley: Piece of storage. Yeah.

[00:16:06] W. Curtis Preston: of storage, right?

It's even difficult to do with a Linux hardened repository, right? Which is something else that we've seen our competitors use, where they turn on the immutable flag in the file system. The thing is that anyone with root can turn that flag back off. So it's immutable-ish, right?

It's better than nothing. It's definitely on that continuum. But if I was a backup admin today, that's the thing that would keep me awake at night: when we get attacked, right? Because you do have to, I think, assume breach at this point. When we get attacked, have I done what I need to do to the backup system to make sure that it's safe?

Does that sound about right?

[00:16:57] Stephen Manley: A hundred percent. And I think, you know, sometimes I meet backup teams that are worried about the right thing, which is that sometimes I hear them coming and saying, well, but first I need to help us identify the attack. And I look at 'em and say, look, that's all good.

But your first step is, when it comes time for the recovery to happen, you need to be able to put your hand up with confidence and say, we're gonna get the data back, it's gonna be clean, and I'm gonna be able to do it efficiently. And then everything you do after that, that's awesome. But do your job first before you start worrying about somebody else's job.

And I think too often we get a little wrapped up in, I want to go look at this cool, neat new technology, when step one is, you know, to use a very old phrase, mind your knitting. But, you know, "mind your knitting." Yeah. My mom used to say that.

[00:17:55] W. Curtis Preston: Oh, really?

[00:17:56] Stephen Manley: Yeah. It's, uh, I assume it means don't like start fiddling with other people's knitting until your scarf is, I don't

[00:18:04] W. Curtis Preston: Yeah. Is it like stay in your lane? I think it's like stay in your lane. Yeah.

[00:18:08] Stephen Manley: Exactly. And so let's nail that first. And to your point, you know, there's a way to do it where there's a ton of work on your side, and there's a way to do it with us where, yeah, we kind of take care of it for you.

[00:18:22] W. Curtis Preston: Right. Yeah. It's like, well, if all backups are off-prem and all backups are encrypted and stored in another vendor's, you know, it's a completely different authentication and authorization system. Active Directory could get hacked, et cetera. Right. You've got all those protections built in for you.

You can do it with an on-prem system, but it's just much harder and puts a lot of the work on you. I think that's enough of scaring people today, reminding them of the things that they worry about. And so thanks again for reminding me of what it was like from the vendor side.

I experienced it from the customer side, so it's always nice to see that perspective.

[00:19:05] Stephen Manley: All right. Just always keep your backup windows open, so no, wait,

[00:19:09] W. Curtis Preston: That

[00:19:10] Stephen Manley: enough. Now. I got like, because so bad guys can crawl in. Eh, it's bad. Let's not do that.

[00:19:15] W. Curtis Preston: Yeah. Yeah. I have no response to that. I appreciate the listeners; we're nothing without you. Remember to subscribe so that you don't miss any episodes. And remember, here at Druva, there's no hardware required.