If you’re not including RTO and RPO in conversations about your data protection and resiliency systems, you are missing a core concept. We only back up so we can restore, and we only restore what matters to the organization. This podcast makes the point that no matter how fast your restore is, and no matter how much data you lose (or don’t), your recovery will be a failure if it doesn’t match the stakeholders’ expectations. Learn about RTO and RPO, why they need to be agreed upon beforehand for a recovery to succeed, and how they can help you get more funding for your data protection system.
[00:00:00] W. Curtis Preston: This week on No Hardware Required, we’re talking about RTO and RPO. With me, as always, is our CTO, Stephen Manley. Thanks for joining.
Hi, and welcome to Druva’s No Hardware Required podcast. I’m your host, W. Curtis Preston, AKA Mr. Backup. I have with me Stephen “RTO” Manley. How’s it going, Stephen?
[00:00:23] Stephen Manley: Now, is this the return-to-office RTO? ’Cause nobody likes that RTO.
[00:00:27] W. Curtis Preston: This is the good RTO. This is, you know, the one in our world. Yeah. Nobody liked that RTO.
[00:00:35] Stephen Manley: All right. Yeah, yeah. With return to office, people are like, “Oh, RTO is terrible.”
[00:00:40] W. Curtis Preston: Yeah, RTO is terrible. That’s a whole other subject we could spend plenty of time talking about, I’m sure. So it’s interesting for me: I started my backup career in ’93, and I swear I was at least four or five years into doing basically nothing but backups and I still hadn’t heard the terms RTO and RPO.
I’m amazed, as crucial a term as it is. What I remember is that we focused so much on the backup window back in the day. Right?
[00:01:17] Stephen Manley: I was gonna say, when I first came in, it was, “What’s your backup window? Can I hit the backup window?” It was all about the backup. I’d never heard RPO or RTO either. It was all backup window, all the time.
[00:01:28] W. Curtis Preston: And here’s something I love to say: no one cares if you can back up. No one cares if last night’s backup worked, except maybe you, right? They only care if their restore works. A similar statement is that no one cares about the millions of backups you got right.
They only care about the one restore you got wrong. And the key to this, I say, is setting expectations, because your backup system will do what it does, or not do what it does. The question is: have you properly set expectations of what that will be? That’s what I think RTO and RPO are about.
What do you think?
[00:02:16] Stephen Manley: A hundred percent. There’s a model I try to share with every one of our customers, and it fits in with that. It’s another TLA, SLA: service level agreement, right? RPO and RTO. It’s all about, look, people are always gonna ask you to be infinitely fast both ways. They’re gonna want the last bit of data, and they’re gonna want it restored instantly.
But you need to set your SLAs with them and then, of course, assign a cost to them, or you’re gonna be a very unhappy person because, like you said, the only data I care about is my own. And when I want it, I want it now.
[00:02:53] W. Curtis Preston: Yeah. Ask any customer, internally, right? You should be doing this; you should be talking to your business units. How long do you want the restore to take? Zero, right? And how much data is it okay to lose? Zero, right? They always want that, just like you said. Since that’s the case, if you haven’t had this discussion before the recovery, you are going to fail no matter how good your backup system is.
[00:03:21] Stephen Manley: Yeah,
[00:03:22] W. Curtis Preston: Does that sound about right?
[00:03:24] Stephen Manley: A hundred percent. I was on the phone yesterday with a customer, and that was almost exactly the discussion. We had just done a POC.
[00:03:36] W. Curtis Preston: Mm-hmm.
[00:03:37] Stephen Manley: On their application, Hyperion, right? Which is not an easy application. Lots of moving pieces. You know, Oracle bought them a couple years back. And it worked.
Cross-country recovery. We’re like, “Woo, yay.” And a person who hadn’t been involved in the project came in and said, “Well, it took like an hour.” And we’re like, “Yeah.” He’s like, “I expected it to be instantaneous.” Everyone sort of chuckles, especially on the customer’s team, right?
His team’s like, “That’s really funny.” He’s like, “No.” Like, can we introduce you to the speed of light? So if you don’t set those expectations, that big party where we did something amazing becomes, like you said, sort of, “Yeah, you guys aren’t good enough.”
[00:04:29] W. Curtis Preston: Yeah. And so you have to set that. So let’s just define the terms, right? Recovery time objective is how long the restore is allowed to take. And recovery point objective is how much data you’re allowed to lose, expressed as a measure of time, right? So, like, three hours’ worth of data. And I always like to mention that everyone but you, meaning the backup person, starts the RTO clock when the thing goes down and stops it when everything is back up. You’re thinking, “Oh, I get four hours to do a restore.” No, you don’t. You get four hours to get everything back up and running, and that includes whatever else you have to do to make that happen.
I’m sure you’ve seen that out there in the wild as well.
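To make the point about the RTO clock concrete, here is a minimal sketch, assuming a hypothetical list of named recovery phases with durations in hours (the phase names and numbers are illustrative, not from any real incident). The business measures the RTO against the sum of every phase from outage to "back up and running", not just the restore step.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """One step between the outage and the service being usable again (hypothetical model)."""
    name: str
    hours: float


def outage_hours(phases: list[Phase]) -> float:
    """The business's RTO clock covers every phase, not just the restore itself."""
    return sum(p.hours for p in phases)


def meets_rto(phases: list[Phase], rto_hours: float) -> bool:
    return outage_hours(phases) <= rto_hours


# Illustrative numbers only: a 3-hour restore fits inside a 4-hour RTO,
# but the whole outage, from failure to "back up and running", does not.
incident = [
    Phase("detect the outage", 0.5),
    Phase("decide a restore is needed", 1.0),
    Phase("restore the data", 3.0),
    Phase("validate and bring the app online", 1.0),
]
print(outage_hours(incident))    # 5.5
print(meets_rto(incident, 4.0))  # False
```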
[00:05:22] Stephen Manley: Including, to your point, someone telling you that you need to be doing the restore. There have been a lot of times where... not to name names, but back in my NetApp days, there was a large networking company that was a customer of ours, and they had a four-hour RTO, and they missed it.
And I’m talking with the backup admin. I’m like, “So what happened?” He’s like, “Well, it wasn’t until hour five that they told us we needed to do a restore.” So,
[00:06:00] W. Curtis Preston: I don’t know what to do with that. Yeah. In the middle of a large restore or a small restore, and especially if this is a ransomware recovery, everybody has to be on the same page, and you’ve gotta have that communication. But I would say, first off, you need to know what your system is capable of,
[00:06:21] Stephen Manley: Yeah.
[00:06:22] W. Curtis Preston: right?
so that when you go to have this meeting, you know that your system does a terabyte an hour, or whatever it is, right? Or you know that you’re recovering in the cloud, so maybe you can do it faster, maybe in 15 minutes. But you need to know what that is. And then you have the meeting.
That way, when they say, “Well, we want five minutes,” you can very matter-of-factly say, “Well, the system we currently have is incapable of meeting that requirement.” And then real quick you go do some research, and you come back and say, “We can get what you want.
It will cost $1 million.” And then maybe they’ll pay for it, maybe they won’t, right? It’s a business discussion of the cost of the recovery versus the cost of the downtime. And it all starts with the RTO and RPO discussion.
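Knowing what your system is capable of can start as simple arithmetic. A rough sketch, assuming sustained restore throughput stays constant for the whole job (in practice it rarely does) and using the "terabyte an hour" figure mentioned above as an example rate; the function names and the 6 TB data size are hypothetical.

```python
def restore_hours(data_tb: float, throughput_tb_per_hour: float) -> float:
    """Rough restore duration: data size divided by sustained restore throughput."""
    if throughput_tb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return data_tb / throughput_tb_per_hour


def can_meet(rto_hours: float, data_tb: float, throughput_tb_per_hour: float) -> bool:
    """True if the data-movement step alone fits inside the requested RTO."""
    return restore_hours(data_tb, throughput_tb_per_hour) <= rto_hours


# Hypothetical example: 6 TB at 1 TB/hour is about 6 hours of data movement,
# so a 5-minute request gets an honest "not with the current system".
print(restore_hours(6, 1.0))     # 6.0
print(can_meet(5 / 60, 6, 1.0))  # False
print(can_meet(8, 6, 1.0))       # True
```

This only covers moving the bytes; anything else the outage involves still counts against the RTO.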
[00:07:18] Stephen Manley: And I would add to that RTO discussion, ’cause a mistake I’ve seen a lot of people make, and it’s normal, right, is this: you’re in with more senior people, and you want to give them the answer they want. So it’s, “All right, could you recover in four hours?”
And you think, “Well, yeah. If we’re talking about recovering a VM in four hours, assuming the network’s lightly loaded and the backup system isn’t doing backups, yes.” Okay, so you said yes. Then when it happens, they wanna recover a hundred VMs while running 43 backups on a congested network, and you’re gonna fail.
So I think one thing that’s really important: if you’re in an environment where you can’t scale on demand, you have got to specify scale, because it makes a huge difference whether I’m recovering a file, a subdirectory, or a VM versus a VM farm or an entire filer. Those are very different discussions. So that’s one: be very clear about the scale.
The second one is: what am I recovering to? Because I’ve also seen people go, “Yeah, I could have pumped that data, but there wasn’t a free ESX server for me to recover to.” Now, again, if you’re in the cloud, you scale on demand, you allocate on demand, you’re good. But if you’re talking on-prem, you’d better make sure there’s something to recover to, or you’re just pumping bits to /dev/null and nobody’s getting their system up and running.
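The scale point can be sketched the same way. This assumes, hypothetically, that the recovery system's total throughput is shared evenly among every concurrent job (each VM restore and each backup) for the whole recovery; real systems schedule far less neatly, and the VM size and throughput below are made-up numbers. It contrasts recovering one VM on an idle system with recovering 100 VMs while 43 backups run.

```python
def restore_hours_under_load(
    vm_count: int,
    tb_per_vm: float,
    total_tb_per_hour: float,
    concurrent_backups: int = 0,
) -> float:
    """Estimate how long restoring `vm_count` VMs takes on a shared system.

    Simplifying assumption: every running job (each VM restore and each backup)
    gets an equal share of `total_tb_per_hour` for the whole recovery.
    """
    if vm_count < 1 or total_tb_per_hour <= 0:
        raise ValueError("need at least one VM and positive throughput")
    jobs = vm_count + concurrent_backups
    # Each restore moves tb_per_vm at total_tb_per_hour / jobs, so the
    # hours are tb_per_vm * jobs / total_tb_per_hour.
    return tb_per_vm * jobs / total_tb_per_hour


# Hypothetical sizes: one 0.5 TB VM on an idle 1 TB/hour system...
print(restore_hours_under_load(1, 0.5, 1.0))        # 0.5
# ...versus 100 such VMs while 43 backups compete for the same throughput.
print(restore_hours_under_load(100, 0.5, 1.0, 43))  # 71.5
```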
[00:08:58] W. Curtis Preston: Well, luckily, you know, right now it’s really easy to order a server.
[00:09:06] Stephen Manley: Absolutely. Did I say 24-hour RTO? What I meant was 24 days, maybe.
[00:09:14] W. Curtis Preston: Yeah, exactly. And, I don’t know, I mean, we do kind of live in a cloud bubble, but when we talk about DR, to me, and this was true before I joined Druva, DR is the killer app for the cloud. Because what do you need?
You need infinite resources right now, and you don’t want to pay for them until right now. There is no way to get that without the cloud, and it’s super easy to get it with the cloud, right? You can’t, even if you contracted with, pick your favorite DR vendor,
[00:09:59] Stephen Manley: Okay.
[00:10:00] W. Curtis Preston: and you’ve paid them lots of money. There are trucks involved, and physical transportation, and somebody’s gotta get from A to B.
There are laws of physics involved, et cetera. In the cloud, and I will say especially if you’re using a cloud-based recovery system like Druva, you could literally automate it all, snap your fingers, and poof, you’ve orchestrated and created an entire data center in the cloud, the recovery just kicks off, and you’re recovering from the cloud to the cloud. I don’t know.
When I think about satisfying a hard RTO and RPO, the cloud just seems like magic to me.
[00:10:46] Stephen Manley: I think the other one, and I know this is a little dorky technically, is this. If you look at large switch sales, and I’ll explain why I’m covering this, the 400-gig, 800-gig type of switches, like 90% of them are sold to cloud vendors. You have so much more bandwidth inside the cloud
than even Fortune 500 companies have within their data centers. So, to Curtis’s point, not only can you spin up the resources on demand, but if your backups are in the cloud and the thing you wanna recover to is in the cloud, you’ve got a huge pipe, much bigger than anything you’re gonna have in your data center.
So when I wanna hit a fast RTO, I want to get my resources, I want the biggest pipe possible, and I wanna, you know, fill the sky full of flying monkeys carrying data to your servers, sort of Wizard of Oz style, so that you can get up and running.
[00:11:43] W. Curtis Preston: I’ll throw out another couple of terms: RTA and RPA, right? Recovery time actual and recovery point actual. That’s what you actually get when you go to do a restore. I think a lot of people do one of two things. They either have no RTO or RPO, and they just sort of hope for the best.
I think the hope-for-the-best thing is the worst, right? The other is that they’ve got the RTO and RPO, they’ve agreed to them, but then they don’t regularly test their recoveries to know that their RTA and RPA actually meet their RTO and RPO. Really, the actuals should be significantly under the objectives, right?
[00:12:29] Stephen Manley: time and everything else.
[00:12:30] W. Curtis Preston: Exactly. And somebody has to tell you to do the restore. That’s a great story; I love it. And, you know, we harp on this a lot. Again, this is gonna be a pro-cloud thing, because you can orchestrate a disaster recovery. You can do this, easy peasy. But just to go back to the beginning here, key number one is to ask that question.
This is the elephant in the room, the topic nobody wants to talk about. But here’s the thing: if you’re responsible for the recovery of your company and you are not bringing this up, you’re going to fail at some point. The only question is when. You need to bring this up and find out whether or not these terms have been defined.
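The habit of regularly testing recoveries and comparing RTA and RPA with the agreed RTO and RPO can be sketched as a small check. This assumes you record, for each test restore, the objectives and the measured actuals in minutes; the `RecoveryTest` record, the application names, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTest:
    """Result of one test restore; all values in minutes (hypothetical record)."""
    app: str
    rto: float  # recovery time objective (agreed)
    rpo: float  # recovery point objective (agreed)
    rta: float  # recovery time actual (measured in the test)
    rpa: float  # recovery point actual (measured in the test)


def misses(tests: list[RecoveryTest]) -> list[str]:
    """List every test whose actual time or actual data loss exceeds its objective."""
    problems = []
    for t in tests:
        if t.rta > t.rto:
            problems.append(f"{t.app}: RTA {t.rta} min > RTO {t.rto} min")
        if t.rpa > t.rpo:
            problems.append(f"{t.app}: RPA {t.rpa} min > RPO {t.rpo} min")
    return problems


def headroom(t: RecoveryTest) -> float:
    """Fraction of the RTO left unused; actuals should sit well under the objectives."""
    return 1 - t.rta / t.rto


results = [
    RecoveryTest("erp", rto=240, rpo=60, rta=300, rpa=45),
    RecoveryTest("wiki", rto=1440, rpo=1440, rta=120, rpa=600),
]
print(misses(results))                # ['erp: RTA 300 min > RTO 240 min']
print(round(headroom(results[1]), 2))  # 0.92
```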
[00:13:21] Stephen Manley: Yeah, I think, and I know we’ve talked a lot about RTO, I do wanna spend a minute on RPO as well, because
[00:13:31] W. Curtis Preston: Go, glad to.
[00:13:32] Stephen Manley: we wanna be able to restore quickly. But I do think, on the other side, when I talk to customers about RPO, again, that first instinct is, “Well, I don’t wanna lose a byte of data.
Everything I do is important.” “Okay, well, it’s gonna cost you this much money.” “Well, when I said everything I did was important, what I meant was, I have this much money; what can I get?” But I think RPO becomes more and more important as we evolve, in terms of setting those expectations, because, especially as you look at cloud workloads and SaaS applications, with your dependencies and the challenges, it gets harder, right?
So on-prem, you can look and say, “Well, I can’t really back up my VM farm fast enough. I could push for faster ESX servers, a bigger network, more disk, more IOPS, something to give me more juice.” But if I’m talking SaaS applications, at some point I’m limited by API throttling by that vendor, whether it’s Microsoft, Google, Salesforce, whoever it’s gonna be.
And so it’s really important that you sit down and understand how often you can back up successfully. Because if the business is asking for something different and it’s just not possible, and I can’t buy more, then I really need to drive that expectation.
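API throttling can be turned into a lower bound on how often a SaaS workload can be backed up. A sketch under simplifying, hypothetical assumptions: each backup run spends a fixed number of API calls finding what changed, plus one call per changed object, the vendor allows a fixed number of calls per hour, and nothing else competes for that quota. The rates below are illustrative, not any vendor's published limits.

```python
import math


def min_backup_interval_hours(
    overhead_calls: float,
    changes_per_hour: float,
    calls_per_hour: float,
) -> float:
    """Shortest backup interval a rate-limited SaaS API can sustain (hypothetical model).

    A run covering T hours needs overhead_calls + changes_per_hour * T calls and
    so takes (overhead_calls + changes_per_hour * T) / calls_per_hour hours.
    Requiring it to finish within T gives
    T >= overhead_calls / (calls_per_hour - changes_per_hour).
    """
    if changes_per_hour >= calls_per_hour:
        return math.inf  # changes arrive faster than the API lets us read them
    return overhead_calls / (calls_per_hour - changes_per_hour)


# Illustrative numbers only:
print(min_backup_interval_hours(2_000, 8_000, 10_000))   # 1.0
print(min_backup_interval_hours(2_000, 12_000, 10_000))  # inf
```

Under this model, buying more hardware on your side does not move the bound; only a higher API allowance or fewer calls per run does.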
[00:15:01] W. Curtis Preston: Yeah, it’s real simple to understand that your RPO and your backup frequency are tied, right? If you’re backing up once a day, you’re not gonna be able to meet an RPO of one hour, right? You might be able to with some databases, using transaction logs, but not in a true disaster that wipes out everything, or a ransomware attack or something like that. The backup frequency
is very closely tied to RPO. And you’re right, there are a lot of dependencies these days on how often you can back up certain things, and you hit API throttling problems. Which, again, goes back to my previous statement: have the discussion, determine what your RPO is, and then find out if you can meet it.
And then you find out, “Oh, well, I can’t do that that often,” or, “If I do that that often, I have to use this other tool, which is five times the cost of this tool.” So then you go back and say, “Look, I could do a one-hour RPO, or I could do a five-minute RPO.
The five-minute one is $1 billion, right? But I can do a one-hour RPO, and that costs a lot less money.”
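A back-of-the-envelope version of "RPO and backup frequency are tied", assuming each backup captures data as of when it starts, becomes usable only once it finishes, and finishes before the next one starts (no transaction-log shipping). In the worst case the failure hits just before a run completes, so the newest usable restore point is one full interval plus one backup duration old. The function names and example schedules are hypothetical.

```python
def worst_case_loss_minutes(interval_min: float, backup_duration_min: float) -> float:
    """Oldest the newest usable restore point can be when disaster strikes.

    Assumes each backup captures data as of when it starts, is usable only once
    it finishes, and finishes before the next one starts. The worst moment is
    just before a run completes, so the loss is one interval plus one duration.
    """
    if backup_duration_min > interval_min:
        raise ValueError("backups overlap; this simple model does not apply")
    return interval_min + backup_duration_min


def meets_rpo(interval_min: float, backup_duration_min: float, rpo_min: float) -> bool:
    return worst_case_loss_minutes(interval_min, backup_duration_min) <= rpo_min


print(meets_rpo(24 * 60, 120, 60))  # False: nightly backups cannot meet a 1-hour RPO
print(meets_rpo(60, 10, 60))        # False: even hourly backups can lose 70 minutes
print(meets_rpo(30, 10, 60))        # True: 40 minutes worst case
```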
[00:16:18] Stephen Manley: Right,
[00:16:19] W. Curtis Preston: You become part of the business discussion. Let them make a business decision, right? I was in consulting for a lot of years, and the thing I always told the consultants was: the answer is never no. The answer is, “I’d be happy to add that to the scope and the cost of the agreement,” right? The same goes here. When they say, “Well, we want a one-minute RPO,” you don’t go, “No, can’t do that.” Generally, most RTOs and RPOs are reachable; things just have to change.
So you come back with, “Okay, I researched it. Here’s how we meet an RPO of five minutes. It’s gonna cost us $16 million.”
[00:17:05] Stephen Manley: Right,
[00:17:07] W. Curtis Preston: “Well, we can also reach an RPO of one hour, and that’s gonna cost us a hundred thousand dollars.” And then they go, “Okay, yeah, let’s do that.”
Because they understand the concept of diminishing returns, right? You spend a certain amount of money and you get this, and then you spend a whole bunch more money and you don’t get much more. Business people understand that concept. You then get on the same page, and possibly you’re both unhappy, but at least they’re not unhappy with you.
Right? It’s not your fault. You’re both unhappy: you both want more, but you both agree that, as a business, you’re not gonna spend more to get it. And then you don’t end up having an RPE, right? A resume-producing event. You don’t want that. So
[00:17:57] Stephen Manley: I, I
[00:17:57] W. Curtis Preston: what we’re in the business of, Stephen, is keeping people from having an RPE.
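The diminishing-returns argument can be framed for the business as cost per unit of improvement. A small sketch using the illustrative figures from the conversation (a one-hour RPO for $100,000 versus a five-minute RPO for $16 million); the `Option` tuple and both functions are hypothetical helpers, not part of any product.

```python
from typing import Optional

Option = tuple[float, float]  # (rpo_minutes, cost_dollars), hypothetical shape


def marginal_cost_per_minute(cheaper: Option, pricier: Option) -> float:
    """Extra dollars paid per minute of RPO removed when moving up to `pricier`."""
    (rpo_a, cost_a), (rpo_b, cost_b) = cheaper, pricier
    minutes_removed = rpo_a - rpo_b
    if minutes_removed <= 0:
        raise ValueError("the pricier option should have a shorter RPO")
    return (cost_b - cost_a) / minutes_removed


def cheapest_meeting(options: list[Option], required_rpo_min: float) -> Optional[Option]:
    """Cheapest option whose RPO is at least as good as the requirement, if any."""
    ok = [o for o in options if o[0] <= required_rpo_min]
    return min(ok, key=lambda o: o[1]) if ok else None


options = [(60, 100_000), (5, 16_000_000)]
print(round(marginal_cost_per_minute(options[0], options[1])))  # 289091
print(cheapest_meeting(options, 60))  # (60, 100000)
print(cheapest_meeting(options, 1))   # None
```

Going from one hour to five minutes works out to roughly $289,000 per minute of RPO removed, which is the kind of number a business owner can weigh against the cost of the lost data.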
[00:18:02] Stephen Manley: This is true. And I will say the other thing you can unlock by doing this, ’cause I like your point about really being able to speak to the business people, is that some of the organizations I’ve seen try to set one RPO across everything. If you can start to map it to, “Well, let’s talk about which applications you care about most.
Maybe we can do something more there, because you’re willing to spend more. But for a lot of this other stuff, maybe 24 hours is fine,” because it’s Curtis’s data, and
[00:18:36] W. Curtis Preston: It doesn’t matter that
[00:18:37] Stephen Manley: we’re eventually getting it. We’re fine. And,
[00:18:40] W. Curtis Preston: So we lose a podcast or two.
[00:18:43] Stephen Manley: Right, exactly. But it also gives you the chance to have that application discussion, which then opens up the RPO, the RTO, the retention period, residency, all these other sorts of things you probably actually wanna be talking about, because you’re talking to them about their apps as opposed to the cost of this vast pool of data.
So anytime you can shift this to a business discussion the way Curtis did, you’re better off.
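Mapping protection levels to applications, instead of one RPO across everything, can be as simple as a tier table. A sketch with hypothetical tier names, application names, and values; the 24-hour default for unclassified data echoes the example in the conversation, and the real numbers would come out of the discussion with each application's owner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    rpo_hours: float
    rto_hours: float


# Hypothetical tiers and assignments; the real values come out of the
# conversation with each application's business owner.
TIERS = {
    "critical": Tier("critical", rpo_hours=1, rto_hours=4),
    "standard": Tier("standard", rpo_hours=24, rto_hours=48),
}
APP_TIERS = {"erp": "critical", "billing": "critical"}


def policy_for(app: str) -> Tier:
    """Applications nobody has classified fall back to the 24-hour standard tier."""
    return TIERS[APP_TIERS.get(app, "standard")]


print(policy_for("erp").rpo_hours)              # 1
print(policy_for("podcast-archive").rpo_hours)  # 24
```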
[00:19:14] W. Curtis Preston: Absolutely. And you become a part of the business rather than just the person saying no all the time, or the person saying nothing and then getting fired when your restore fails, even though it succeeded.
[00:19:31] Stephen Manley: Yes.
[00:19:32] W. Curtis Preston: Imagine you had this giant restore and you’re like, “I’m so happy I restored in four hours.”
And they’re like, “Uh, we expected one. You’re outta here.” You know? Yeah. So it is a good topic. Thanks for having the discussion.
[00:19:47] Stephen Manley: Ah, always here to help. And again, like Curtis said, this is your chance to elevate your career and be an equal at the business table. ’Cause if we’re looking forward 10, 20 years from now, that’s the spot you want to be in. You don’t want to be Dr. No and Dr. Slow.
[00:20:05] W. Curtis Preston: I like it. And I wanna thank the listeners again, and remember to subscribe so that you don’t miss an episode. And remember, here at Druva, there’s no hardware required.