News/Trends, Tech/Engineering

Salesforce Outage Proves You Need to Back Up Your SaaS Data

W. Curtis Preston, Chief Technology Evangelist

Salesforce.com’s longest outage ever occurred last week, locking thousands of users out for 15 hours or more. It was a huge inconvenience to many customers, bringing their sales and marketing operations to a halt for more than an entire business day and costing untold amounts of lost sales. It is also further validation of the importance of backing up SaaS services like Salesforce, Office 365, and G Suite.

What happened?

A database update script for Pardot (a marketing automation product owned by Salesforce) corrupted user access permission data, accidentally giving every Salesforce user access to every record within the organization. Without going into detail, suffice it to say that this breach of standard security protocol could range from a mild annoyance to outright havoc. It also likely constitutes a violation of privacy regulations such as the GDPR.

Salesforce took Pardot offline to address the issue, and then took the Salesforce Marketing Cloud offline as well, which includes tools such as Journey Builder, Email Studio, and Audience Studio. Once the user access data corruption problem had been identified, customers were told they needed to reset the permissions of their non-admin users. Some were back up as soon as services had been restored, while others appeared to scramble for days to properly reset their permissions.

SaaS vendors don’t understand backup

If you are trusting your SaaS vendor to provide backup and recovery services, you are trusting a company that probably does not have a core competency in this area. In my opinion, the reaction by Salesforce to this outage proves my point. Why do I say that? They did not even mention backup as one of the methods that customers could use to restore the permissions of affected users!

Take a look at the official page describing the “workarounds” to fix the problem. One workaround is to copy the correct permissions from a sandbox copy of your Salesforce database, after ensuring that the sandbox was not affected by the script. The second workaround – if you can call it that – is to manually reset the permissions of all of your Salesforce users. That is a huge ask, depending on the complexity of your organization. They never even mentioned that if you had a valid backup of your Salesforce database taken before the script ran, you could simply restore your user object and be good to go.
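To be clear about what even a do-it-yourself snapshot buys you: if you had a point-in-time record of who was assigned which profile and permission sets, the “manually reset everything” workaround becomes a comparison exercise instead of guesswork. The sketch below is purely illustrative, not an official Salesforce or Druva tool; it assumes the open-source simple-salesforce Python library, uses placeholder credentials, and a real backup product would of course do far more than dump JSON to a file.

```python
# Illustrative only: snapshot Salesforce user/permission assignments so you have
# a known-good reference to compare against after an incident.
# Assumes the open-source simple-salesforce library; credentials are placeholders.
import json
from datetime import datetime, timezone

from simple_salesforce import Salesforce  # pip install simple-salesforce

sf = Salesforce(
    username="admin@example.com",         # placeholder
    password="your-password",             # placeholder
    security_token="your-security-token",  # placeholder
)

# Which profile each user is assigned to.
users = sf.query_all(
    "SELECT Id, Username, ProfileId, IsActive FROM User"
)["records"]

# Which permission sets are assigned to which users.
perm_sets = sf.query_all(
    "SELECT AssigneeId, PermissionSetId FROM PermissionSetAssignment"
)["records"]

snapshot = {
    "taken_at": datetime.now(timezone.utc).isoformat(),
    "users": users,
    "permission_set_assignments": perm_sets,
}

# Keep this file somewhere outside Salesforce, so a bad script running inside
# the platform can't take your reference copy down with it.
filename = f"sfdc-permission-snapshot-{snapshot['taken_at'][:10]}.json"
with open(filename, "w") as f:
    json.dump(snapshot, f, indent=2, default=str)

print(f"Wrote {len(users)} users and {len(perm_sets)} assignments to {filename}")
```

Even that crude level of record-keeping is more than the “manual reset” workaround gives you.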

While I’m thinking about this, let me rant just a little more. A $13 billion company corrupted their customers’ data and then told those customers to fix it themselves. If you learn nothing else from this incident, understand that Salesforce does not have a backup of your information that they can easily access to fix parts of your Salesforce database. (They do have a restore service, which they strongly advise against using; it costs $10,000, has to restore your entire environment, takes several days, and comes with no SLA.)

Companies are made up of humans

Until the day when all of us are replaced by robots using artificial intelligence, companies will be made up of actual human beings – human beings who make mistakes. One might think that a company with annual revenue approaching $13 billion could not be taken down by a single script – which is obviously not the case. I’m sure Salesforce has an extensive change control process that should have caught this rogue script before it ever ran in production, and somehow that process failed. Remember that the next time someone tells you that [insert company name here] would never do something that would corrupt customer data – because that’s exactly what happened here.

When I encounter users who tell me they don’t need to back up their VMs running in the cloud, I tell them the ‘boogie man’ story of codespaces.com. Here’s a quick summary for those who haven’t heard it. Codespaces.com was (ironically) a safe space to store your code, and it ran entirely on AWS. They backed up their data using EBS snapshots and stored them in their own AWS account. When a hacker gained access to their AWS control panel and Codespaces decided not to pay the ransom the hacker demanded, the hacker essentially deleted the company – all of its AWS infrastructure and data were gone. Because the backups were stored in the same AWS account, they were deleted as well, and codespaces.com ceased to exist.
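The technical moral of the codespaces.com story is simple: backups that live in the same account as production share production’s blast radius. As a hedged illustration only (the account IDs, snapshot ID, and profile names below are placeholders), here is one common way to get an EBS snapshot owned by a separate, locked-down backup account using boto3: share the snapshot with that account, then copy it under the backup account’s own credentials. Encrypted snapshots additionally require sharing the KMS key, which this sketch skips.

```python
# Illustrative only: keep a copy of EBS snapshots in a separate AWS account so
# that losing the production account does not also mean losing the backups.
# Account IDs, snapshot IDs, and profile names are placeholders.
import boto3

PROD_PROFILE = "prod"             # placeholder CLI profile for the production account
BACKUP_PROFILE = "backup-vault"   # placeholder CLI profile for the isolated backup account
BACKUP_ACCOUNT_ID = "111122223333"
SNAPSHOT_ID = "snap-0123456789abcdef0"
REGION = "us-east-1"

# Step 1 (run with production credentials): share the snapshot with the backup account.
prod_ec2 = boto3.Session(profile_name=PROD_PROFILE, region_name=REGION).client("ec2")
prod_ec2.modify_snapshot_attribute(
    SnapshotId=SNAPSHOT_ID,
    Attribute="createVolumePermission",
    OperationType="add",
    UserIds=[BACKUP_ACCOUNT_ID],
)

# Step 2 (run with backup-account credentials): copy the shared snapshot so the
# backup account owns its own copy. Deleting the original from production
# no longer touches this copy.
backup_ec2 = boto3.Session(profile_name=BACKUP_PROFILE, region_name=REGION).client("ec2")
copy = backup_ec2.copy_snapshot(
    SourceRegion=REGION,
    SourceSnapshotId=SNAPSHOT_ID,
    Description=f"Cross-account copy of {SNAPSHOT_ID}",
)
print("Backup account now owns snapshot:", copy["SnapshotId"])
```

The point of the second step is that production credentials alone can no longer delete the copy, which is exactly what would have saved codespaces.com.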

Until the next incident like this one, this Salesforce outage will be my go-to example of why customers should back up their SaaS data. If you’re still not convinced, read on.

Long outages cost more money than short ones

Applications like Salesforce – and email and document services like Office 365 and G Suite – quickly become critical to the day-to-day operations of a company. When those applications go offline due to human error or some kind of disaster, companies can lose lots of money. A $100 million company makes roughly $400,000 in sales every business day. That means losing access to your communications, marketing, or sales tools for an entire business day can cost your company $400,000 or more. Suffice it to say that a backup system would cost less.

To come up with your own number, divide your company’s annual revenue by 250, as that is roughly the number of working days in a year, minus holidays and weekends. Then give Druva a call to see how much we would cost instead.
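If you’d rather not do the division in your head, the back-of-the-envelope math is trivial. The snippet below simply restates it in Python, using the 250-working-day assumption and the $100 million example from above; the figures are obviously placeholders for your own numbers.

```python
# Back-of-the-envelope cost of a SaaS outage, per the math above.
# 250 working days per year and $100M revenue are the article's example assumptions.

def daily_revenue(annual_revenue: float, working_days: int = 250) -> float:
    """Rough revenue generated per business day."""
    return annual_revenue / working_days

def outage_cost(annual_revenue: float, outage_days: float) -> float:
    """Rough revenue at risk while sales and marketing tools are offline."""
    return daily_revenue(annual_revenue) * outage_days

# The $100M example: roughly $400,000 of revenue at risk per business day offline.
print(f"${outage_cost(100_000_000, 1):,.0f}")  # -> $400,000
```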

I mention this because one thing I hear from the anti-backup crowd is that a third-party backup tool would only make a large restore faster and easier, and that this simply isn’t enough to justify the cost of the system. Faster and easier means your company returns to full operation more quickly and makes more money. I really don’t know how to state it any more simply than that.

Druva customers could recover more quickly

Druva customers who were affected by this outage were able to restore full operation much faster than anyone else. All they had to do was open the Druva inSync app, select a point in time from before the incident and a point in time after it, view the affected users, and click restore. Done. There would have been no question as to whether they were restoring a corrupted version, the way there was with the sandbox “workaround.” There would have been no manual configuration of hundreds or thousands of users’ permissions. There would have been a single login, a simple before-and-after query, and the push of a single button.

Please back up your SaaS data

This outage reminds me of the old adage, “Just because you’re paranoid doesn’t mean no one is out to get you.” Please take a look at the relatively small cost of backing up your really important SaaS data.