
Large Microsoft 365 customer incident proves you need backup

W. Curtis Preston, Chief Technology Evangelist

Over the course of my career, I’ve seen numerous examples that prove why backing up your data is critical to the health of your company, not only for on-premises data, but also for data stored in SaaS applications. Just last week, an incident concerning KPMG, a large customer of Microsoft 365, served as yet another example of why SaaS customers should use third-party backup if they care about the data stored in these services.

This is only the most recent in a series of incidents that goes back to CodeSpaces.com, which ceased to exist when its cloud account was hacked and deleted – including its backups. The series also includes Musey Inc., which accidentally deleted its own G Drive account with no backup, and Salesforce, which ended up corrupting its customers’ data and then telling them to go fix it themselves.

Being aware of the policies and methodologies of Microsoft 365, I’ve been predicting that this would happen to a Microsoft customer at some point – and now it has.

What happened to this Microsoft 365 customer?

KPMG is a huge customer of Microsoft 365 with at least 145,000 accounts. I can only assume that the person responsible for administering Microsoft 365 in such a large environment knows what they’re doing. However, as you will see in this story, such a person can still make mistakes, and this one was a doozy.

It appears the KPMG administrator wanted to delete the personal Teams chat history of a single user. The user was assigned to a Microsoft Retention Policy that retained his chats, so the admin couldn’t just delete the user’s chats. They had to move the user to a different retention policy with lower retention settings.

I believe this scenario alone proves my point. An administrator (rogue or otherwise) wanting to do significant damage can simply change the retention policy that would otherwise protect the company from what they are doing. This is true unless you activate the Preservation Lock feature, which, if set, won’t let you change the retention of data already stored by a retention policy. However, there are many reasons, including storage costs and the right-to-be-forgotten policies in GDPR and CCPA, why companies might not want to activate this feature.

Tony Redmond, whose views on backup I discuss below, told me once that I see rogue admins around every corner. I have indeed seen many rogue admins in my nearly three-decade career in IT, and it’s not like there haven’t been multiple incidents where flaws in Microsoft 365 code have allowed unauthorized admin access. (Just look here and here; it’s also well documented that Microsoft is a huge target.)

As I was writing this blog post, a disgruntled former Cisco employee deleted 456 AWS VMs that supported 16,000 Cisco Teams accounts. Yes, Virginia, there is a Santa Claus. And yes, rogue admins exist. Cisco estimates recovering from this damage will cost them $1.6M.

In KPMG’s case, however, this wasn’t a rogue admin. It was an administrator simply trying to do what they were told to do, so they created a new retention policy and attempted to move the user to the new policy. What actually happened is that they accidentally moved 145,000 users to that policy and subsequently deleted their chat history instead.

Microsoft then told them there was no way to recover this information. When the feature you are relying on to protect you is the feature that does you in, well, that’s kind of hard to recover from.

It could be argued that this is “only personal chats within Teams” data. However, that argument does not hold for a company that has a regulatory requirement to retain such information. Druva supports the protection and recovery of Teams data, with users’ private chats coming very soon, but like similar companies, we are limited to the APIs Microsoft makes available. We are therefore not able to restore those chats; they are only available for download.

I think all of that would be missing my point, which is that this is why I don’t trust something like Retention Policies, or the Recycle Bin, or the Site Collection Recycle Bin, because they’re all stored inside the thing being backed up. Make a mistake like KPMG did, and your “backups” disappear.

I have the same opinion regarding the references in Microsoft 365 documentation to some kind of backup available for 14 days for SharePoint Online that could restore your site collection if you weren’t able to fix things with the Site Collection Recycle Bin. There is no mention that this “backup” is any different than all the other things you can restore from in Microsoft 365, which aren’t backups at all, but a series of Recycle Bins and a Retention Policy system that cannot be used to restore your SharePoint, OneDrive, or Exchange Online account to the way it looked before something really bad happened. (It’s an eDiscovery system, not a backup system. These things have very different purposes.)

Shouldn’t the Cisco employee’s access have been revoked after he left? Yes. Shouldn’t there have been other measures in place to stop him? Yes. Shouldn’t the KPMG admin have tested what they did before making such a significant change to such a huge environment, and perhaps used PowerShell to make such a big change? Absolutely. Occurrences like this are why we have backups that are stored in a completely separate system.

Microsoft retention policies

I have been arguing that customers should be backing up Microsoft 365 for a long time. Tony Redmond, a Microsoft 365 expert, seems to feel that third-party backup is unnecessary and that people like me often use FUD to make their case. His point is well-taken that many claims made about Microsoft 365 by backup vendors are either out of date or simply wrong. I try hard not to make that mistake, and I use Mr. Redmond’s website to fact-check myself.

I will concede that Mr. Redmond’s knowledge of Microsoft 365 definitely exceeds mine. I would hope that he would accept that my experience with the reasons we back up data exceeds his. Based on my nearly thirty years of helping people restore their data following worst case scenarios, I disagree that Retention Policies meet the requirements of a proper backup for the following reasons:

  • They are complicated, with dozens of pages of documentation
  • Actions by bad actors or accidents by real administrators can thwart their purpose
  • Preservation Lock may help with the kinds of issues outlined above, but it comes with its own set of drawbacks
  • Versions being stored in the same system is a violation of the 3-2-1 rule
  • They are an eDiscovery tool and are not designed to restore to a point-in-time
  • They can also significantly increase storage costs from Microsoft

This is why the 3-2-1 rule exists:

The 3-2-1 rule is the E=mc² of backups; it is the foundation upon which all backup design is based. (At least three versions, on two media, one of which is somewhere else.) The “2” and the “1” in the rule say to not store backups in the same place as what you are backing up. Microsoft Retention Policies and Google Archive do not conform to this rule and therefore are not backup.
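For readers who like to see a rule written down as a check, here is a minimal sketch of the 3-2-1 rule as stated above. Everything in it is illustrative: the `Copy` type, the `satisfies_3_2_1` function, and the medium and location labels are hypothetical names I made up for the example, not anything from Microsoft 365 or any backup product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One version of the data (hypothetical model for illustration)."""
    name: str
    medium: str    # what the copy is stored on
    location: str  # where the copy lives


def satisfies_3_2_1(copies, primary_location):
    """Return True if the copies meet the 3-2-1 rule as described above:
    at least three versions, on at least two media, with at least one
    stored somewhere other than the primary location."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    one_elsewhere = any(c.location != primary_location for c in copies)
    return enough_copies and enough_media and one_elsewhere


# Assumed example: every "copy" lives inside the same tenant.
in_tenant_only = [
    Copy("live data", "m365-tenant", "m365-tenant"),
    Copy("recycle bin", "m365-tenant", "m365-tenant"),
    Copy("retention policy hold", "m365-tenant", "m365-tenant"),
]

# Same environment plus one copy held in a separate system.
with_separate_backup = in_tenant_only + [
    Copy("third-party backup", "object-storage", "separate-cloud-account"),
]

print(satisfies_3_2_1(in_tenant_only, "m365-tenant"))
print(satisfies_3_2_1(with_separate_backup, "m365-tenant"))
```

The first set has three versions but fails both the “2” and the “1”, because everything sits on the same medium in the same place. Adding a single copy in a separate system is what makes the second set pass.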

Please consider looking at third-party alternatives to these limited tools. They may actually cost less than what you are already doing, and they certainly cost less than having something like this happen to you.

Learn more about how Druva can protect your Microsoft 365 information.