3 Ways to Dedupe Your Duplicated Duplicates

The biggest culprit in the growing costs of data storage and backup is duplicated files. In fact, in some companies, duplicated files account for up to 30% of data that is (re)created. Downloading multiple copies of a document and emailing files to yourself are just a few of the ways this duplication occurs.

It’s hard to predict storage costs if you can’t determine just how much data is going to be created, and this is a top concern for IT. In this video, Druva’s Chief Product Officer, Chandar Venkataraman, dives into the three attributes of inSync’s global deduplication.

Location: Where is the deduplication performed?

Client-side vs. server-side: When deduplication is handled on the client side, you get the same storage savings as with server-side deduplication, plus bandwidth savings, because less data is shipped over the network to the servers.
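To make the bandwidth point concrete, here is a minimal Python sketch of client-side deduplication. It is an illustration under stated assumptions, not a description of how inSync works: the `Server` class, the `client_backup` function, and the 4-byte block size are all hypothetical. The client hashes each block, asks the server which digests it lacks, and uploads only those blocks.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Server:
    """Hypothetical backup server that stores blocks keyed by SHA-256 digest."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def missing(self, digests: list[str]) -> set[str]:
        """Return the digests the server has not stored yet."""
        return {d for d in digests if d not in self.blocks}

    def store(self, digest: str, block: bytes) -> None:
        self.blocks[digest] = block


def client_backup(server: Server, data: bytes) -> int:
    """Back up data and return the number of payload bytes actually sent."""
    blocks = split_blocks(data)
    digests = [hashlib.sha256(b).hexdigest() for b in blocks]
    needed = server.missing(digests)  # only digests cross the network here
    sent = 0
    for digest, block in zip(digests, blocks):
        if digest in needed:
            server.store(digest, block)
            needed.discard(digest)  # never send the same block twice
            sent += len(block)
    return sent


server = Server()
first = client_backup(server, b"AAAABBBBCCCC")   # nothing stored yet: 12 bytes sent
second = client_backup(server, b"AAAABBBBDDDD")  # only the DDDD block is new: 4 bytes sent
print(first, second, len(server.blocks))
```

The second backup ships only one block because the server already holds the other two. A server-side approach would store the same four blocks but would receive all 24 bytes first.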

Logic: At what level is deduplication accomplished?

You could deduplicate at the file level, at the fixed- or variable-block level, or with app-aware deduplication.
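The difference between fixed and variable blocks shows up when a small edit shifts the rest of a file. The Python sketch below is a simplified illustration, not any vendor's algorithm: the window size, divisor, and boundary rule (cut wherever the sum of the last few bytes is divisible by `DIVISOR`) are assumptions chosen for readability. A production system would use a proper rolling hash and compare digests rather than raw chunks.

```python
from __future__ import annotations

import hashlib

FIXED_SIZE = 16  # hypothetical fixed block size
WINDOW = 4       # bytes examined when deciding whether to cut
DIVISOR = 16     # roughly one cut every 16 bytes on random-looking data


def fixed_chunks(data: bytes, size: int = FIXED_SIZE) -> list[bytes]:
    """Cut at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes) -> list[bytes]:
    """Cut wherever the content of the trailing window says so."""
    chunks, start = [], 0
    for i in range(len(data)):
        window = data[max(0, i - WINDOW + 1):i + 1]
        if sum(window) % DIVISOR == 0:  # boundary depends only on nearby bytes
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def new_bytes(old: list[bytes], new: list[bytes]) -> int:
    """Bytes in the new version whose chunks were not already stored."""
    known = set(old)
    return sum(len(c) for c in new if c not in known)


# 2048 bytes of pseudo-random "document", then a small insertion in the middle.
original = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
edited = original[:1000] + b"inserted paragraph" + original[1000:]

fixed_cost = new_bytes(fixed_chunks(original), fixed_chunks(edited))
variable_cost = new_bytes(variable_chunks(original), variable_chunks(edited))
print("fixed blocks, new bytes to store:   ", fixed_cost)
print("variable blocks, new bytes to store:", variable_cost)
```

With fixed blocks, every block after the insertion point is shifted and no longer matches. With content-defined boundaries, the cuts realign shortly after the edit, so only the chunks around the insertion are new.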

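App-aware deduplication is the most effective dedupe methodology because it looks inside a file the way the application that created it does. For email, that means it can identify duplicate attachments across messages, and even deduplicate an attachment against the copy already backed up from a folder.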
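As a rough illustration of the app-aware idea, the Python sketch below parses email messages and deduplicates their attachments by content hash. It uses Python's standard `email` package as a stand-in for an application-specific parser, which is an assumption made for the example. The helper names `make_message` and `attachments` are hypothetical.

```python
from __future__ import annotations

import hashlib
from email.message import EmailMessage


def make_message(subject: str, body: str, attachment: bytes) -> EmailMessage:
    """Build a small message with one binary attachment (demo helper)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(body)
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename="report.bin")
    return msg


def attachments(msg: EmailMessage) -> list[tuple[str, bytes]]:
    """Return (sha256 digest, decoded bytes) for each attachment in msg."""
    found = []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True)  # undo the transfer encoding
        found.append((hashlib.sha256(payload).hexdigest(), payload))
    return found


report = b"quarterly report bytes" * 100
a = make_message("Q3 numbers", "Please review.", report)
b = make_message("Fwd: Q3 numbers", "Forwarding this along.", report)

# Whole-message hashing sees two different objects...
whole_a = hashlib.sha256(a.as_bytes()).hexdigest()
whole_b = hashlib.sha256(b.as_bytes()).hexdigest()
print("messages identical:", whole_a == whole_b)

# ...while attachment-level hashing stores the shared attachment once.
store: dict[str, bytes] = {}
for msg in (a, b):
    for digest, payload in attachments(msg):
        store.setdefault(digest, payload)
print("attachments stored:", len(store))
```

The two messages differ in subject and body, so file-level dedupe treats them as unrelated. Parsing them first exposes the identical attachment, which could also be matched against the same file backed up from a folder by comparing digests.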

Scale: Is dedupe performed per device or across all devices?

It’s important to have deduplication at the global level because that’s when you get the full network effect. Rather than each user or device being an individual silo, data backed up by a single user can seed every other user.
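The effect of scale can be sketched by counting stored blocks under a per-device index versus a single global index. This Python example is illustrative only. The device names and block contents are made up, and a real index would track far more than a set of digests.

```python
from __future__ import annotations

import hashlib


def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def per_device_storage(devices: dict[str, list[bytes]]) -> int:
    """Blocks stored when every device keeps its own dedupe index."""
    return sum(len({digest(b) for b in blocks}) for blocks in devices.values())


def global_storage(devices: dict[str, list[bytes]]) -> int:
    """Blocks stored when all devices share a single dedupe index."""
    index: set[str] = set()
    for blocks in devices.values():
        index.update(digest(b) for b in blocks)
    return len(index)


memo = b"message from the CEO to everyone"
devices = {
    "laptop": [memo, memo, b"laptop notes"],  # the memo appears twice here
    "phone": [memo, b"phone photo"],
    "tablet": [memo, b"tablet sketch"],
}

print("per-device index:", per_device_storage(devices))  # 6 blocks
print("global index:    ", global_storage(devices))      # 4 blocks
```

With per-device indexes the memo is stored once per device, three times in total. With a global index the first device to back it up seeds the others, so it is stored once.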

Video Transcript

Hello, my name is Chandar, from Druva. Today I’m going to be talking about deduplication, which is essentially looking at redundant data and eliminating it. Now different vendors make different claims about their own dedupe approach. I’m going to be talking about all the various approaches there are to dedupe.

The first attribute of deduplication is location. Where exactly is dedupe performed? You could do it on the server side, for example on a Data Domain dedupe storage system, or on the client side. But when it's done on the client side, in addition to getting the same storage savings you also get bandwidth savings, because you have to ship less data over the network, and that’s a great thing.

The second most important attribute of dedupe is logic. Let’s start with granularity. At what level is dedupe accomplished? Vendors could do it at a file level, where you look at two different files, identify if they’re the same, and deduplicate them, or go to the sub-file level, typically the block level.

You could adopt a fixed block approach, or a variable block approach. The variable block approach is usually more effective in finding duplicate blocks, regardless of where they’re stored. How many times have you inserted a small paragraph in a big Word document, or a small slide in a big PowerPoint? The variable block approach is going to perform much better, but what’s more interesting is what’s called app-aware dedupe. Imagine looking inside a file just like the application that generated the file looks at it.

What if you could use technology like MAPI to look inside all your messages, so you could demarcate every message very clearly and deduplicate really effectively? If you did that you could identify duplicate attachments across messages and even dedupe the attachments against where they were actually backed up inside the folder, and that’s very, very effective.

The last and most important attribute of dedupe is scale. Now most vendors have a scale of one, which is dedupe per device or per user. But imagine thousands and thousands of devices, where a single message from your CEO, for example, went to all of your different devices, or where, because of your sharing patterns, the same big document is shared across so many devices.

Per-user dedupe definitely has limitations. What’s most important is performing dedupe at a global level, and that’s when you get the true network effect. Imagine a global dedupe, where every single user seeds every other user. You start to get exponential savings in both storage and bandwidth, and that’s the true power of dedupe.

And the ultimate in scale is of course the cloud, which is designed for infinite scale. You could have a single dedupe index across all of the regions globally, and the cloud is great for a global footprint and a global reach. Now Druva intelligently combines all the different attributes of dedupe to come up with something very, very unique: client-side dedupe, app-aware dedupe, and global dedupe, and we put it on the cloud. If you want to learn more, please visit druva.com. Thank you.

