The Storage Hunger
Sales of disk-based storage systems crossed 2,500 petabytes in 2008, up 58.1% year-over-year (one petabyte = 1 million GB). These figures do not include the direct-attached storage that comes pre-loaded with PCs and servers. This is understandable, as 1 TB (1,000 GB) NAS/SAN storage devices are now a commodity. The top three vendors in this space are HP, IBM and EMC, with market shares of approximately 29%, 20% and 14% respectively. The overall consumption doubles when this storage is backed up 🙂
On average, a datacenter consumes 100 watts per square foot, and even the best solid-state storage consumes about 5 watts per 1M IOPS. This puts the total annual cost of maintaining (cooling + power) a 1 TB disk array at about USD 2,500 (at 16¢ per kWh and 20 GB average daily usage). That makes the annual energy cost of newly bought storage roughly USD 5 billion! And backing up this 5-billion-dollar inventory surely adds a couple more billion.
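The back-of-envelope arithmetic behind that figure can be reproduced in a few lines (a sketch using the numbers above as-is; the rounding down to USD 5 billion is the only extra step):

```python
# Rough check of the annual energy-cost estimate (numbers taken from the text above).
PB_SHIPPED_2008 = 2_500        # petabytes of disk storage sold in 2008
TB_PER_PB = 1_000
COST_PER_TB_YEAR = 2_500       # USD/year for power + cooling of a 1 TB array

annual_cost = PB_SHIPPED_2008 * TB_PER_PB * COST_PER_TB_YEAR
print(f"${annual_cost / 1e9:.2f}B / year")  # same ballpark as the USD 5B figure
```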
Data de-duplication technology stores only a single copy of duplicate data. There are two important aspects of any data de-duplication solution/product –
- Scope of duplicate discovery – File-level / Sub-File level / Block level
- Point of duplicate discovery – Source / Target
Most storage vendors that use data de-duplication provide block-level duplicate removal at the target (i.e. when the data reaches the storage). But it's not very difficult to imagine that source-level removal of sub-file or block-level duplicates would be much better, for two reasons –
- Sending less (de-duplicated) data saves time and bandwidth (apart from storage)
- Duplicate discovery is much more effective at the source, where you have access to the structured data
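The block-level scope mentioned above is easy to sketch: split the data into fixed-size blocks, fingerprint each block, and keep (or send) only one copy per unique fingerprint. This is a minimal illustration of the idea, not how any particular vendor implements it:

```python
import hashlib

def dedup_blocks(data: bytes, block_size: int = 4096):
    """Fixed-size block de-duplication sketch.

    Returns (store, recipe): store maps block hash -> unique block bytes,
    recipe is the ordered list of hashes needed to rebuild the data.
    """
    store, recipe = {}, []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        h = hashlib.sha256(block).hexdigest()
        store.setdefault(h, block)   # keep each unique block only once
        recipe.append(h)
    return store, recipe

def rebuild(store, recipe):
    return b"".join(store[h] for h in recipe)

# Four 4 KB blocks, but only two distinct ones -> 50% of the data is duplicate.
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, recipe = dedup_blocks(data)
print(len(recipe), "blocks referenced,", len(store), "stored")  # 4 blocks referenced, 2 stored
```

Done at the source, only the two unique blocks ever cross the wire; done at the target, all four are transmitted first and collapsed on arrival.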
Considering Microsoft’s report on de-duplication assessment –
- 20–30% duplicate data is easily visible even in structured data sources like ERP databases
- 40–80% duplicate data can be seen in file servers and mail servers
- 60–90% duplicate data can be seen across different PCs (just my observation and opinion)
On average, a conservative 30% duplicate removal can save $1.6B on storage energy and $2B on bandwidth and backup costs.
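That savings estimate follows directly from the earlier figures (a sketch assuming the ~USD 5B annual storage-energy cost estimated above; it lands close to the $1.6B cited):

```python
# Savings from removing a conservative 30% of duplicate data
# (uses the ~USD 5B annual storage-energy figure estimated earlier).
ANNUAL_ENERGY_COST = 5e9   # USD/year, rough estimate from above
DEDUP_RATIO = 0.30         # conservative duplicate fraction

savings = DEDUP_RATIO * ANNUAL_ENERGY_COST
print(f"${savings / 1e9:.1f}B saved on storage energy")  # close to the $1.6B cited
```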
De-duplication and Druvaa
We see Druvaa inSync as a product/platform providing de-duplicated (at source) backup for PCs, PDAs and servers. The current version is available only for PCs, and we can easily see up to 90% savings in time and cost (bandwidth and storage) for enterprises. I just don’t see a reason why all storage and backup vendors wouldn’t do it. EMC and NetApp have already announced de-duplication as an additionally licensable technology on their arrays (target-based). No major vendor except EMC has announced agent/source-based de-dup, though. Surely, Druvaa has a good lead and is cashing in on it 🙂