News/Trends, Tech/Engineering, Product

High Availability and Amazon AWS

August 10, 2011 Jaspreet Singh, Founder and CEO

A lightning strike in Dublin knocked Amazon and Microsoft data centres offline for a few hours, and it took some time to get all the services restored.

Although it did affect Netflix, foursquare and a few others, thankfully Druva cloud services were completely unaffected. Here is a short note on how we managed to keep our promised SLA.

I think it's plain ignorance or mis-planning to assume 100% availability of the underlying infrastructure. Just like any hardware, the AWS infrastructure is prone to failures, but knowing the potential failure points can help improve availability.

Since it's a backup service, we have divided our cloud design into 3 parts based on availability and durability guarantees:

Config (most available): Configuration data stored in Amazon RDS
Meta-Data: Druva Dedupe file-system spanning Cassandra nodes
Data (most durable): Stored in S3

And some design changes we incorporated to avoid downtime:

Multi-Zone replication: Both RDS and Cassandra nodes are replicated across 3 availability zones. We use Cassandra in full-consistency mode and rely heavily on its self-healing in case of service failures.
Reduced dependency on EBS: EBS is a software abstraction over underlying SAN storage, and two independent EC2 instances may share the same SAN for EBS. Given this, we shifted our focus from EBS to local storage for meta-data.
Extra spare copies in S3: We also maintain some extra redundancy on top of S3 for the most referenced blocks. This is essentially to avoid the random (but infrequent) S3 time-outs and to improve durability of the most concurrently accessed data.
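
To make the last point concrete, here is a minimal sketch of the idea, not our actual implementation. It assumes a hypothetical in-memory store in place of S3, and the names InMemoryStore, most_referenced, replicate_hot_blocks and read_block are invented for illustration. The most referenced blocks get a spare copy, and a read that times out on the primary falls back to that copy.

```python
from collections import Counter


class InMemoryStore:
    """Hypothetical stand-in for an object store; can be told to time out."""

    def __init__(self):
        self.objects = {}
        self.failing_keys = set()

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        # Simulate the occasional time-out for selected keys.
        if key in self.failing_keys:
            raise TimeoutError(f"simulated time-out reading {key}")
        return self.objects[key]


def most_referenced(ref_counts, top_n):
    """Return the keys of the top_n blocks by reference count."""
    return [key for key, _ in Counter(ref_counts).most_common(top_n)]


def replicate_hot_blocks(primary, spare, ref_counts, top_n):
    """Copy the most referenced blocks from primary into the spare store."""
    for key in most_referenced(ref_counts, top_n):
        spare.put(key, primary.get(key))


def read_block(key, primary, spare):
    """Read from primary; on a time-out, fall back to the spare copy if any."""
    try:
        return primary.get(key)
    except TimeoutError:
        if key in spare.objects:
            return spare.get(key)
        raise
```

Only the hot blocks pay the extra storage cost in this sketch; a block without a spare copy still surfaces the time-out to the caller, who can retry.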

We surely paid more for improved availability, but there are simple design changes which can help save as well. For example, the 3-way replication increased our compute (EC2) cost by over 200%, but because of the extra spares we could increase the data stored per instance, which was earlier restricted to maintain a good cache-vs-on-disk ratio.
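
The trade-off can be sketched with some back-of-the-envelope arithmetic. The numbers below (12 TB of meta-data, 1 TB or 2 TB per instance) are made up for illustration and are not our real figures, and the model is deliberately naive: it counts instances only and ignores instance size and pricing.

```python
import math


def instances_needed(total_tb, tb_per_instance, replicas):
    """Instances required to hold total_tb, with every node replicated."""
    return replicas * math.ceil(total_tb / tb_per_instance)


def cost_increase_pct(base_instances, new_instances):
    """Relative change in instance count, as a percentage of the base."""
    return (new_instances - base_instances) / base_instances * 100


# Hypothetical workload: 12 TB of meta-data.
single_copy = instances_needed(12, 1, replicas=1)   # no replication
three_way = instances_needed(12, 1, replicas=3)     # same density, 3 copies
three_way_dense = instances_needed(12, 2, replicas=3)  # 3 copies, 2x density

print(cost_increase_pct(single_copy, three_way))        # 200.0
print(cost_increase_pct(single_copy, three_way_dense))  # 50.0
```

Under these assumptions, tripling the replicas at the same density triples the instance count (a 200% increase), while doubling the data stored per instance brings the increase down to 50%.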
