Innovation Series

Druva’s DynamoDB tuning journey

August 05, 2020 Pallavi Thakur, Principal Engineer

Druva was architected from the start as a cloud-native backup solution, built on AWS. Our cloud file system natively leverages the services that AWS has to offer to achieve the best scale, performance, and security for protecting our customers’ data.

Druva’s versioned cloud file system stores metadata in the form of key-value pairs, and Amazon DynamoDB is an excellent fit for this use case. DynamoDB is a scalable, non-relational database that provides single-digit millisecond performance at any scale. However, DynamoDB capacity needs to be provisioned upfront, slightly higher than the anticipated consumption. Tuning and provisioning DynamoDB capacity in an optimized manner is crucial for:

- Reducing the total cost of ownership of Druva’s cloud-based solutions
- Minimizing backup and restore failures due to DynamoDB throttles

While AWS also has its own auto tuner, it’s not necessarily the best fit for Druva. This blog will provide insights into the various approaches taken by Druva to achieve the goals described above.

How does DynamoDB-provisioned capacity work?

[Figure: Example of DynamoDB provisioning]

A DynamoDB database is called a table, which stores multiple key-value pairs. Each key is uniquely identified by a combination of a partition-key and a range-key. The partition-key is mapped to a physical partition, and the range-key identifies the item uniquely within that physical partition.

Provisioned capacity is set separately for read and write I/O operations and is equal to the number of operations of a given type that can be successfully serviced by the DynamoDB backend per second.

Consumed capacity is the actual number of I/O operations of the given type (read or write) that are requested per second.

When consumed capacity exceeds provisioned capacity, the excess requests may fail. These failures are called throttles, and the failed requests should be retried at a later time. The best option is to retry the request after a wait time that increases exponentially with each subsequent retry. This approach is called exponential retry logic (also known as exponential backoff) and is supported by many SDKs.
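The retry behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not Druva's code or any SDK's built-in retry handler: `ThrottledError`, `call_with_backoff`, the default delays, and the random jitter are all assumptions made for the example.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a provisioned-throughput (throttle) error."""


def call_with_backoff(operation, max_retries=5, base_delay=0.05, max_delay=2.0,
                      sleep=time.sleep):
    """Run operation(), retrying throttled calls with exponentially growing waits."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_retries:
                raise  # all retries exhausted: surface the failure to the caller
            # The upper bound on the wait doubles with each retry, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter (an added assumption) spreads out clients retrying together.
            sleep(random.uniform(0, delay))
```

A caller would wrap a single read or write in a function and pass it as `operation`. If every retry is throttled, the error propagates, which corresponds to the request failure discussed in the next section.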

The need for custom DynamoDB tuning

DynamoDB throttles can cause delays in application requests due to the time spent in retries and may even result in request failure if all retries are exhausted. Thus, sufficiently provisioning DynamoDB capacity to avoid throttles is essential.

Throttles can also occur even with sufficiently provisioned capacity because of thin partitioning. To understand this better, let’s dive a bit deeper into DynamoDB internals:

DynamoDB guides the application to distribute keys uniformly as follows:
- Use a wide range of partition-keys
- Store an approximately equal number of items under each partition-key value

The above ensures that the physical partitions are uniformly utilized and that the provisioned capacity can be used most effectively. For example, if the table has 10 physical partitions and the provisioned capacity is 10K IOPS, then 1K IOPS is the effective provisioned capacity for each physical partition.

As a result, if the application distributes items unequally across physical partitions, there will be throttles even with sufficient provisioning.

Another major challenge in using DynamoDB is hotkeys. For example, if a single key is read 10K times per second, it will result in throttles, since the key resides in a single physical partition that is provisioned with only 1K IOPS.
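The arithmetic behind both problems can be made concrete with a simplified model in which provisioned capacity is split evenly across physical partitions, as in the 10-partition example. The function name and the input shape are hypothetical, and the model illustrates the reasoning in this post rather than DynamoDB's actual internals.

```python
def throttled_per_partition(requests_by_partition, provisioned_iops, num_partitions):
    """Return {partition: requests per second above that partition's even share}."""
    per_partition = provisioned_iops / num_partitions
    return {
        partition: requests - per_partition
        for partition, requests in requests_by_partition.items()
        if requests > per_partition
    }


# Uniform load: each of 10 partitions receives 1,000 requests/s, so nothing is throttled.
uniform = {partition: 1_000 for partition in range(10)}
print(throttled_per_partition(uniform, 10_000, 10))

# Hotkey: all 10,000 requests/s land on the single partition that holds the key.
hotkey = {0: 10_000}
print(throttled_per_partition(hotkey, 10_000, 10))
```

In the hotkey case the table as a whole is provisioned for exactly the offered load, yet 9,000 of the 10,000 requests per second exceed the one partition's share.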

Last but not least, changing the provisioned capacity takes a few minutes, which may or may not be tolerable depending on the nature of the application.

Trend-based DynamoDB tuning

A few years back, DynamoDB throttles were a serious problem for Druva: they could cause backup failures or other task failures due to provisioned throughput errors. Throttles and task failures increased at the start of work hours, when a sudden surge of backup tasks starting at once drove up the consumed IOPS.

A reaction-based strategy of increasing provisioning is not sufficient in such cases, as it takes around 5 minutes for an increase in provisioning to take effect.

The solution designed to mitigate this problem is a trend-based DynamoDB tuner. This tuner avoids the time lag between a rise in consumption and the corresponding increase in provisioning as follows:

- Consumption trend information for each DynamoDB table is maintained for the past 4 weeks.
- At 30-minute intervals, the past consumption values for the same day of the week and the same time are obtained.
- Based on the past trend, a median of DynamoDB consumption is calculated, named the consumption-trend.
- A trend-multiplier keeps an appropriate gap between provisioning and consumption to avoid throttles.
- Provisioned capacity is set 15-30 minutes in advance for the given time of day as consumption-trend × trend-multiplier.
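The steps above can be sketched as follows. This is an illustrative reading of the description, not Druva's tuner: the function name, the `history` mapping from 30-minute slot start times to consumed IOPS, and the default multiplier of 1.2 (borrowed from the example later in this post) are assumptions.

```python
from datetime import datetime, timedelta
from statistics import median


def trend_provisioning(history, target_time, trend_multiplier=1.2, weeks=4):
    """Capacity to provision for target_time, from the same slot in past weeks.

    history maps a datetime (start of a 30-minute slot) to consumed IOPS.
    """
    samples = []
    for weeks_back in range(1, weeks + 1):
        # Same day of the week and same time of day, one or more weeks earlier.
        slot = target_time - timedelta(weeks=weeks_back)
        if slot in history:
            samples.append(history[slot])
    if not samples:
        return None  # no trend data: fall back to reaction-based tuning
    consumption_trend = median(samples)
    return consumption_trend * trend_multiplier


# Hypothetical data: a slot at 09:00 on four consecutive Mondays.
slot = datetime(2020, 8, 3, 9, 0)
history = {slot - timedelta(weeks=w): iops
           for w, iops in zip(range(1, 5), [1_000, 1_200, 900, 5_000])}
print(trend_provisioning(history, slot))
```

Using the median rather than the mean keeps a single unusual week (the 5,000 sample here) from inflating the prediction.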

This tuning strategy minimizes throttles at all times, except when the trend-based prediction and the actual usage pattern differ. In such an event, reaction-based tuning is the only option.

COGS-efficient DynamoDB tuning

Over the years, DynamoDB has evolved and the number of throttles has significantly reduced. With that, Druva received an opportunity to save COGS (cost of goods sold) by changing our provisioning strategy.

AWS also has its own auto tuner, which is purely consumption-based. It does not react to throttles, which is the main reason we needed to provide our own tuner. Although infrequent, thin partitions and hotkeys still cause throttles occasionally.

COGS optimization can be achieved through the following:

- Reduce the multiplier-based gap between consumed and provisioned IOPS by provisioning less when there are no throttles. If the default multiplier is 1.2, provisioning is 20% above consumption. Reducing the multiplier to 1.1 cuts that headroom to 10% of consumption, saving provisioned capacity equal to 10% of consumption (roughly 8% of the provisioned total).
- Save COGS by eliminating the multiplier-based logic completely for highly provisioned tables. For example, if a DynamoDB table consumes 100K IOPS, then provisioning 10K extra IOPS because of the 1.1 trend-multiplier value results in significant costs. For such tables, the gap between consumed and provisioned IOPS can be reduced even further by using a constant gap for each consumption slab.
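A short worked example of the arithmetic, using the 100K IOPS table mentioned above. The helper name is hypothetical and the figures are for illustration only.

```python
def headroom_with_multiplier(consumed_iops, multiplier):
    """Extra IOPS provisioned above consumption under multiplier-based logic."""
    return consumed_iops * multiplier - consumed_iops


consumed = 100_000
for multiplier in (1.2, 1.1):
    extra = headroom_with_multiplier(consumed, multiplier)
    provisioned = consumed + extra
    print(f"multiplier {multiplier}: provisioned {provisioned:,.0f} IOPS, "
          f"{extra:,.0f} of them headroom")
```

Because the headroom grows in proportion to consumption, a fixed percentage becomes expensive on large tables, which is the motivation for the constant per-slab gap.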

The tuning criteria:

1. Increase provisioned IOPS (input/output operations per second) in response to an increase in consumed IOPS based on the following formula:

New provisioned IOPS = min(new consumed IOPS * iops_multiplier, new consumed IOPS + iops_gap_for_slab)

Below is a table that reflects an example of consumed IOPS slabs and corresponding gaps:

| Consumed IOPS | IOPS gap |
| --- | --- |
| 0 to 20K | 1K |
| X K to 2X K | X/10 K |
| 200K and above | 10K |

2. Decrease provisioned IOPS every 15 minutes (or at a configurable time interval) if consumed IOPS have dropped so that the gap between consumed and provisioned IOPS is more than the recommended gap + margin for that IOPS slab.

3. Increase provisioned IOPS in response to throttles as follows:

- Ignore all throttles below the tolerable throttle percentage.
- If throttles are above the tolerable throttle percentage, apply the following formula:

New provisioned IOPS = current provisioned IOPS * (1 + throttle percentage)
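The three criteria can be combined into a single tuning pass, sketched below. Several details are assumptions rather than Druva's implementation: the concrete slab boundaries between 20K and 200K (only the first row and the 10K top gap come from the table), the `margin` and `tolerable_throttle_pct` defaults, expressing the throttle percentage as a fraction (0.05 for 5%), and the order in which the criteria are checked.

```python
# Hypothetical slab table: (upper bound of consumed IOPS, constant gap).
# The middle rows instantiate the "X K to 2X K -> X/10 K" rule with assumed bounds.
SLABS = [(20_000, 1_000), (40_000, 2_000), (80_000, 4_000), (160_000, 8_000)]
TOP_GAP = 10_000  # gap for the largest tables


def gap_for_slab(consumed_iops):
    """Constant gap between consumed and provisioned IOPS for this consumption slab."""
    for upper_bound, gap in SLABS:
        if consumed_iops < upper_bound:
            return gap
    return TOP_GAP


def tune(consumed, provisioned, throttle_pct=0.0, multiplier=1.1,
         margin=500, tolerable_throttle_pct=0.01):
    """Return the new provisioned IOPS for one tuning pass."""
    gap = gap_for_slab(consumed)
    target = min(consumed * multiplier, consumed + gap)
    # Criterion 3: throttles above the tolerable percentage force an increase.
    # Taking the max with the consumption-based target is an added assumption.
    if throttle_pct > tolerable_throttle_pct:
        return max(target, provisioned * (1 + throttle_pct))
    # Criterion 1: consumption has grown beyond what current provisioning covers.
    if provisioned < target:
        return target
    # Criterion 2: consumption dropped, so shrink once the gap exceeds gap + margin.
    if provisioned - consumed > gap + margin:
        return target
    return provisioned
```

For example, `tune(consumed=100_000, provisioned=150_000)` shrinks provisioning to 108,000 under these assumed slabs, while the same call with `provisioned=108_200` leaves it unchanged because the excess is within the margin. In a real tuner the decrease check would run only on the 15-minute (or configured) schedule.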

Conclusion

Custom DynamoDB tuning has certainly benefited Druva and has helped optimize performance as well as reduce TCO. The recommendation for DynamoDB users is to pick a tuning strategy that is most suitable for the use case at hand. In some cases, a hybrid approach may also be beneficial. It is also important to weigh the complexity of implementing a custom tuner versus the actual benefits and then choose an appropriate strategy.

Learn more about how Druva has built a metadata-optimized backup architecture in the cloud.
